Glyph: Scaling Context Windows via Visual-Text Compression

(github.com)

24 points | by foruhar 103 days ago

3 comments

phildougherty 101 days ago
Can someone compare/contrast with deepseek-ocr?
kburman 101 days ago
This looks very promising. Are there any downsides or potential gotchas?
ghoul2 100 days ago
I asked this question on another post and was downvoted, so I'm trying again: don't we lose the "contextualization" that LLM embeddings provide? The embedding of token X contains information not just about X but also about all the tokens that came before X in the context, which is why "flies" gets a different embedding in "time flies like an arrow" than in "fruit flies like a banana".
Image embeddings, as I currently understand them, are just the pixel values of a block of pixels.
What am I missing?
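
To make the distinction in the question concrete, here is a minimal sketch of the difference between a static lookup embedding and a contextualized one. It is a toy, not Glyph's or any real model's code: the vocabulary, the random vectors, and the names `TABLE`, `embed`, and `causal_self_attention` are all hypothetical, the attention pass uses identity projections, and positional encodings are left out.

```python
import math
import random

random.seed(0)
DIM = 8

# Hypothetical toy vocabulary; each word gets one fixed random vector.
VOCAB = ["time", "fruit", "flies", "like", "an", "arrow", "a", "banana"]
TABLE = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in VOCAB}


def embed(words):
    """Static lookup: the same word always maps to the same vector."""
    return [TABLE[w] for w in words]


def causal_self_attention(xs):
    """One attention pass with identity projections (an assumption made
    for brevity). Position i attends only to positions 0..i."""
    out = []
    for i, q in enumerate(xs):
        # Scaled dot-product scores against the current and earlier positions.
        scores = [
            sum(a * b for a, b in zip(q, xs[j])) / math.sqrt(DIM)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over those scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Output is the weighted average of the attended vectors.
        out.append([
            sum(w / total * xs[j][d] for j, w in enumerate(weights))
            for d in range(DIM)
        ])
    return out


s1 = "time flies like an arrow".split()
s2 = "fruit flies like a banana".split()

# "flies" is at index 1 in both sentences.
static_same = embed(s1)[1] == embed(s2)[1]

ctx1 = causal_self_attention(embed(s1))[1]
ctx2 = causal_self_attention(embed(s2))[1]
contextual_same = all(math.isclose(a, b) for a, b in zip(ctx1, ctx2))

print("static vectors identical:    ", static_same)
print("contextual vectors identical:", contextual_same)
```

Run as written, `static_same` should be true and `contextual_same` false: the lookup vector for "flies" is the same in both sentences, and it only starts to differ once an attention pass mixes in the preceding word. Whether and where the same kind of mixing happens for image-patch inputs is exactly what the question is asking, and this sketch does not settle it.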