Seeing to Generalize: How Visual Data Helps LLM Reasoning
Text-only LLMs cheat by memorizing token positions until long contexts break them. Visual training forces permanent symbolic binding, making VLMs fundamentally better at pure text retrieval.
Text-only LLMs cheat by memorizing token positions until long contexts break them. Visual training forces permanent symbolic binding, making VLMs fundamentally better at pure text retrieval.