Square Minus Square – A coding agent benchmark

(aedm.net)

25 points | by Topfi 42 days ago

1 comment

wariatus 36 days ago
Have you tried giving those agents access to a grounded vision model to analyse the image?
In my experience, most models can’t understand such input properly.
I am now experimenting with Molmo2, and it looks promising.