Square Minus Square – A coding agent benchmark

(aedm.net)

17 points | by Topfi 6 days ago

1 comment

  • wariatus 3 hours ago
    Have you tried equipping those agents with access to a grounded vision model to analyse the image?

    In my experience, most models can’t understand such input properly.

    I am now experimenting with Molmo2, and it looks promising.