I need a version of this which swears loudly when an assumption it made turns out to be wrong, with the volume/passion/verbosity correlated with how many tokens it's burned on the incorrect approach.
i didnt realize i needed the volume scaling with tokens burned as much as i do now xD
imagine the screaming when it confidently refactors something for 40k tokens and then finds out the thing it deleted was load bearing
Does this actually relate to the code quality being observed by the agent? The readme isn't very clear on that IMO. I have some projects I'd love to try this out on, but only if I am to get an accurate representation of the LLMs suffering.
So it is left up to agent to decide.
So looks like it's mainly looking for FIXME/TODO etc comments, deep nesting, large files, broad catches, stuff like that.