CC-Canary: Detect early signs of regressions in Claude Code

(github.com)

22 points | by tejpalv 3 hours ago

5 comments

evantahler 1 hour ago
I feel like asking the thing that you are measuring, and don’t trust, to measure itself might not produce the best measurements.
- john_strinlai 1 hour ago
  "we investigated ourselves and found nothing wrong"
Retr0id 34 minutes ago
What is "drift"? It seems to be one of those words that LLMs love to say but it doesn't really mean anything ("gap" is another one).
- idle_zealot 31 minutes ago
  I believe it's businessspeak for "change." Gap is suittongue for "difference."
aleksiy123 1 hour ago
Interesting approach. I've been particularly interested in tracking whether adding skills or tweaking prompts makes things better or worse.
Does anyone know of similar tools that let you track this across harnesses while coding?
Running evals as a solo dev is cost-prohibitive, I think.
- FrankRay78 14 minutes ago
  See the very last section of this doc for how I minimise token usage and track savings. All three plugins co-exist fine: https://github.com/FrankRay78/NetPace/blob/main/docs/agentic...
wongarsu 1 hour ago
See also https://marginlab.ai/trackers/claude-code-historical-perform... for a more conventional approach to tracking regressions.
This project is somewhat unconventional in its approach, but that might reveal issues that are masked in typical benchmark datasets.