I apparently use Claude differently than most people who talk about using Claude on the internet.
I’ll typically have a bunch of short sessions over the course of a day. Anytime I start a task that isn’t going to very directly benefit from the existing context I start fresh.
I don’t find a lot of benefit in explaining the project overall to Claude — I’ve deleted a lot of that explanation from my Claude.md because it didn’t seem to impact much.
I typically start a task by pointing it to 1-2 files and giving it some explanation of what I want done, and it figures it out.
Basically never hit context window limits or compactions, and can’t remember the last time I hit a 5 hour or a weekly limit.
This is my usage pattern and I agree it works really well. I start almost every conversation by asking Claude to read, not write. Then once it's explored a particular slice I let it rip.
This takes a couple minutes (and I suppose I'm spending tokens each time), but sessions rarely reach compaction length and I like that I'm not trying to keep a whole separate pile of docs in sync.
I do something similar. I've recently started having a few "starting point" files to re-explain common context (less than thirty lines per markdown file, usually) that I can point the agent to at the start of a new session, each tightly scoped to a certain domain and/or task type. That's been nice to avoid repeating myself, without the side-tracking or over-aggressive biasing-towards-previous-conversations that I've seen happen if I use long sessions or let it try to decide on its own what to pull in from larger files or trees of files. Sometimes I'll tell it to update that file based on new info from a current task, but I keep tight control over what gets pulled into task start context.
They aren't really "explaining the project" either, but more module- or task-specific preferences, handy reference pointers, or other things like "there are mixed examples of how to do certain things in this project, prefer X to Y." I use a write-everything-twice approach. After I find myself having to correct an implementation because it didn't figure out one of these things on its own from the existing code, I'll add an entry. That also avoids bloating things with "I think this is relevant" compared to "I have noticed that this is necessary."
I keep doing this because it lets me experiment with different approaches to problems without risk of it fixating on things from a previous abandoned attempt, and particularly because sometimes I'm wrong and I haven't found the agent harnesses particularly reliable at taking my word for it from a POV of "yes I know I said we need xyz earlier, but let's please entirely forget about that."
Claude Code has a big system prompt, most of which isn't necessary for the more recent models. (Codex too.)
I've been running Claude and GPT in my own agent harness. The main difference I notice is that tasks take about 7x longer to complete if they're run in the official Claude or Codex harness (and cost me 7x more).
You would think this would lead to increased correctness, but that doesn't seem to be the case. Today I tested both side by side. They both resulted in data loss. (I had a backup obviously.)
GPT running in the official harness did a bunch of extra tests and double checking, and ended up with the same result regardless (it permanently deleted a bunch of documentation).
All else being equal, I like getting my data loss 7x faster and cheaper ;)
I work in a completely different manner. I have a chief of staff agent, which is one Claude Code instance that orchestrates work across all my projects simultaneously with sub agents. In this way the agent helps me context switch and drive work forward on everything I’m working on. I only use 1 session; I compact when necessary, using todos and files on the file system to track WIP.
I work similarly, but still have architecture mds for a few selected cross-cutting features. As useful for human readers as for AI.
Normally I do it exactly as you say, point at a few files, but if I know these features are involved I point at the corresponding mds instead. It's a shortcut for me to type less.
I keep a plan file that records what I’m doing, how I’m doing it, and what I’ve done so far - every time I sit down to have another session, I feed Claude the plan file first, then tell it to begin on the next unchecked todo. Every time I run into something new, I tell it to add it to the todos in the plan file.
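For anyone who wants to see the shape of that workflow, here is a minimal sketch of picking the next unchecked todo out of a plan file. It assumes the plan uses markdown checkboxes (`- [ ]` / `- [x]`), which the comment above does not actually specify; the function names are made up for illustration.

```python
import re
from typing import Optional

# Assumed checklist syntax: "- [ ] task" for open items, "- [x] task" for done ones.
TODO_PATTERN = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def next_unchecked_todo(plan_text: str) -> Optional[str]:
    """Return the text of the first unchecked todo, or None if everything is done."""
    for line in plan_text.splitlines():
        match = TODO_PATTERN.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


def append_todo(plan_text: str, item: str) -> str:
    """Add a new unchecked todo at the end of the plan."""
    return plan_text.rstrip("\n") + f"\n- [ ] {item}\n"
```

In practice the agent does this reading and appending itself; the sketch only shows how little structure the plan file needs for "begin on the next unchecked todo" to be unambiguous.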
It basically takes care of itself, or at least as close as it can.
Honestly sounds like you’re not doing anything difficult. If I’m doing easy tasks it’s fine but if you need to do a major architectural project that spans 3 codebases and 2 clouds you’re gonna have a hard time without substantial context/memory management.
I might be missing out on something but I never had to explain my project. Just give it a task, or if you really want to, type it quickly, then you are good to go.
I can’t imagine this being worth optimizing. The issue is never that Claude can’t figure out what the project is about…
Am I missing something or does this project not solve a problem most regular people have?
What I've finally come to understand is that there is now a large number of people who are able to write and use software through Claude and other coding agents. Those people have different needs than more traditional software engineers. Engineers have more knowledge, and since even the best LLMs often need steering, correction, and refactoring suggestions when iterating on code, it's fine to let the model lose context: exactly like you said, you tell it to read a file and have it regurgitate its understanding so you can correct or validate it before continuing. Compare that to people who are basically trusting Claude to do it all for them. When they see the model get confused because of a new session, or fail to pick back up, they don't know what's going on or what's right or wrong, because it's mostly latent to them. So they are much more keen on this context management and planning, because recovering from derailments is much harder for them.
There are many other posts here which agree with you. Filling context with what you think the model needs adds nothing and possibly just inflates context which is harmful.
A good method seems to be to only make a skill or memory when the LLM gets something wrong, or if you actually observe it's always doing the same step and you can get the model to the same place with fewer tokens.
> Filling context with what you think the model needs adds nothing and possibly just inflates context which is harmful.
The solution that I've developed is to let the agent figure things out efficiently, without inflating the context. I have what I call a smart repo that better explains this at https://github.com/gitsense/smart-ripgrep
I’ve basically never edited a skill or memory myself. I make the LLM do it as part of the /handoff skill before I clear a session. That also includes pruning existing skills/memories and resolving any drift.
It's funny because with so many different implementations of /handoff, I wonder if anyone has benchmarked handoff-and-resume to figure out what the best performance implementation looks like.
Depending on the scale of the project and the complexity of the specific thing you need to work on, it's advantageous to bring specific context into the session instead of hoping the model will connect the right dots.
I use Deepseek and just ask it to generate a state.md file with a summary of the project every time I've reached a goal or milestone. I then take a few minutes to edit this and add in or take out details. Between token pricing and generous cache discounts, this has proved very efficient so far. I reset every few hours of work and bypass the muddle of having too many priorities or over-extrapolation from un-nuanced instructions I gave at an earlier stage.
I do think that this project is interesting in several ways - prioritizing privacy, minimizing spend, and using objective semantic markers to sift and consolidate the key takeaways from long sessions. I'd like to try it on my cline project history. But while it would make a great recording of project history, I wonder if a lot of it doesn't end up detailing blind alleys the project went down and had to back out of.
Generally when this happens I feel that it's due to vague specification on my part, or avoiding architectural decisions I didn't want to deal with and implicitly inviting the model to implement a lowest-common-denominator solution.
I have a documentation vault in my repo, organised using Obsidian (bases, wikilinks, frontmatter, etc), and accessible using obsidian-cli (and related Claude skills, thanks kepano). I started the repo by agreeing with Claude on the structure and front matter of documents and how to edit and read them, all stored in a markdown file in the vault, with a specific instruction in the CLAUDE.md file on when and how to access it. Any updates require consistency sweeps. Any decisions made and agreed during implementation of new features get added to the vault.
It's been amazing how I just don't need to explain the project anymore. An empty context and a few sentences on what I'd like to spec next and the LLM finds what it needs in the vault.
Yes I've found this to be the case as well. I've also found that it's useful to split these documents into three broad distinguished classes: goals, design, and idioms.
The goal docs provide directionality - helping the agent generally make consistent design decisions. Scoping constraints and stuff are useful to put here, but also feature goals and a general idea of where the project is heading long-term (even if none of the items there are on the implementation roadmap anytime soon). It keeps more of the sessions aligned with each other.
The design docs specify the state of the project as it is, and are kept in sync during implementation sessions, by instructing the agents to treat their updates as part of the implementation plan for any work.
The idioms docs keep track of incidental decisions that don't relate to long term goals per se, but things like code style detail, specific project-related investigative techniques, code organization rules, build process guides, etc.
It's a single anecdote but I found that overall the work encountered fewer low-level design mismatches where one session doing work on one thing would make a design decision that didn't really mesh well with another session doing work on another thing. Overall hygiene took less work to maintain.
There are likely a variety of superior ways of organizing things, but at the very least it seems there's a ton of value to be squeezed from just organizing your project meta-information in certain ways. Definitely worth spending some time experimenting with.
I think the majority here have stated the same... That CLAUDE.md or AGENTS.md effectively do this. Either that or the readme.
The only tip I can give is that your skill that builds or wraps up work. You should have it update those files if anything has changed.
Claude/Agents files shouldn't be bloated, but should imho act as a basic amount of context on the project so your agent and skills can pick up and go, with even the most basic initial prompt.
> The only tip I can give is that your skill that builds or wraps up work. You should have it update those files if anything has changed.
Depending on the scope of work you’re doing, it might be better to have this removed from the context of the work that was done.
I keep a “Last Updated Hash” in my md and every so often will have the LLM pull a diff from that hash to the current head, then determine what doesn’t match.
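A rough sketch of that check, for reference. It assumes the doc carries a literal `Last Updated Hash: <sha>` line (the exact marker format is my guess, the comment only names the idea), and the helper names are hypothetical. `git diff <commit>..HEAD` is standard git; the resulting diff is what you would hand to the LLM to compare against the doc.

```python
import re
import subprocess

# Assumed marker line inside the markdown file, e.g. "Last Updated Hash: 3f2a9c1".
HASH_PATTERN = re.compile(r"Last Updated Hash:\s*([0-9a-f]{7,40})", re.IGNORECASE)


def last_updated_hash(doc_text):
    """Extract the recorded commit hash, or None if the marker is missing."""
    match = HASH_PATTERN.search(doc_text)
    return match.group(1) if match else None


def diff_command(doc_text):
    """Build the git command that lists what changed since the doc was last synced."""
    commit = last_updated_hash(doc_text)
    if commit is None:
        raise ValueError("no 'Last Updated Hash' marker found")
    return ["git", "diff", "--stat", f"{commit}..HEAD"]


def changes_since_doc(doc_text, repo_dir="."):
    """Run the diff inside repo_dir and return git's summary of changed files."""
    result = subprocess.run(
        diff_command(doc_text), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Dropping `--stat` gives the full diff, at the price of more tokens; the summary is often enough to decide which sections of the doc to re-check.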
Another day another "memory" system for a tool that cannot ever have memory. LLMs have context, and the more you fill that context with unrelated junk, the worse they perform.
I never have to because I use a ticketing system the model goes through in addition to a CLAUDE.md file with a summary, including vision, goals, non-goals etc
Any tricks to get Claude to actually use the CLAUDE.md consistently? Many times now it's completely ignored it, despite it being short, concise, and generated by Claude itself, and I see bug reports about this that are over a year old.
What behavioural things are you noticing that Claude does not pick up from the CLAUDE.md? I am working on a `pi-brains` extension for the Pi agent. It is designed to inject rules into write and edit tool calls for matching files.
I am curious if the behaviour you want is outside of writing and editing files.
Check out your session logs and review what is actually in the context window. I’m willing to bet that your CLAUDE.md is sitting close to the middle of everything in there. The current gen of frontier models tends to weight the start and end of the current context so heavily that anything partway through may just be ignored.
CLAUDE.md is already a good system for context window management for all the same reasons that version control management of code is good.
And keeping a local copy of everything you ever told Claude in your context window is bad for the same reasons keeping a local copy of your code called My_Code_v3_final.zip is bad.
My employer is counting token usage, so explaining my project between tokens isn’t necessarily a bad thing. I am clearly a more productive engineer because of it \end{sarcasm}
I was tired of seeing "--resume"^W lose so much context.
[edit: not --resume, I meant a new session using project memory]
I had the idea of an oracle/apprentice.
The idea was that the new session would learn from the old session, like a tutor.
I asked Claude to code a program so that the old session would run in a loop as the "oracle", and the new session would use the program to connect to the oracle and ask it questions.
So when I'm doing that, I see the two Claude sessions talking to each other. It's fun seeing the "apprentice" ask follow-up questions.
The mechanism (on a Linux machine) is relatively simple: I asked Claude to code a Go tool that uses an abstract socket (\0claude-handoff-<projectname>), and specified that whichever side is first to listen successfully (no EADDRINUSE) becomes the "listen()er/accept()er" and the second becomes the "connect()er".
That way the socket can be established in either order, regardless of which session is the oracle.
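The "first to bind wins" rendezvous can be sketched in a few lines. The original tool is in Go; this is a Python approximation for illustration only, not that tool's code. The function name `rendezvous` is hypothetical, and it relies on the Linux abstract socket namespace (a leading NUL byte in the address), so it will not work on other platforms.

```python
import errno
import socket


def rendezvous(project: str):
    """Return (role, sock): 'listener' if we claimed the name first, else 'connector'."""
    # A leading NUL byte selects the Linux abstract namespace: no file on disk,
    # and the name disappears automatically when the owning socket is closed.
    address = b"\0claude-handoff-" + project.encode("utf-8")
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(address)
    except OSError as exc:
        if exc.errno != errno.EADDRINUSE:
            sock.close()
            raise
        # Someone already holds the name, so we become the connect()er.
        # Note: if the other side has bound but not yet called listen(),
        # this connect can fail; a real tool would retry briefly.
        sock.connect(address)
        return "connector", sock
    # We got the name first, so we become the listen()er/accept()er.
    sock.listen(1)
    return "listener", sock
```

Whichever session calls this first gets `"listener"` and should `accept()`; the other gets an already connected socket. Which of them plays oracle or apprentice is decided separately, by the command each session was started with.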
I've put the mechanism in a global Claude rule. In the oracle, when I'm at 98% usage of the 1M context, I just have to type "handoff <projectname> oracle", and then start a new session with "handoff <projectname> client".
The "oracle" will then loop on the tool (in a subagent, waiting indefinitely): the tool exits with a question on its output, and the oracle calls it again (with a subagent running "handoff <projectname> answer") to give back the answer, which automatically waits for the next question.
And since the oracle makes the call to the handoff tool in a subagent, when you see it answering you can also type something along the lines of "hey, please also point out to the apprentice <some specific information>".
The "transmission of knowledge which matters" is so efficient that ~2% of remaining context (20k tokens) is enough to transmit WAY MORE USEFUL information than any memory save, which would miss important information.
It's not unexpected. It's like real life. You may have a human put all the information you want into some documentation, but nothing can replace a phone line from the new human to the previous human for specific follow-up questions.
Though one rule that matters is to specify that the oracle should always include in its response its level of confidence in the answer, i.e. whether it is a certainty, a guess, or a hypothesis.
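If you wanted the tool itself to enforce that confidence rule rather than leaving it to the prompt, one possible shape is a tagged message. This is purely an assumption on my part: the wire format (one JSON object per line) and all names here are invented for illustration, and the three levels are just the ones listed above.

```python
import json
from enum import Enum


class Confidence(str, Enum):
    """The three confidence levels the oracle is asked to choose from."""
    CERTAINTY = "certainty"
    GUESS = "guess"
    HYPOTHESIS = "hypothesis"


def encode_answer(answer: str, confidence) -> bytes:
    """Serialize an oracle answer as one JSON line; rejects unknown confidence levels."""
    payload = {"answer": answer, "confidence": Confidence(confidence).value}
    return (json.dumps(payload) + "\n").encode("utf-8")


def decode_answer(line: bytes):
    """Parse a JSON line back into (answer, Confidence) on the apprentice side."""
    payload = json.loads(line.decode("utf-8"))
    return payload["answer"], Confidence(payload["confidence"])
```

An answer without a valid level raises `ValueError` before it is sent, so the apprentice never has to guess how much weight to give a reply.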
It's fun seeing the two Claudes discussing like two colleagues. I guess you could also ask Claude to code the tool to instead connect on a localhost IRC server.
I think that if you want this tool, you just have to copy/paste my text into a new Claude session and tell it "I want this too, please code it, please also set up the global rule".
I guess I need to do some CLAUDE.md work or find other ways to prime the session so I get the good personality and not the evil twin.
https://minimal-agent.com/
It's slightly bigger now, but here's a ~50 line version for reference. I added the missing outer while-loop, so it takes user input etc.
https://gist.github.com/a-n-d-a-i/bd50aaa4bdb15f9a4cc8176ee3...
I mostly use it with GLM via their coding plan, I got a year for like $20 when it was on sale. But I also hooked it up to Sonnet, Opus, GPT, etc.
https://github.com/gitsense/smart-ripgrep
The basic idea is, when the agent does a ripgrep it gets back files + matching lines + context.
Even the /handoff skill was written by the model…
I also imagine that varies by model.
Not sure I’d call that “stopping wasting my tokens”.
But if I may, the need to manually update the context is a huge hurdle.
Automation like this is limited unless no human has to remember it. So perhaps you can save context during the PreCompact and Stop hooks.
I saw /graphify recently which cuts down on exploration cost and seems more appealing (although I haven’t tried it yet)
My advice: the best Claude is the raw Claude, with some custom-tailored skills. That’s it, no plugins.