Detailed balance in large language model-driven agents

(arxiv.org)

48 points | by Anon84 53 days ago

4 comments

gwern 49 days ago
There's a vein of research which interprets self-attention as a kind of gradient descent and says that LLMs have essentially pre-solved indefinitely large 'families' or 'classes' of tasks, and the 'learning' they do at runtime is simply gradient descent (possibly Newton) using the 'observations' to figure out which pre-solved instance they are now encountering; this explains why they fail in such strange ways, especially in agentic scenarios - because if the true task is not inside those pre-learned classes, no amount of additional descent can find it after you've found the 'closest' pre-learned task to the true task. (Some links: https://gwern.net/doc/ai/nn/transformer/attention/meta-desce... )
I wonder if this can be interpreted as consistent with that 'meta-learned descent' PoV? If the system is fixed and is just cycling through fixed strategies, that is what you'd expect from that: the descent will thrash around the nearest pre-learned tasks but won't change the overall system or create new solved tasks.
Mathnerd314 49 days ago
So, the takeaway I get from this paper is that if you have a language model and you set it up so it has an input and it generates an output that is towards some goal (e.g., "make this sentence sound smarter"), then it should converge, because it is following a potential function.
But I have used prompts like this a fair amount, and it is more like stochastic gradient descent - most of the time, once it is close to the target, the model will take a small incremental change, but when it is really close the model will sort of say "this is not improveable as it is" and it will take a large leap to a completely different configuration. And then this will do the incremental optimizations and so on. This could be an artifact of the sampling algorithm, but I think it is also an issue that the model has this potential function encoded, but the prompt and the structure of the model do not actually minimize this potential. So, a real lesson here is that there is actually a lot of work still left to do in terms of smarter sampling. Beam search like is used today is sort of the tip of the iceberg. If we could start doing optimization with the transformer model as a component, like optimizing pipelines of reasoning rather than always generating inputs and outputs sequentially, that is where you could start using this potential function directly and then you would see orders of magnitude smarter AI. There is stuff about prompt optimization, but it is still based on treating models as black boxes rather than the piles of math they are.
[-]
- versteegen 48 days ago
  That's an interesting observation. I'd suggest modelling the LLM's behaviour in that situation as selecting between different simple strategies, each of which has its own transition function. Some of the strategies will be far more common than others. Some of them may be very simple and obey the detailed balance condition (meaning they are reversible Markov chains), but others, and the overall transition function does not.
  The definition of the detailed balance condition is very strict and it's obvious that it won't be met in general by most probabilistic programs (sets of rules with probabilistic output) even if you consider only those where all possible outputs have non-zero probability (as required by detailed balance).
  And the LLM+agent is only a Markov chain because of the limited state space of the agent. While an LLM is adding to its context window without reaching the window size limit, it is not a Markov chain, as I explained here: https://news.ycombinator.com/item?id=45124761
  And, agreed that better optimisation would be incredible. (I would describe it as a search problem.) I'm not sure how feasible it is improve without changing the architecture, e.g. to a diffusion language model. But LLMs already predict many tokens ahead at once which is why beam search is surprisingly unnecesarr. That's how they're able to write coherent sentences (and rhymes), they've already largely determined at the beginning what they're going to write. (See Anthropic mech interp work.) So maybe if we could tap into that we search over vaguely-formed next blocks of text rather than next words.
dhampi 48 days ago
The actual title is pretty buzzy given how limited the task described is. In one specific, very constrained and artificial task, you can find something like detailed balance. And even then, their data are quite far from being a perfect fit for detailed balance.
Would love it if I could use my least action principle knowledge for LLM interpretability, this paper doesn't convince me at all :)
[-]
- versteegen 48 days ago
  Since it took me some minutes to find the description of the task, here it is:
  We conducted experiments on three different models, including GPT-5 Nano, Claude-4, and Gemini-2.5-flash. Each model was prompted to gener- ate a new word based on a given prompt word such that the sum of the letter indices of the new word equals 100. For example, given the prompt “WIZ- ARDS(23+9+26+1+18+4+19=100)”, the model needs to generate a new word whose letter indices also sum to 100, such as “BUZZY(2+21+26+26+25=100)”