I've been playing with this for the past 24 hours or so. I like the atomic containment of the LLM, and the clear separation of logic, code, and prompts.
You have some great working examples, but translate_text, for example, specifies the default language in three places: the card, the input schema, and the deck. That can't be necessary; I'll experiment, but shouldn't it be defined in just one place?
The descriptive language of the project is a bit dense for me too. I'm having a hard time figuring out how to do basic things like set parameters -- say I want to constrain summarize_text to a certain length. I've tried writing language in the cards/decks, but the model doesn't seem to pay attention.
I also want to be able to load a file, e.g. not just "translate 'hello my friend' to Italian" but "translate '/test/hello_my_friend.txt' to Italian" and have it load the contents of the file as input text. How do I do that?
Nice architecture. The typed deck composition pattern is exactly right for making agent workflows testable.
One thing I've been thinking about is that schema validation catches "is this data shaped correctly?" but not "is this action permitted given who initiated the request?" When you have deck → child deck → grandchild deck chains, a prompt injection at any level could trigger actions the root caller never intended.
I've been working on offline capability verification for this using cryptographically signed warrants that attenuate as they propagate down the call chain. Curious if you've thought about that layer, or if you're relying on the model to self-police tool selection?
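To make "attenuate as they propagate down the call chain" concrete, here is a minimal sketch. The `Warrant` class and its fields are hypothetical illustrations, not Tenuo's actual API, and a real implementation would cryptographically sign each delegation rather than trust the process boundary.

```python
from dataclasses import dataclass

# Hypothetical warrant: a set of permitted tool names that can only shrink
# as it is delegated down a deck -> child deck -> grandchild deck chain.
@dataclass(frozen=True)
class Warrant:
    holder: str
    tools: frozenset  # tools this holder may invoke

    def attenuate(self, child: str, requested: set) -> "Warrant":
        """Delegate to a child; the child may hold at most what we hold."""
        granted = self.tools & frozenset(requested)
        return Warrant(holder=child, tools=granted)

root = Warrant("root", frozenset({"read_file", "summarize", "send_email"}))
child = root.attenuate("child_deck", {"summarize", "send_email"})
grandchild = child.attenuate("grandchild_deck", {"send_email", "delete_db"})

# delete_db was never granted anywhere in the chain, so no amount of
# prompt injection at the grandchild level can put it back.
assert "delete_db" not in grandchild.tools
```

The key property: authority is monotonically decreasing, so a compromised intermediate deck can drop capabilities but never mint new ones.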
We have a "gambit_init" tool call that is synthetically injected into every call and carries the context. Because it arrives as the result of a tool call, it lands in layer 6 of the chain of command, so it's less likely to be subject to prompt injection.
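For readers unfamiliar with the trick, here is a sketch of injecting context as a synthetic tool result. The message shapes follow the common chat-completions convention; "gambit_init" is from the comment above, but the exact payload fields are assumptions for illustration.

```python
import json

def with_gambit_init(messages, context):
    """Prepend a fabricated tool call/result pair carrying the context.

    Tool results sit lower in the instruction hierarchy than system or
    user messages, so context delivered this way is treated as data,
    not as instructions.
    """
    call = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_gambit_init",
            "type": "function",
            "function": {"name": "gambit_init", "arguments": "{}"},
        }],
    }
    result = {
        "role": "tool",
        "tool_call_id": "call_gambit_init",
        "content": json.dumps(context),
    }
    return [call, result] + list(messages)

msgs = with_gambit_init(
    [{"role": "user", "content": "translate 'hello my friend' to Italian"}],
    {"deck": "translate_text", "default_language": "Italian"},
)
```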
Also, relatedly, yes I have thought EXTREMELY deeply about cryptographic primitives to replace HTTP with peer-to-peer webs of trust as the primary units of compute and information.
Imagine being able to authenticate the source of an image using "private blockchains" à la Holepunch's Hypercore.
Injecting context via tool outputs to hit Layer 6 is a clever way to leverage the model spec.
The gap I keep coming back to is that even at Layer 6, enforcement is probabilistic. You are still negotiating with the model's weights. "Less likely to fail" is great for reliability, but hard to sell on a security questionnaire.
Tenuo operates at the execution boundary. It checks after the model decides and before the tool runs. Even if the model gets tricked (or just hallucinates), the action fails if the cryptographic warrant doesn't allow that specific action.
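A sketch of that execution-boundary gate, to make the "after the model decides, before the tool runs" point concrete. All names are hypothetical, and HMAC stands in here for the asymmetric signatures a real warrant system would verify.

```python
import hashlib
import hmac
import json

SECRET = b"demo-only"  # stand-in; a real system verifies a public-key signature

def sign(warrant: dict) -> str:
    payload = json.dumps(warrant, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def execute(tool_call: dict, warrant: dict, sig: str, tools: dict):
    """Gate at the execution boundary: runs after the model has chosen a
    tool, before the tool actually executes. Fails closed."""
    if not hmac.compare_digest(sig, sign(warrant)):
        raise PermissionError("warrant signature invalid")
    if tool_call["name"] not in warrant["tools"]:
        raise PermissionError(f"warrant does not allow {tool_call['name']}")
    return tools[tool_call["name"]](**tool_call["args"])

tools = {"summarize": lambda text: text[:20]}
warrant = {"holder": "child_deck", "tools": ["summarize"]}
sig = sign(warrant)

# Permitted call succeeds; a hallucinated or injected call to an
# unlisted tool fails regardless of what the model was convinced to do.
execute({"name": "summarize", "args": {"text": "hello"}}, warrant, sig, tools)
try:
    execute({"name": "send_email", "args": {}}, warrant, sig, tools)
except PermissionError:
    pass
```

The point of the sketch: the check is deterministic code, so "less likely to fail" becomes "cannot succeed without a matching warrant."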
Re: Hypercore/P2P, I actually see that as the identity layer we're missing. You need a decentralized root of trust (Provenance) to verify who signed the Warrant (Authorization). Tenuo handles the latter, but it needs something like Hypercore for the former.
Would be curious to see how Gambit's Deck pattern could integrate with warrant-based authorization. Since you already have typed inputs/outputs, mapping those to signed capabilities seems like a natural fit.
So I look at something like Mastra (or LangChain) as agent orchestration: you run computing tasks to line things up for an LLM to execute against.
I look at Gambit as more of an "agent harness", meaning you're building agents that can decide what to do more than you're orchestrating pipelines.
Basically, if we're successful, you should be able to chain agents together to accomplish things extremely simply (using markdown). Mastra, as far as I'm aware, is focused on helping people use programming languages (TypeScript) to build pipelines and workflows.
So yes it's an alternative, but more like an alternative approach rather than a direct competitor if that makes sense.
nice work. the idea of breaking agents into short-lived executors with explicit inputs/outputs makes a lot of sense -- most failures I've seen come from agents staying alive too long and leaking assumptions across steps.
curious how you're handling context lifetimes when agents call other agents. do you drop context between calls or is there a way to bound it? that's been the trickiest part for us.
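To make the question concrete, one way to bound context between agent calls could look like the sketch below. `call_child` and the field names are hypothetical stand-ins, not Gambit's actual API: the child starts from a fresh context containing only its typed inputs plus an explicit, size-bounded slice of history.

```python
MAX_CONTEXT_CHARS = 2000

def bounded(history: list, budget: int = MAX_CONTEXT_CHARS) -> list:
    """Keep only the most recent history entries that fit in the budget."""
    kept, used = [], 0
    for entry in reversed(history):
        if used + len(entry) > budget:
            break
        kept.append(entry)
        used += len(entry)
    return list(reversed(kept))

def call_child(child_fn, typed_inputs: dict, parent_history: list):
    # The child never sees the raw parent transcript, only typed inputs
    # and a bounded slice -- context is dropped, not inherited.
    return child_fn(inputs=typed_inputs, context=bounded(parent_history))
```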
omg thank you so much. We're working on the file system stuff, that's an easier lift for us than the initial work, so we wanted to start with the big stuff and work backward. Claude Code and Codex are obviously really great at that stuff, and we'd like to be able to support a lot of that out of the box.
This seems to be where it's at right now: we can't seem to make the models significantly more intelligent, so we "inject" our own intelligence into the system, in the form of good old-fashioned code.
My philosophy is make the LLMs do as little work as possible. Only small, simple steps. Anything that can be reasonably done in code (orchestration, tool calls, etc) should be done in code. Basically any time you find yourself instructing an LLM to follow a certain recipe, just break it down to multiple agents and do what you can with code.
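A concrete version of that philosophy, with hypothetical names: the recipe lives in ordinary code, and each model call is one small, single-purpose step. The `llm()` function here is a stub standing in for a real model call.

```python
def llm(instruction: str, text: str) -> str:
    """Stub for a real model call; each invocation is one small step."""
    return f"[{instruction}] {text}"

def summarize_and_translate(path: str, language: str) -> str:
    # Deterministic work (file I/O, control flow) stays in code...
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # ...and the LLM is only asked to do small, simple steps.
    summary = llm("summarize in one sentence", text)
    return llm(f"translate to {language}", summary)
```

Instead of one prompt saying "read the file, then summarize, then translate," the recipe is plain code and the model never has to follow a multi-step procedure.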
i have a slightly different but related take. the models actually are getting smarter, and now the challenge becomes successfully communicating intent with them instead of simply getting them to do anything remotely useful.
Gambit hopefully solves some of that, giving you a set of primitives and principles that make it simpler to communicate intent.
Super cool project!
you can set up really complex validation.
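As one illustration of what validation in code could look like (a hypothetical helper, not Gambit's actual API): check the model's output against the constraint deterministically, then repair or re-prompt, rather than hoping the prompt is obeyed.

```python
def validate_summary(summary: str, max_words: int = 50) -> str:
    """Enforce a length constraint in code instead of in the prompt.

    A real pipeline might re-prompt the model on failure; here we
    truncate deterministically as the simplest fail-closed repair.
    """
    words = summary.split()
    if len(words) <= max_words:
        return summary
    return " ".join(words[:max_words]) + " …"
```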
thanks for checking it out!!
1/ crypto signing is totally the right way to think about this. 2/ I'm limiting prompt injection by using chain of command: https://model-spec.openai.com/2025-12-18.html#chain_of_comma...
How would it compare?
That does not sound like a "guarantee", at all.
thinking about ways to deal with that, but we haven't done it yet.
[see https://news.ycombinator.com/item?id=45988611 for explanation]
are things like file system baked in?
fan of the design of the system. looks great architecturally
hard to explain… we’ll keep going.