4 comments

  • TheDong 3 hours ago
    I was curious, so I dug a bit.

    Under the hood it's effectively running:

        docker run --rm -v "$PWD":/workspace \
          python:3.11-slim \
          sh -c 'pip install -q patchpal && <command>'
    
    Which, cool, great. I sure love "pip install"ing on every run instead of just baking a single container image with it already installed.

    This isn't any sort of fancy or interesting sandboxing; it's shelling out to "docker run", and not even using Docker as well as it could.

    Quoting from the linked page:

    > The tradeoff is ~5-10 seconds of container startup overhead

    Sure, maybe it's 5-10 seconds if you use containers wrong. Unpacking a root filesystem and spinning up a clean mount namespace on Linux takes a few ms, and taking more than a second means something is going wrong, like "pip install"ing at runtime instead of at build time for some reason.
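
    To make the build-time alternative concrete, here is a minimal sketch, not the project's actual code. The image tag `patchpal-sandbox` and the helper names `build_argv` and `run_argv` are made up for illustration, and the `/workspace` mount is assumed from the command above. The idea is to install the package once into an image, so each later run only pays for container startup.

```python
# Sketch only: the tag and helper names are hypothetical, not from patchpal.
IMAGE_TAG = "patchpal-sandbox"

# Dockerfile that does the "pip install" once, at image build time.
DOCKERFILE = """\
FROM python:3.11-slim
RUN pip install --no-cache-dir -q patchpal
WORKDIR /workspace
"""


def build_argv(tag=IMAGE_TAG):
    """argv for the one-time build; '-' makes docker read the Dockerfile from stdin."""
    return ["docker", "build", "-t", tag, "-"]


def run_argv(command, workdir, tag=IMAGE_TAG):
    """argv for each sandboxed run: mount the workspace, no install step."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/workspace",  # bind-mount the host directory
        "-w", "/workspace",             # start the command inside it
        tag,
        *command,
    ]
```

    Running `subprocess.run(build_argv(), input=DOCKERFILE, text=True, check=True)` once would build the image; every later `subprocess.run(run_argv(["pytest", "-q"], os.getcwd()), check=True)` then skips the install entirely.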

    I can spin up a full Linux VM and run some code in it in under 5 seconds.

  • davispeck 3 hours ago
    This feels less like "agents" and more like a controlled generate → execute → fix loop.

    Works great when you have a clear verification signal (tests passing), but what drives convergence when that signal isn’t well-defined?
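
    For reference, the loop described here can be sketched in a few lines. This is a toy under stated assumptions, not how the project implements it: `fix_loop`, `generate`, and `verify` are hypothetical names, with `generate` standing in for the model call and `verify` for whatever check is available (such as running tests).

```python
from typing import Callable, Optional, Tuple


def fix_loop(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_iters: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate a candidate, check it, and feed failures back until it passes.

    Returns (candidate, attempts) on success, or (None, max_iters) if the
    verification signal never turns green.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_iters + 1):
        candidate = generate(feedback)     # hypothetical model call
        ok, feedback = verify(candidate)   # e.g. run the test suite
        if ok:
            return candidate, attempt
    return None, max_iters
```

    Note that when `verify` has no well-defined pass condition, the only thing that ends this loop is the `max_iters` cap, which is the convergence problem raised above.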

  • lightningenable 2 hours ago
    Sandboxed execution is the right call. The next challenge is what happens when these agents need to pay for external resources such as APIs, data, and compute. Right now most demos just assume free access or hardcoded API keys. Curious how you're thinking about that.

  • gpubridge 2 hours ago
    The "2 lines of code" framing is appealing but hides the real complexity: what happens when the agent needs to make external API calls at runtime?

    Sandboxed execution solves the safety problem (the agent cannot destroy your filesystem). But autonomous agents also need compute resources (inference, embeddings, image generation) that run outside the sandbox. The payment and authentication for those external calls are where the interesting engineering happens.

    An agent running in a sandbox with a funded wallet (USDC on Base L2 via x402) can pay for its own compute without any human in the loop. That is the missing piece between "launch an agent" and "agent runs autonomously for weeks."