GLM-5.2 is a step change for open agents

(interconnects.ai)

91 points | by vantareed 1 day ago

6 comments

  • jerojero 1 day ago
    Open weight models from Chinese labs tend to be significantly cheaper.

    I think they're absolutely needed. I can't afford 200 USD a month for personal use of coding AI, and I don't think such prices are reasonable for most of the world economy anyway. Not to mention US firms might be giving their employees a lot more than that.

    It increasingly feels, to me, like there's a gap building up between haves and have-nots. But then we get news of these open weight models that are reasonably priced for inference and have reasonable capabilities. Yes, they take maybe 6-9 months to get there, but tbh that's not a bad trade-off at all.

    • Fr0styMatt88 10 minutes ago
      If we can agree that the AI model is at least as capable as a junior engineer or new contractor, how’s that different to saying “software engineering isn’t worth $200 a month”?

      Has a very race-to-the-bottom feel to it.

      Though in the grand scheme of it, $200/mo probably isn’t the real price either. Also looking at it not just in a vacuum - paying for a product that can change what you get from under you doesn’t seem great anyway.

      At least with a locally-hosted model you know what you’re getting.

    • tacomagick 16 hours ago
      DeepSeek through their own API has saved me tons of tokens, honestly. Even though it is not as smart as Kimi or Claude, the barrier to entry is very low: a $2 top-up and pay-as-you-go pricing, compared to Claude's subscription or Kimi's $20 top-up.
      • praveer13 2 hours ago
        For personal use, I’m considering using the frontier models from OpenAI or Anthropic to create a plan (research, brainstorming, etc.) with enough detail for cheap models (GLM, DeepSeek, etc.) to follow via OpenRouter. I’ll monitor how cheap and effective that turns out to be.
        • ImaCake 1 minute ago
          You should try out the cheaper models first. I find the DeepSeek v4 models pretty comparable to Sonnet 4.6 but at a fraction of the cost. You might find you just don't need to use the American models at all.
    • ttoinou 1 day ago
      200 is much less than the value you’re supposed to get out of it. If it’s not, then yeah, go ahead and use cheaper models with worse quality.
      • martinjc 2 hours ago
        Are you aware of how much purchasing power 200 dollars is in China, Brazil, Thailand or India? This is an extremely arrogant take.
      • uberex 1 hour ago
        Unless that value is $200 cash in hand it will be hard to afford it for people who just don't have $200.
      • Dayshine 1 day ago
        I'm not sure how I'm supposed to get $200 of value out of personal use!
        • LPisGood 2 hours ago
          Note that 200 dollars of value is different than 200 dollars of profit.
        • devmor 2 hours ago
          I personally don’t find it that useful for most tasks, but if, say, you get paid $50/hr for your work and it saves you more than 4 hours of work in a month, there you go.
        • holoduke 1 hour ago
          Here most of my colleagues have $200+ hourly rates, so it's really a no-brainer. But sure, in South America or some Asian countries maybe it isn't. Still, most devs need it anyway, even in the poorer regions.
          • HDBaseT 58 minutes ago
            $200/h is on the extreme end and I would argue most people here aren't anywhere close to that.

            The median hourly wage in the US is $28/h, so $200 equates to just over 7 hours: nearly a full day of work a month for the average person to use Claude with reasonable limits.

            Yes, the people on $28/h may not be the software development types, so their income might not be as high, but these are the people who would probably be vibe coding the most since they aren't day to day programmers!
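
The break-even arithmetic in this subthread can be sketched in a few lines. This is only an illustration: the function name `break_even_hours` is made up for the example, and the $200 price and the $28, $50 and $200 hourly figures are simply the numbers quoted by commenters above, not verified data.

```python
def break_even_hours(monthly_price: float, hourly_wage: float) -> float:
    """Hours of paid work per month that equal the subscription price.

    Hypothetical helper for illustration only; it ignores taxes and
    purchasing-power differences between countries.
    """
    if hourly_wage <= 0:
        raise ValueError("hourly_wage must be positive")
    return monthly_price / hourly_wage


# Figures quoted in the thread: $200/month plan; $28/h, $50/h, $200/h wages.
for wage in (28, 50, 200):
    hours = break_even_hours(200, wage)
    print(f"${wage}/h -> {hours:.2f} hours of work per month")
```

At $50/h the plan costs 4 hours of work, and at $28/h it costs about 7.14 hours, which is why the same price reads as trivial to some commenters and as a full working day to others.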

      • margalabargala 47 minutes ago
        Last time you bought a computer, did you buy the absolute fastest best CPU available?
        • girvo 42 minutes ago
          Yes, but that was because I could see the writing on the wall with respect to hardware prices being cooked by AI demand, so I built the best computer possible at the time knowing it'd probably need to last me the next 5+ years

          So not really comparable. I use Step 3.7 Flash locally; models are good enough for so many coding tasks even at the lower end! (Though I note that calling a 200B model "lower end" is kind of amusing.)

      • smrtinsert 57 minutes ago
        I've actually come to believe the overwhelming majority of use cases require nowhere near frontier quality, so there's that. Much faster execution is just a bonus on top of the much reduced cost.
  • aunty_helen 47 minutes ago
    I signed up to a z.ai max account, $144. Hardly been able to use it as it 429s on most requests. They’re also refusing to refund me.
    • osti 1 minute ago
      Even as a GLM z.ai fan, I wouldn't pay for their plans. They are just way worse value than the GPT or Anthropic plans, in terms of both usage and capabilities.
    • sergiotapia 7 minutes ago
      My experience as well unfortunately :(
  • themgt 2 hours ago
    I just tested GLM 5.2 out via Z.ai in pi for a little one-off project that was already scoped. It actually did a relatively decent job starting out, and figured important things out from context.

    But the reasoning traces became increasingly hilarious, with it getting confused and going in loops, doubting itself. I began to feel almost sad; it was like listening to the internal monologue of someone with an anxiety disorder.

    It made pretty good progress but wound up going in a lot of goofy loops and doing things a bit "off" from standards I'd hoped it would infer, and finally started going a bit nuts, "This is very confusing.", "OH WAIT", seemingly hallucinating a whole side-quest that didn't make sense and looking at making internal system changes to try to achieve its (now very confused) goal when I pulled the plug.

    Without seeing the reasoning traces from Claude/GPT it's hard to really know, but it definitely didn't feel like the same quality of reasoning, even if dogged persistence does wind up actually working eventually.

    • jauntywundrkind 1 hour ago
      I think the self-doubt might actually be a very crucial part of its capability. I often feel compelled to interrupt when I'm watching it think (which thank the stars it lets us do, unlike the big American models!!), but usually it makes the right pick!

      Being willing and able to reconsider seems very good. Going around and around, pulling in more thinking, integrating it: maybe that's why it is as good as it is.

      I want to emphasize again how excellent it is that we can see the thinking. I think this makes GLM so much better an experience for me. It gives me such insight into what is being considered, helps me see where things go wrong. It grounds me, gives me the notion of where the results come from. It was so jarring to switch to GPT and Opus and find that they won't discuss with me, won't reveal their thinking: that feels fundamentally unsafe, for me, for society, to have such a severe black box. I don't think it should be allowed, honestly.

      Many thanks to this recent submission, which is the first time I've seen anyone blog about this core difference: The text in Claude Code’s “Extended Thinking” output is not authentic. https://patrickmccanna.net/the-text-in-claude-codes-extended... https://news.ycombinator.com/item?id=48630535

      • wuhhh 1 hour ago
        Your post made me laugh because I experienced the same as you but the other way around. I switched from Claude to a multi-model harness a couple of days ago and the first model I tried was GLM 5.2.

        I gave it some simple code porting exercises and watched dumbfounded at the reasoning, which was more like the ravings of a lunatic - but lo and behold, after much confusion and a dizzying number of eureka moments the task was completed very successfully.

        I tried Kimi on a similar task, much faster, a little more reassuring somehow in its ramblings, also surprisingly good results.

        To be clear, I’m surprised the results were good not because these models aren’t GPT or Claude, but because the line of reasoning was so bonkers. Coming from Claude, I was just not used to seeing this, but I’ll bet it’s just as nuts with the frontier models and we’re just not allowed to see it (I’m about to read the links you shared).

        Agree wholeheartedly that transparency is of grave importance.

        • rainmaking 43 minutes ago
          Yeah isn't that thinking weird?

          Now I see the issue clearly! But wait... now I have the full picture! But wait... Found it!

          I gave up a few times because of it at first until I realized I just had to let GLM get on with it and what came out was great!

          But once it was outright endearing: on a challenging bug, it said, "I have been very thorough." Then it escalated where to look and aced it. Built-in Confucian values.

          • wuhhh 8 minutes ago
            If there’s one thing I’ve learned these past couple of days, it’s to resist the temptation to jab the escape button and start waving my arms! I wonder how much of this cyclical self doubt / self congratulating I go through in my own thoughts without even realising it. If you could verbalise or articulate all the half thoughts, snatches of ideas, feelings and ruminations the human mind goes through on some tasks it might be even more bizarre (or could just be me)
  • timcobb 57 minutes ago
    Can people share their GLM and open model setups in general, please? What provider do you use? Why do you trust it to serve full quality? What harness do you use? Why do you trust it not to have malware (most harnesses are TS apps)? I am just trying GLM 5.1 from Nvidia Build in OpenCode and would love to hear how you all do it, thanks.
    • gandreani 15 minutes ago
      I use both the OpenAI subscription and the OpenCode Go subscription. I use the Go subscription for my personal work and the OpenAI subscription for my consulting work.

      The differences between the models are minimal, but I usually stick with gpt-5.4-mini, gpt-5.4, mimo-pro-2.5, deepseek-v4-pro. These latter ones give way more usage than even 5.4-mini does, so I tend to use them in personal projects for that reason.

      My harness is https://github.com/can1357/oh-my-pi. I trust it...enough. It updates very frequently, so as a safeguard I run it sandboxed with https://github.com/containers/bubblewrap so it can only access the project folder and some whitelisted config files.
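
For anyone wanting to try the same idea, here is a rough sketch of how such a bubblewrap invocation could be assembled. It is not the commenter's actual setup: the helper `bwrap_argv`, the default read-only paths, the read-only treatment of config files, the decision to keep network access, and the `my-agent` command are all assumptions for illustration. The `bwrap` flags used (`--unshare-all`, `--share-net`, `--die-with-parent`, `--ro-bind`, `--ro-bind-try`, `--bind`, `--proc`, `--dev`, `--tmpfs`, `--chdir`) are documented bubblewrap options, but the right set of system paths depends on your distro.

```python
import shlex
from pathlib import Path


def bwrap_argv(project_dir, command,
               system_paths=("/usr", "/bin", "/lib", "/lib64", "/etc"),
               config_files=()):
    """Build a bwrap command line (hypothetical helper, untested layout).

    The project folder is mounted read-write, everything else listed is
    read-only, and nothing outside these paths is visible in the sandbox.
    """
    project = str(Path(project_dir).resolve())
    argv = ["bwrap", "--unshare-all", "--share-net", "--die-with-parent"]
    # System directories: read-only; skipped silently if absent on this distro.
    for path in system_paths:
        argv += ["--ro-bind-try", path, path]
    argv += ["--proc", "/proc", "--dev", "/dev", "--tmpfs", "/tmp"]
    # Whitelisted config files: assumed read-only here.
    for cfg in config_files:
        argv += ["--ro-bind", str(cfg), str(cfg)]
    # The only writable location is the project folder itself.
    argv += ["--bind", project, project, "--chdir", project]
    return argv + list(command)


# "my-agent" is a placeholder, not a real binary name.
print(shlex.join(bwrap_argv(".", ["my-agent"])))
```

Keeping `--share-net` is a deliberate trade-off in this sketch: the agent needs to reach a model API, so the sandbox limits filesystem access rather than network access.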

      • timcobb 7 minutes ago
        Thanks. I was looking at OpenCode Go yesterday and I couldn't figure out if the base pricing includes usage or if that's just base pricing and then you have to pay for usage too. How does it work? It is very cheap.
    • michimagdesign 28 minutes ago
      Next to my Claude Pro plan, I have subbed to OpenCode Go. I find the OpenCode UX much better than the Claude Code CLI. As for models, I started a few months ago with GLM 5.1, and it was solid and could handle near-Sonnet-level tasks. It weirdly sputtered out Chinese characters sometimes. Then I switched to Kimi K2.6, which is the Chinese model I’ve used the most so far. It used way too many reasoning tokens (improved in K2.7) but executed Claude-created plans reliably. Now I’m back with GLM 5.2 and it’s really solid (among other things, it’s good at design), and I get good usage with the $10 plan. Still, the Claude models have fewer hiccups, but the Chinese models are getting really close.
    • smoe 38 minutes ago
      For work, I mostly use Codex and some Claude. For personal use, I’ve started using Chinese models directly through their respective providers, mostly for automation tasks and experiments so far, either via the API directly or through the Pi harness.

      I do not trust any of them. Everything runs inside virtual machines, not just the sandboxes provided by the harnesses. I also do not run Claude or Codex directly on the host machine, not just because of supply chain fears but also because of how incredibly user-hostile the VC-funded companies are when it comes to installing random stuff on your machine.

    • rainmaking 47 minutes ago
      GLM 5.2 coding plan. I'll post the agent as soon as I can! But OpenCode works, and their own zcode is really good as well.
  • citizenpaul 1 hour ago
    I've been using glm5 since its release and still prefer it to glm5.1 and, so far, to glm5.2.

    Perhaps it is just my harness and workflow, but the older model still seems to work better. Also, the token cost is significantly lower. I rarely spend more than $20 a week with a $50 cap. That's not even half of Claude's ambiguous minimum $200-a-month plan.

    • rainmaking 4 minutes ago
      Now that's a tremendous pointer, I'm going to have to try that.

      Do you full on let GLM5 get stuff done on its own or is it more like a guided workflow? The former's what the point releases doubled down on and is also something that uses a lot of juice.

  • Balinares 1 day ago
    I can't help wondering what kind of models we'll see coming out of China once it gets its own chip fabs up and running. Right now it sounds like the US's export ban is not slowing them down a whole lot.
    • ceejayoz 1 hour ago
      > Right now it sounds like the US's export ban is not slowing them down a whole lot.

      It may wind up being a massive boost to them in the long run, even.

      Necessity is the mother of invention.