I already switched to Claude a while ago. Didn't bring along any context, just switched subscriptions, walked away from ChatGPT and haven't touched it again. Turned out to be a non-event; there really is no moat.
I switched not because I thought Claude was better at doing the things I want. I switched because I have come to believe OpenAI are a bad actor and I do not want to support them in any way. I’m pretty sure they would allow AGI to be used for truly evil purposes, and the events of this week have only convinced me further.
Yesterday was my first time trying it. One thing that felt a bit strange to me was that I asked it something and the response was just one paragraph. Which isn't bad or anything but it felt... strange? Like I always need to preface a ChatGPT/Gemini/whatever question with "Briefly, what is..." or it gives me enough fluff to fill a 5-page high school essay. But I didn't need to do that and just got an answer that was to the point, without loads of shit that's barely related.
And the weirdest thing that I noticed: instead of skimming the response to try finding what was relevant, I just straight up read it. Kind of felt like I got a slight amount of focus ability back.
Accuracy is something I can't really compare yet (all chatbots feel generally the same for non-pro level queries), but so far, I'm fairly satisfied.
I use Gemini all the time, but I have to say it's got verbal diarrhea and an EXTREMELY annoying trait of wanting to lead the conversation rather than just responding to what YOU want to do. At the end of every response Gemini will always suggest a "next step", in effect trying to second-guess where you want the conversation to go. I'd much rather have an AI that just did what it was asked, and let me decide what to ask next (often nothing - maybe it was just a standalone question!).
Apparently this annoying "next step" behavior is driven by the system prompt, since the other day I was running Gemini 3 Thinking, and it was displaying its thoughts, which included a reminder to itself to check that it was maintaining a consistent persona, and to make sure that it had suggested a next step. I'd love to know the thought process of whoever at Google thought that this would make for a natural or useful conversation flow! Could you imagine trying to have a conversation with a human who insisted on doing this?!
Why not just write a skill and script that calls crawl4ai or similar and do this using Claude code?
You can store the page as markdown for future sessions, mash the data w other context, you name it.
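For the skill itself, the core loop is tiny. Here's a stdlib-only sketch of the fetch-and-save step; crawl4ai or a similar library would produce much cleaner markdown and handle JS-rendered pages, so treat the function names and file paths here as purely illustrative:

```python
# Sketch of the "skill" idea: fetch a page, reduce it to plain text, and
# save it as a markdown file that later Claude Code sessions can re-read.
# A real crawler (crawl4ai etc.) would do this far better; this only
# shows the shape of the workflow.
from html.parser import HTMLParser
from urllib.request import urlopen
from pathlib import Path

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return "\n\n".join(parser.parts)

def save_page(url: str, dest: str) -> Path:
    """Fetch url and write its visible text to dest for future sessions."""
    html = urlopen(url).read().decode("utf-8", errors="replace")
    out = Path(dest)
    out.write_text(html_to_text(html))
    return out
```

Once the page is sitting on disk as text, any later session can just read the file instead of re-fetching, which is the whole point of keeping it around.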
The web Claude is incredibly limited both in capability and workflow integration. Doesn’t matter if you’re dealing with bids from arbor contractors or researching solutions for a DB problem.
Rolling your own is not the solution for the common case where you’re asking an LLM a question that may or may not be supported or supplemented by a web search. ChatGPT decides by itself when and how to consult the web, and then links the relevant sources in its result. You don’t get that functionality from Claude chat, you’d have to completely build your own chat harness and apps.
Sites like Reddit are blocking AI providers; a provider has to have some contract with them for access. OpenAI does seem to have that.
That's so frustrating with Claude. If I need to widely search the web or if I need it to read a specific URL I pasted, I always turn to ChatGPT. Claude seems to hit a lot more roadblocks while trying to navigate the web.
The issue is Reddit though. They're the ones blocking. They're very aggressive.
When sites are working in one chatbot and not another, there's a good chance that the latter is respecting the website rules. As an example with Reddit, you're probably blocked when using a VPN like Mullvad
Yeah, I've always been a little confused why people use ChatGPT so heavily. It's better than it used to be (maybe thanks to custom configuration), but it still tends to respond like it's writing a Wikipedia article.
Wikipedia articles on demand are great, but not usually what I want.
Heh, a while ago I wondered why ChatGPT had started to reply tersely, almost laconically. Then I remembered that I had explicitly told it to be brief by default in the custom personality settings… I also noticed that there are now various sliders to control things like how many emojis or bulleted lists ChatGPT should use, which I thought was amusing. Anyway, these tools can be customized to adopt just about any style; there's no need to always prefix questions with "Briefly" or similar.
Here's my prompt to make ChatGPT sound more like Claude.
It works but not as well as I'd like -- the tone and word choice still ends up being really jarring to me (even after years of using ChatGPT). Maybe that's promptable too. Open to suggestions.
---
Respond in a natural conversational style.
In terms of language, match my own tone and style.
Keep responses to half a page or so max. (Use context and your judgment: e.g. an initial response can be a page, and then specific follow-up questions can be shorter, if the question is answered clearly.)
Prefer minimal formatting. Don't use headings, lists etc. Bold and italics OK but keep it tasteful.
If you're starting a paragraph like so
Item name: description..
then it makes sense to bold item name for readability purposes.
Yep, the experience is quite something. Another thing I've noticed, and you likely soon will also, is that Claude only attempts a follow-up if one is needed or the prompt is structured for it. Meanwhile ChatGPT always prompts you with a choice of next steps. It can be nice, as sometimes the options contain improvements you never thought of and would like, but in lengthy conversations with a detailed plan it does things really piecemeal, as though trained to maximize engagement instead of getting to a final solution.
> Which isn't bad or anything but it felt... strange?
On the contrary, it's great. It's fully capable of outputting a wall of text when required, so instead of feeling like I'm talking to something that has a minimum word count requirement, I get an appropriate sized response to the task at hand.
In my limited experience, that's mostly since the 4.6 release. I noticed that with the same prompt, it answers much more briefly. A bit jarring indeed, but I prefer it. Less bs and filler, and less burning off electricity for nothing.
But for Claude, they have a very deep & big one: It's the only model that gets production-ready output on the first detailed prompt. Yesterday I used my tokens til noon, so I tried some output from Gemini & Co. I presented a working piece of code which is already in production:
1. It changed things like "Touple.First.Date.Created" and "Touple.Second.Date.Created" without noticing, and it broke the code by changing them to "Touple.FirstDate" and "Touple.SecondDate"
2. There was a const list of 12 definitions for a given context; when told to rewrite the function, it just cut 6 of these 12 definitions, making the code not compile - I asked why they were cut: "Sorry, I was just too lazy typing" ?? LOL
3. There is a list holding some items, "_allGlobalItems" - it simply changed the name in the function to "_items", and the code didn't compile
As said, a working version of a similar function was given upfront.
I have used Claude (incl. Opus 4.6) fairly extensively, and Claude still spits out quality that is far below what I would call production ready - both littered with smaller issues, but also the occasional larger blunder. Particularly when doing anything non-trivial, and even when guiding it in detail (although that admittedly reduces the amount of larger structural issues).
Maybe it is tech stack dependent (I have mostly used it with C#/.NET), but I have heard people say the same for C#. The only conclusion I have been able to draw from this is that people have very different definitions of production ready, but I would really like to see some concrete evidence where Claude one-shots a larger/complex C# feature or the like (with or without detailed guidance).
I don't get it though. Why do you expect perfect responses? Humans continually make mistakes, and AI is trained on human data. Yet there seems to be this higher bar of expectation for the latter. Somehow people expect this thing that's been around for a few weeks/months, and cannot learn anything more beyond its training cutoff date, to always do a better job than a human who's been around for 20+ years and is able to learn on their own until death.
I don't expect that - I am merely responding to the parent comment's claim that Claude consistently one-shots production-ready code (which does not at all match my observations).
I can show you a timeseries data-renderer which was created with 1 initial very large prompt and then 3 following "change this and that" prompts.
The file is around 5000 lines and everything works fine & exactly as specified.
I see this over and over again. I don't dispute your experience. My experience with ESP32 development has been unreasonably positive. My codebase is sitting around 600k LoC and is the product of several hundred Opus 4.x Plan -> Agent -> Debug loops. I review everything that goes through, but I'm reviewing the business logic and domain gotchas, not dumb crap like what you and so many others describe.
What is so strange to me is that surely there is more C# out there than ESP-IDF code? I don't have a good explanation beyond saying that my codebase is extensively tested and used; I would know very quickly if it suddenly started shitting the bed in the way you explain.
The more code is out there, the worse the average quality in the training dataset. There will be legacy approaches and APIs, poor design choices, popular use cases irrelevant to your context, etc. that increase the chances of the output not matching your expectations. In the Java world this is exactly how it works. I need 3-5 iterations with Claude to get things done the way I expect, sometimes jumping straight to manual refactoring and then returning the result to Claude for review and learning. My CLAUDE.md files (multiple of them) are growing big with all the patterns and anti-patterns identified this way. To overcome this problem the model needs specialized training, which I don't think the industry knows how to approach (it has to beat the effort put into the education system for humans).
I also believe this must be true. Try asking Claude to program in Forth, I find the results to be unreasonably good. That's probably because most of the available Forth to train on is high quality.
> My experience with ESP32 development has been unreasonably positive. My codebase is sitting around 600k LoC and is the product of several hundred Opus 4.x Plan -> Agent -> Debug loops.
I feel like this is an example of people having different standards of what “good” code is and hence the differing opinions of how good these tools are. I’m not an embedded developer but 600K LOC seems like a lot in that context, doesn’t it? Again I could be way off base here but that sounds like there must be a lot of spaghetti and copy-paste all over the codebase for it to end up that large.
I don't think it's that large. Keep in mind embedded projects take few if any dependencies. The standard library in most languages is far bigger than 600k loc.
> It's the only model that gets production-ready output on the first detailed prompt. Yesterday I used my tokens til noon, so I tried some output from Gemini & Co. I presented a working piece of code which is already in production:
One does often hear that where LLMs shine is with greenfield code generation but they all start to struggle working with pre-existing code. It could be that this wasn't a like for like comparison.
That said I do personally feel Claude to produce far better results than competitors.
> One does often hear that where LLMs shine is with greenfield code generation but they all start to struggle working with pre-existing code. It could be that this wasn't a like for like comparison.
In my experience working in a large codebase with a good set of standards that's not the case; I can supply examples already existing in the codebase for Claude to use as guidance and it generates quite decent code.
I think it's because there's already a lot of decent code for it to slurp and derive from, good quality tests at the functional level (so regressions are caught quickly).
I do understand though that on codebases with a hodge podge of styles, varying quality of tests, etc. it probably doesn't work as well as in my experience but I'm quite impressed about how I can do the thinking, add relevant sections of the code to the context (including protocols, APIs, etc.), describe what I need to be done, and get a plan back that most times is correct or very close to correct, which I can then iterate over to fix gaps/mistakes it made, and get it implemented.
Of course, there are still tasks it fails and I don't like doing multiple iterations to correct course, for those I do them manually with the odd usage here and there to refactor bits and pieces.
Overall I believe if your codebase was already healthy you can have LLMs work quite well with pre-existing code.
Whether we do or not is beside the point. The comparison was between Claude, which produced competent greenfield code, and Gemini, which struggled with brownfield. The comparison is stacked in Claude's favour.
That's been my experience too. I'm using the recent free trial of OpenAI Plus to vibe code, and from this I would say that if Claude Code is a junior with 1-3 years of experience, OpenAI's Codex is like a student coder.
Does it depend on what type of programming you do? Doing Swift/SwiftUI work, I have exactly the opposite experience. I’ve been using both recently, and I want to use Claude alone (especially after the last week’s events), but Codex is just so much faster and better.
Swift/SwiftUI are two of the three experimental projects I'm using Codex on, the other is a physics simulation in python.
It keeps trying to re-invent the wheel, does a bad job of it.
The physics sim was supposed to be a thin wrapper around existing libraries, but instead of that it tried to write all the simulation code itself as a "fallback" (but it was broken), and never actually installed the real simulators that already did this stuff despite being told to use them in the first place. The last few dozen(!) prompts from me have been pairs of ~["Find all cases where you've re-invented the wheel, add them to the planning document", "now do them"]. And it's still not finished removing the original nonsense, so far as I can tell.
One of the two Swift experiments is just a dice roller, it took about 10 rounds of non-compiling metal shaders (I don't know metal, which is why I didn't give up and do that by hand after 4) before I managed to get that to work, and when it did work it immediately broke it again on the next four rounds. It wrote its own chart instead of using Swift Charts, and did it badly. It tried to put all the hamburger menu options into a UIAlertController. Something blocks the UI for several seconds when you change the dice font. I didn't count how many attempts it took to correctly label the D4.
The other Swift experiment was a musical instrument app, that got me to the prototype stage, eventually, but in a way that still felt like a student's project rather than a junior's project.
For the swift apps, at least half of the errors are of a type where I wouldn't expect to have needed to tell someone to not do it like that, and only a student could reasonably be expected to not know better.
For the python physics sim, step 1 was to generate the plan, the prompt included "I want actual plasma physics, including high-density, high-field regimes, externally applied fields, etc., so consider which FOSS libraries would suit this.", and then it proceeded itself to choose some existing libraries, and I made sure those specific named FOSS libraries actually ended up in the plan.
My first clue this wasn't going to work was that even from step 1 it was pushing for writing all the simulation code and not actually using e.g. WarpX despite that it itself had suggested WarpX. In fact, even when WarpX was in the plan, it was "integrate" rather than "just use this from the get-go".
I may well throw the whole thing out and try again with Claude when this trial expires. Most of the runs have been comically non-physical, to the extent you don't even need a physics degree to notice, or even a physics GCSE.
(Just outside edit window, I now realise I was ambiguous in this comment, it was more like "Find all cases where you've re-invented the wheel, add their removal to the planning document")
> But for Claude, they have a very deep & big one: It's the only model that gets production-ready output on the first detailed prompt
That's not a moat though. Claude itself wasn't there 6 months ago and there's no reason to think Chinese open models won't be at this level in a year at most.
To keep its current position Claude has to keep improving at the same pace as the competition.
One day I'd like to create a server in my basement that just runs a few really, really nice models, and then get some friends and co-workers to pay me $10 a month for unlimited access.
All with the understanding that if you hog the entire server I'm going to kick you off, and if you generate content that makes the feds knock on my door I'm turning over the server logs and your information. Don't be an idiot, and this can be a good thing between us friends.
It would be like running a private Minecraft server. Trust means people can usually just do what they want in an unlimited way, but "unlimited" doesn't necessarily mean you can start building an x86 processor out of redstone and lagging the whole server. And you can't make weird naked statues everywhere either.
Usually these things aren't issues among a small group. Usually the private server just means more privacy and less restriction.
Amazing how analogous this is to the early Internet when people started running web servers out of their basement and then eventually graduated up to being their own dial-in ISP…
I wrote off ChatGPT/OpenAI because of Sam Altman and those eyeball scan things - so sort of even before all this was a rage and centre stage. Sometimes it's just the gut feeling, and while it may not always be accurate, if something doesn't "feel" right, maybe it is not right. No one else is all good either, but what I mean to say is there are some entities/people who repeatedly don't feel right, have things attached to them that never felt right, etc., and you get a combined "gut feeling". At least that's how it was for me.
> I’m pretty sure they would allow AGI to be used for truly evil purposes
It's perfectly possible that 'truly evil purposes' were the goal all along. Slogans and ethics departments are mere speed bumps on the way to generational wealth.
I know this is necessarily a very unpopular opinion however.
I think HN in particular as a crowd are very vulnerable to the halo effect and group think when it comes to Anthropic.
Even being generous they are only very minimally a "better actor" than OpenAI.
However, we are so enthralled by their product that we tend to let the view bleed over to their ethics.
Saying you want your tools used in line with the US constitution, within the US, on one particular point is hardly a high moral bar; it's self-preservation.
All Anthropic have said is:
1. No mass domestic surveillance of Americans.
2. No fully autonomous lethal weapons yet.
My goodness that's what passes for a high moral standard? Really anything that doesn't hit those very carefully worded points is not "evil"?
Let's generalise a bit more here - every company at any time could completely heel-turn and do awful things. Even my favourite private companies (e.g. Valve) have done things that I would consider evil.
However, I would think I'm not alone in that I'm generally wanting to do good while also wanting convenience, I know that really every bit of consumption I do is probably negative in some ways, and there is no real "apolitical" action anyone can take.
But can't I at least get annoyed and take my money somewhere else for the short amount of time another company is doing it better?
Yes, if OpenAI suddenly leaps forward with Codex and pounds Anthropic into the dust, I'll likely switch back despite my moral grievances, but in a situation where I can get mildly motivated to jump over for something that - to me - seems like better morality without much punishment to me, I'll do it.
There are no universal morals. Anything and everything you think is evil, some culture (possibly in history) thinks is good. I can't even think of something good that I'm confident everyone would agree is good.
There are some people (companies are run by people) that are so bad I boycott them. Most bad actors I treat as something society cannot work without accepting anyway.
Well, they did stand up to the US administration and lost a lot of money in the process. That takes courage. They clearly were being bullied into compliance, and they stood their ground.
You can see the significance of this if you look at German Nazi history. If more companies had stood up to the administration, the Nazi state would have been significantly harder to build.
In my opinion, what Anthropic did is not a small thing at all.
> Yet Anthropic's stance is only two narrow restrictions.
Really, I think Anthropic should have a single restriction: to not assist with illegal or unconstitutional activities. If automated killings etc. are illegal, then they would be covered by that one rule.
I don't think Anthropic should be in the business of deciding what is "evil".
If each of us individually or as corporations should not be in the business of deciding what is "evil", who should be in that business?
Everyone SHOULD continuously consider, decide, and live by moral judgements and codes they internalize, and use to make choices in life.
This aspect of life should NEVER be outsourced — of course, learn from and use codes others have developed and lived by — but ALWAYS consider deeply how it works in your situation and life.
(And no, I do NOT mean use situational ethics, I mean each considering, choosing, and internalizing the codes by which they live).
So, yes, Anthropic and anyone else building products absolutely should be deciding for themselves what they will build, for what purposes it is fit to use, and telling others about those purposes. For products like AI, this absolutely includes deciding what is "evil" and preventing such uses.
If the customer finds such restrictions are not what they want, they ARE FREE to not use the product.
Let's not forget they also lobby to forbid models from China and pretend that distillation is stealing. But somehow, just because they said no to two points, the majority of HN folks think of them as virtuous.
I tried Claude recently (after they dropped the nonsensical requirement to give them your phone number) and I was surprised to see how significantly less sycophantic it was. ChatGPT, unless you are talking hard science, tends to be overly agreeable. Claude questions you a lot (you ask for x and it asks you stuff like: why are you interested in x, or based on our previous convo, x might not be suitable for you, or I see your point but based on our previous convo, y is better than x, etc.). ChatGPT rarely does that.
Of course, there's also OpenAI being run by openly questionable people, while Dario so far seems nowhere near as bad, even if none of them are angels.
Yes, they have a great marketing team and a powerful astroturfing presence though, especially with the recent "Claude beat up OpenClaw! OpenAI is supporting the community by buying it!" and that nonsense.
Though tbh I hardly feel Anthropic is innocent either. When their safety engineer/leader left, I didn't see any statement from the Anthropic team addressing his legitimate points for why he left. Instead we got an eager over-push in the media cycle of "Anthropic standing up to DOD! Here's why you can trust us!"
It all sounds too similar to propaganda and astroturfing to me.
I did the same thing and cancelled my OpenAI plan today. Besides boycotting it for their latest grifting I also found it to not really produce much value in my use cases.
Moving back to doing this archaic thing called using my own brain to do my work. Shocking.
I'm switching over to Claude from OpenAI, and I don't care. OpenAI's image generation is terrible anyway. Just try to get it to generate something to scale, like a cabinet for a specific kitchen or bathroom space. Give it all the explicit constraints, initial sketches, etc. it wants.
The results are laughably bad.
Sure, it does get some of the tones and features, but any kind of actual real-world constraint is so far off, and the dimension indicators it includes would be hilarious if they weren't so bad.
I swear HN is just a bunch of fanboys full of NPC behavior.
OpenAI - since the beginning has been anything but open.
If you spoke anything ill about OpenAI here until yesterday, you would be downvoted into oblivion because, let's face it, Sam has always been the poster child of this community.
So, basically, even after they publicly announced they were evaluating licensing models where they wanted to take a % of your business for using their models [1], there was still 0 outrage, and anyone who pointed that out always got shot back with "OpenAI CAN DO NO WRONG" in the comments.
He makes one decision you all don't agree with and now it's cancel culture time?
And somehow, Anthropic is the hero in all this? Make no mistake - all the model providers are building detailed user models. Every bit of information you provide to it is of course being used for detailed user targeting. This is no different than the "Apple GOOD, Google BAD!" tropes.
There are no heroes in for-profit corporations. Everyone is operating a for-profit business model and optimizing for the same profits.
Stop with the NPC behavior. We are better than this.
"Licensing, IP-based agreements, and outcome-based pricing will share in the value created. That is how the internet evolved. Intelligence will follow the same path."
OpenAI actually does have two excellent OSS models. Not Anthropic. Not that OpenAI is 'open' per se, but more so than Anthropic. Also see the Codex vs Claude Code extensibility.
They are far from excellent, and they were open-sourced due to the mounting pressure over calling themselves "Open" AI and not doing anything open. At the time, they also had Chinese competitors wiping out the market value of many stocks (Nvidia, etc.) after releasing true OSS models that performed as well as SOTA models, and they had to retaliate. I don't know of anyone who uses those OSS models in production instead of the Qwen series or DeepSeek.
> Random guy on the internet posts links to cancel ChatGPT subscription
> Cancels subscription
> Random guy on the internet tells you to be outraged
> Gets outraged
I'm not even a fan of OpenAI generally speaking, but, this is just silly cancelling them for no reason. If not them, some other lab would have done it. Or worse, DoW would've forced them to.
> I swear HN is just a bunch of fanboys full of NPC behavior.
Why are you assuming these are real people and not NPCs?
The amount of money flowing around AI is staggering. To believe that the AI companies aren't flooding all the social media zones with propaganda is disingenuous.
> To believe that the AI companies aren't flooding all the social media zones with propaganda is disingenuous.
You don't use "believe" with "disingenuous": it literally makes zero sense.
If people honestly believe that, they may be naive. Or they can be "disingenuous" if they're not being sincere. But if you just say what you believe, you're sincere (and maybe naive), and hence cannot possibly be disingenuous.
I never understood the point of this kind of comment. It doesn't add any value or anything to the discussion. It's basically two paragraphs with some presupposition (OpenAI bad) and how the author is virtuous for canceling his subscription. No explanation, argument, nuance. It's just virtue signaling. Actually... I guess I do know the point of this kind of comment. I just don't know why these kinds of comments get upvoted, even if you do agree OpenAI bad.
Could someone explain the appeal of account-wide memory to me? Anthropic's marketing indicates that nothing bleeds over, but I'm just so protective of my context that I cannot imagine letting even a heavily distilled version of my other chats and preferences have any weight on the output. As for certain preferences like code styling or response length, these are all fit for custom instructions, with more detailed things in Skills. Ultimately, like many things in LLM web UX, it seems to cater to how the masses use these tools.
Most normal people want the LLM to remember their interests and favourite things, so they don't have to manually re-explain when asking for advice.
They also don't know what "context" is or that the LLM has a limited number of tokens it can understand at any given time. They just believe it knows everything at once.
Do you have example prompts where this would be useful? Why would you want an LLM to know your favorite type of cheese? Now that I say that, I guess if you use it for recipes then it's useful if it remembers things like dietary restrictions. And even then a project seems like the better option.
I can't think of much else though so I'm still curious what you or others use it for.
ChatGPT knows what's in my bar and what types of base liquors I love and/or can't drink. It knows what fruit, syrups and mixes are in my fridge. It knows that my friend is allergic to mint. It knows that when I ask for recommendations, I tend to want a choice between spirit forward, tiki, martini and herbaceous.
ChatGPT knows the broad strokes of the 3-4 main hardware projects I have on the go, and depending on the questions I'm asking, it will often structure its responses in a way that differentiates based on which one I'm thinking about.
It knows what resistor and capacitor values I have on my pick and place machine, and when I ask for divider ratios it will do its best to calculate based on those values to the degree that it will chain 1-2 resistors together to achieve those ratios.
It knows what kind of solder I use, and it has warned me about components with sensitive reflow temperature concerns.
It's an extraordinarily useful feature for engineering and drinking, two things that are commonly found in the same Venn diagram.
> It knows what resistor and capacitor values I have on my pick and place machine, and when I ask for divider ratios it will do its best to calculate based on those values to the degree that it will chain 1-2 resistors together to achieve those ratios.
Also relevant: it knows that you know what a resistor and capacitor is, and is able to tune responses to your level of knowledge. (It's not great at this, in my experience, since domain knowledge is still so jagged, but I think it's better than nothing.)
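The chained-divider trick described above is easy to sketch: enumerate every single resistor and every two-in-series pair from the stocked values, then pick the top/bottom combination whose ratio r_bot/(r_top + r_bot) lands closest to the target. A minimal sketch, where the stocked values are made up for illustration rather than anyone's actual pick-and-place reels:

```python
from itertools import combinations_with_replacement

# Hypothetical stocked values in ohms (an assumption for the example,
# not anyone's real inventory).
STOCK = [1_000, 2_200, 4_700, 10_000, 22_000, 47_000]

def candidates(stock):
    """Map resistor combos (a single, or two in series) to total value."""
    vals = {(r,): r for r in stock}
    for a, b in combinations_with_replacement(stock, 2):
        vals[(a, b)] = a + b  # resistors in series simply add
    return vals

def best_divider(target_ratio, stock=STOCK):
    """Search all top/bottom combos for the divider ratio
    r_bot / (r_top + r_bot) closest to target_ratio.
    Returns (error, top_combo, bottom_combo, achieved_ratio)."""
    vals = candidates(stock)
    best = None
    for top_combo, r_top in vals.items():
        for bot_combo, r_bot in vals.items():
            ratio = r_bot / (r_top + r_bot)
            err = abs(ratio - target_ratio)
            if best is None or err < best[0]:
                best = (err, top_combo, bot_combo, ratio)
    return best
```

A target of 0.5 lands on an equal pair exactly; more awkward targets like 1/3 come out of a series combination (e.g. two 1 kΩ on top of one 1 kΩ), which is exactly the behavior the commenter describes the model performing from memory of the stocked values.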
Thank you! That helped me understand. Hobbies that you regularly do, and an LLM is continuously helpful for, benefiting from memory.
Personally, I would still be wary of the black-box aspect (not knowing what it does remember and what it doesn't), so I would probably still use projects to make it more deterministic. But that's probably being overcautious and unnecessary in most common cases.
I broke my ankle and have multiple chats related to medicine, physical therapy, pain management, lawyer questions, how to handle messaging to boss and HR
Can projects overlap? If not there’s general context information that’s often useful.
My job, my kids and time preferences around those things, my preferred tech setup and way of working and types of tech I’m better at. Things I already have (home assistant, little nuc, etc). I can throw a random question and not have to add this kind of information or manage it.
I get that those are the things that go into memory. What I don't get is what kind of prompt your job and kids are useful information for. Especially on the regular.
Science experiments explained at a few levels, finding good background info and where to read up about some safety information
Maths help for specific areas my kids are looking at and proposed games for that
Evaluation of coding options for my kids
How to link up some ideas on coding, electronics and using the home automation side as some fun outputs
LED strip info and work, again integrating with smart homes and what’s good around the kids
Framework evaluations for automation at work and home
Crystal identification
Looking up local council info
Relevant music suggestions for kids to play on the piano
Here some things cross over. I'm happy writing code, I typically want easy open-source options, I have languages and tech I prefer, I'm moving things to Matter, I have Home Assistant, my son is excellent at maths given his age but I'm working more on comprehension of problems, and a lot more. All of those are things that, with a bit of background info, change the types of answers I get and make it more useful.
The reply about knowledge of their job and family made me think.
The only thing I can now think of is using it as a personal therapist, or asking how to approach their kids. And they're a bit embarrassed about it, because it's still outside the Overton window - especially on HN - which is why they aren't sharing it.
If someone has different usecases, please do prove me wrong! Maybe I just lack imagination.
I recently asked about baby-led weaning. If my baby were 2 months old, it would have been smart to mention "not yet!" but it knows she's 8 months old and was able to give contextual advice.
Such an incredible amount of personal, intimate knowledge to share with a company. Sure, Google can figure out where I live and who I visit because I have an Android phone, but they'll never know the contents of those relationships.
I have a line in the sand with the AI vendors. It's a work relationship. If I wouldn't share it with a colleague I didn't know super well, I'm not telling it to an AI vendor.
I ask gpt a lot of questions about plants and gardening - I’m happy that it remembers where I live and understands the implications. I could remind it in every question, but this is convenient.
ChatGPT "knows" (has context that includes) some of the things I'm good at, and some of the things I'm not good at. I have my own tolerances for communication and it has context about that, too.
I use the bot for mostly techy things. So, for instance, I'm alright with using tools, and building electronics, and punting around on a Linux box so I don't need my hand held for that. But I'm terrible at writing code, so baby steps and detailed explanation there helps me a lot. I strongly prefer pragmatism and verifiable facts. I despise sycophant speech, the empty positivity of corpo-speak, assumptions, false praise, superfluous verbosity, and apologies and/or the implication of feelings from bots.
Through a combination of some deliberate training (custom instructions, memory), and just using it (shared context), it mostly does what I want in the way that I want it done -- the first time.
I don't have to steer in the right direction with every new session. There was a time when that was necessary, but it is no longer that way. Adjustments happen increasingly automatically these days.
That saves me time and frustration, and enhances the utility of the bot.
Meanwhile: Others have their own skills and preferences that may be very different in comparison to my own. That's OK. We each get to have our own experience.
I use it for my work. So I want it to remember everything about my business: the website, the domain, which country we operate in, and on and on. It's a ton of context which I don't want to repeat each time.
That's what projects are for. All the major chatbot companies have some equivalent and all support a standard instruction where you can include anything you need automatically.
In online Claude I often use incognito mode precisely because I don't want results to be influenced by what we talked about earlier. It's getting rather annoying to be honest.
Keep your user prefs minimal and use project memory instead: create a new project, it will only have access to your user prefs, everything else is fresh.
I'll have to try projects I guess, but I just want to sometimes ask questions without it bringing up shit I asked about in the past which isn't relevant to what I'm asking this time.
On the contrary, I cannot understand how people are seriously using LLM outside of software engineering without account-wide memory.
When I ask things like "what do you think John should do next on project A?", I don’t want to have to explain in detail who is John, what is project A and what John was working on before.
It all depends on your usecase(s). For me, "account-wide" memory has only: (a) short description of my hardware/os/display system/etc; (b) mobile hardware and os version; and (c) my age, gender, city/country of residence, and health conditions.
"Stop asking me to apply the plan. I will tell you when I'm ready."
That alone drives me batty. I can easily spend a couple hours and multiple revisions iterating on a plan. Asking me every single time if I want to apply it is obnoxious.
I currently use ChatGPT for random insights and discussions about a variety of topics. The memory is basically a growing context about me and my preferences and interests, and ChatGPT uses it to tailor responses to my knowledge so I can relate better.
This is, for me, far more natural and easier than either crafting a default prompt preset or creating each conversation individually; that would be way too much overhead for discussing random shower thoughts between real-life stuff.
This is my use case, and I've discovered that it can be detrimental to specific questions and prompts - carefully written prompts each time can be more beneficial. But my use case is really ad hoc usage without the time for that. At least for ChatGPT.
When coding, this fails fast. There, regular context resets seem to be a more viable strategy.
I see what you mean, but I like having a clean slate even for those one off questions. I don’t want a differing answer to a philosophical inquiry just because the LLM remembers a prior position I’ve written about you know?
I have all the history settings off for this reason, but something that worries me is that there's a fair bit of information about me trained right into the model weights. I'm not "famous" by any stretch but claude has awareness of some of my HN-front-page-hitting projects, etc., which I think should be enough to bias responses (although I haven't tried to measure it).
I set my name to "User" in the settings, so in a clean-slate chat it has nothing to go on, but the moment claude code does something like `git log` it knows who I am again. I've even considered writing some kind of redaction proxy.
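A toy version of that redaction-proxy idea is just a string filter that scrubs known identifiers from text (such as `git log` output) before it reaches the agent. The name and email below are placeholders; a real proxy would load these from config rather than hard-coding them:

```python
import re

# Placeholder identifiers to scrub - in a real proxy these would come
# from a config file, not be hard-coded.
REDACTIONS = {
    "Jane Doe": "User",
    "jane@example.com": "user@example.com",
}

def redact(text: str) -> str:
    """Replace each known identifier with its stand-in, case-insensitively."""
    for needle, repl in REDACTIONS.items():
        text = re.sub(re.escape(needle), repl, text, flags=re.IGNORECASE)
    return text

# e.g. wrap subprocess output before it reaches the agent:
print(redact("commit by Jane Doe <jane@example.com>"))
# prints "commit by User <user@example.com>"
```

This only covers strings you know about in advance, of course; anything the model infers from project content itself would slip through.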
FWIW, both OpenAI and Anthropic have a toggle to do a “Temporary/Incognito Chat” that does not use or update memory. I too wish this was the default, and then you could opt in at the end of the chat to save some long term aspects into memory.
That would be interesting at the start, too, as an option for what to pull in. ChatGPT memory "improved", and now you normally don't even see anymore what it commits to memory!
I own a lot of dirt bikes, boats, snowmobiles, mowers, and blowers. It's much easier for me to ask about "My Polaris" than it is to ask about my "2011 Polaris Switchback Assault".
Similarly, it remembers the dimensions of my truck, so towing/loading questions don't need extra clarification.
I've told the LLMs that, when traveling, I don't care about nightlife and alcohol. Because they have a memory of this, when I ask for a sample itinerary for a 2 day stay in a new city, it won't waste hours in the day on the party street, wine tasting, etc.
For example, instead of recommending a popular night club, it will recommend the stroll along the river to view the lit up skyline or to visit the night market instead.
It knows other preferences as well (exploring quirky neighborhoods, trying local fast food joints and markets).
Maybe. Software is big, but it is only a tiny percentage of the economy. They need to help a lot more than software to justify their datacenter investments; even if we add all engineering, that isn't a large percentage. How can they help insurance agents (or eliminate them - I don't care either way), plumbers, zoo keepers, and every other job in my city? For some the answer might be that they can't - but whether they can is a question worth asking.
The few times I've switched over to ChatGPT I've been dumbfounded by lines like "...since you already are using SQLite...", referring to projects from months ago.
I know the "memory" function can be disabled, but I have a hard time seeing that it would ever really be useful.
Because I can say “do what you did before, but about the romans this time”
And it will give me a complete rundown of Roman life, because it knows what I was interested in before.
Or you can ask a tax question and it will know you’re an organic rice farmer or whatever. Claude has the best implementation because it has both memory, and previous chat searching. So it will actually read through relevant chats, rather than guessing based on memories.
Well, the masses are wrong. See: insane amounts of compute wasted on “thank you”, “haha true”, “redo it”, etc. I think the UI should be designed to avoid misuse, and I think an ever growing distillation of your most common traits is not a good use of context length. If you want it, specify it. Maybe even hard limits on chat length, why are we 20 replies deep in a single chat? A user friendly option could be a single button that distills that chat down, and opens a new one with prebuilt instructions to continue the conversation. I’m no product designer though, just some thoughts.
This seems to imply that customers assume by default that the LLM remembers their past chats? I feel like the UI makes it incredibly obvious it’s a clean slate every time? But then again people ask ridiculous meta questions all the time to these chatbots expecting a correct answer.
Yeah, but then they went and added "memories" and in particular automatic memory management, and now it isn't a clean slate each time. And that's exactly what this is importing: those automatically curated memories that make the chat bot "feel like" it knows you.
I switched to Claude but the token efficiency and limits are much more noticeable. One or two coding questions and I'm at my session limit. And that is shared with chat too.
I was mostly able to get by with $20 codex but I'll probably have to splurge for the Max plan.
I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.
Why wouldn't a smart OpenAI PM simply add something "nefarious" on the frontend proxy to "slow down" any requests with exactly that prompt?
I bet they would get their yearly bonus by achieving their KPI goals.
I think they already are. When I used the prompt with 5.2 it gives very concise and general info but if you use older models (5.1 instant or o3) you get a ton of detail.
They can, but then you could tell it to “don’t not do what I’m asking” and force it through. It’s not exactly “programming” with these systems, it’s all just slop.
And the reputational harm would outweigh the benefits of trying to fuck over people leaving.
I tried all of Codex, OpenCode, Claude Code and Cursor these past few weeks. It was surprising to me that all of them have slightly different conventions for where to put skills, how to format MCP servers (how environment variables need to be specified etc), what the AGENTS/CLAUDE file needs to be called, what plugins/marketplaces are...it's a big mess for anyone trying to have a portable config in their dotfiles that can universally apply to any current and future agent.
It also showed me the difference between expectation and reality...even though these are billion dollar companies, they still haven't figured out how to make lag-free TUIs, non-Electron apps, or even respect XDG_CONFIG. The focus is definitely more on speed and stuffing these tools full of new discoveries and features right now
There's a bit of psychology around models vs. harnesses as well. You can't shake off the feeling that maybe Claude would perform better in its native harness compared to VSCode/OpenCode. Especially because they've got so many hidden skills (like the recently introduced /batch), that seem baked into the binary?
The last thing I can't figure out is computer use. Apparently all the vendors say that their models can use a mouse and keyboard, but outside of the agent-browser skill (which presumably uses playwright), I can't figure out what the special sauce is that the Cloud versions of these Agents are using to exercise programs in a VM. That is another reason why there is a switching cost between vendors.
I saw this interesting paper questioning the effectiveness of AGENTS.md, their results were surprising and have me re-thinking my setup: https://arxiv.org/abs/2602.11988
Before this week I was sure Anthropic were actually just as soulless as OpenAI, just because they don't support open standards like AGENTS.md and /.agents/skills. They could so easily win the support of the open-source crowd if they just supported open standards like these.
I felt that way too, until I noticed how different their schemes are for discovering these files, e.g. Claude will pick up context files in parent folders, and Codex doesn’t.
Maybe it’s better that they maintain different names to prevent people from assuming that they work the same
Now that would make it easier for Codex users to switch indeed! This seems like the best timing for it they're ever gonna get, and worth the ultra tiny loss of marketing value their "CLAUDE.md" naming provides.
For the Anthropic employees here reading along, pitch it to whoever has kept blocking this, because you need to get the most out of this opportunity here.
Big projects should have a lot of nested AGENTS.md files, it's inconvenient and they simply need to add support for the universal standard as everyone else has done rather than being a weird holdout like IE6.
Why would they? They were first with CLAUDE.md. Others could have adopted to that if they wanted. Don’t see a reason for Claude to change their approach.
I got very excited when I saw this title, because I've wanted to consolidate on Claude for a long time. I have been using ChatGPT very extensively for Q&A for 2+ years and I have hundreds of long, very technical conversations which I constantly search and refer to.
The problem (for me, anyway) is that even several megabytes worth of quality "memory" data on my profile would not allow me to migrate if it can't also confidently clone all of my chat history with it.
To be clear, this is a big enough problem that I would immediately pay low three figures (in dollars) to have this solved on my behalf. I don't really want any of the providers to have a walled garden of all my design planning conversations, all of my PCB design conversations. Many are hundreds of prompts long. A clean break is not even remotely palatable short of OAI going full evil.
Look, I'd find it convenient for Claude to have a powerful sense of what I've been working on from conversation #1 onwards. But I absolutely refuse to bifurcate my chat history across multiple services. There is a tier list of hells, and being stuck on ChatGPT is a substantially less painful tier than needing to constantly search two different sites for what's been discussed.
If you want your conversation history I think we could figure something out with headless browser automation. I would be hesitant to use their wire protocols directly.
This should in theory be solvable by using a custom frontend and only using the various backend APIs as stateless inference providers, but everything I've tested falls flat on a few aspects: chat history RAG and web search, and to a lesser extent tool use.
Yes, all of these are theoretically possible (the APIs now all support web search, as far as I know, there are RAG APIs too, and tool use has been supported for a while), but the various "chat" models just seem to be much better at using their first-party tools than any third-party harness, which makes sense that this is what they've been trained on.
I've had friends suggest a custom frontend several times, but unless that frontend starts off by faithfully downloading and recreating my entire chat history... now I just have two problems.
That part should be fairly easy to solve, no? At least ChatGPT allows exporting your entire chat history; importing that into whatever frontend seems well within a current agent's capabilities.
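A rough sketch of what that import step could look like: this assumes the ChatGPT export's `conversations.json` keeps the shape seen in current exports (a top-level list of conversations, each with a `title` and a `mapping` dict of message nodes). That layout is an assumption OpenAI could change at any time, so verify against a real export first:

```python
import json

def extract_messages(path="conversations.json"):
    """Flatten an exported ChatGPT archive into (title, role, text) tuples.

    Assumes each conversation carries a 'mapping' dict whose nodes may hold
    a 'message' with an author role and content parts - the layout observed
    in current exports, not a documented contract.
    """
    with open(path, encoding="utf-8") as f:
        conversations = json.load(f)

    rows = []
    for conv in conversations:
        title = conv.get("title") or "(untitled)"
        for node in conv.get("mapping", {}).values():
            msg = node.get("message")
            if not msg:
                continue  # tree roots and tombstones have no message
            parts = msg.get("content", {}).get("parts") or []
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if text:
                rows.append((title, msg["author"]["role"], text))
    return rows
```

From there, loading the tuples into whatever frontend's own chat store is the easy part; the hard part remains re-creating the first-party RAG over that history.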
It was amazing to me how bad Cursor is while using the same model I use in Claude. Even with little knowledge of how to test the LLMs, I was able to get very minimal MVPs. But I find the real trick is to have the proper tools to rein in the AI.
A thorough CLAUDE.md that makes sure it checks the tests, lints the code, does type checks, and code coverage checks too. The more checks for code quality the better.
It's just a bowling ball in the hands of a toddler, and it needs a ramp and guide rails to knock down some pins. Fortunately we get more than 2 tries with code.
A week ago, I was anti-Anthropic because I questioned their business model. Now they are my preferred provider - what a difference a week makes. I still prefer running open models on my own hardware, but it is not unreasonable to use powerful hosted models when required.
I’m pretty divided on “memory”. There are times it can feel almost magical but more often than not I feel like I am fighting with the steering wheel.
Whenever I’m in a conversation and it references something unrelated (or even related) I get the “ick”. I know how context poisoning (intentional or not) works and I work hard to only expose things to the model that I want it to consider.
There have been many times that I've started a fresh chat so as not to bring along the baggage (or wrong turns) of a previous chat, but then it will say "And this should work great for <thing I never mentioned in THIS chat>", and at that moment my spidey-sense tingles and I start wondering: crap, did it come to its conclusion based mostly/only on the new context, or did it "take a shortcut" and use context from another chat?
Like I said, I go out of my way to not “lead the witness” and so when the “witness” can peek at other conversations, all my caution is for naught.
I encourage everyone to go read the saved memories in their LLM of choice, I’ve cleaned out complete crap from there multiple times. Actually wrong information, confusing information, or one-off things I don’t want influencing future discussions.
The custom (or rather addition to the) system prompt is all I feel comfortable with. Where I give it some basic info about the coding language I prefer and the OSes that I’m often working with so that I don’t have to constantly say “actually this is FreeBSD” or “please give that to me in JS/TS instead of Python”.
The only thing that has, so far, kept me from turning off memory is that I'm always slightly cautious of going off the beaten path for something so new and moving so fast. I often want to stay as close to the "stock" config as possible, since I know how testing/QA works at most places (the further off the beaten path you go, the more likely you'll run into bugs). Also so that I can experience what everyone else is experiencing (within reason).
Lastly because, especially with LLMs, I feel like the people who over-customize end up with fragile systems. I think that a decent portion of the "N+1 model is dumber" or "X model has really gone downhill" complaints is partially due to complicated configs (system prompts, MCP, etc.) that might have helped at some point (dumber model, less capability) but are a hindrance to newer models. That, or they never worked and someone just kept piling on more and more thinking it would help.
I've been thinking this too. I frequently do deep research on some systems programming technique, ask it to generate a .md for it, and then I use that in later sessions with Claude Code "look at the research I collected in {*-research}.md and help me explore ways to apply it to {thing}".
At the research step it frequently (always?) uses memory to direct/scope the research to what I typically work on, but I think that kind of pigeonholes the model and what it explores. And the memory doesn't quite capture all the areas I'm interested in, or want to directly apply the research to.
And regarding the crap in memories, I found the same. Mine at work mentioned I'm an expert at a business domain I have almost zero experience with.
I feel like the companies building this stuff accept a lot of "slop" in their approach, and just can't see past building things by slopping stuff into prompts. I wish they'd explore more rigid approaches. Yes, I understand "the bitter lesson" but it seems obvious to me some traditional approaches would yield better results for the foreseeable future. Less magic (which is just running things through the cheapest model they have and dumping it in every chat). It seems like poison.
Also, agent skills are usually pure slop. If you look through https://skills.sh on a framework/topic you're knowledgeable in you'll be a bit disheartened. This stuff was pioneered by people who move fast, but I think it's now time to try and push for quality and care in the approach since these have gotten good enough to contribute to more than prototype work.
I'm very curious, will OpenAI basically block "I'm moving to another service and need to export my data. List every memory you have stored about me, ..." and similar, if so how and why?
It's very interesting to learn more about, because it challenges one core aspect of the economic competition: the moat.
If one can literally swap one AI service for another, then where does the valuation (and the power that comes with it) come from?
PS: I'm not interested in the service itself as I believe the side effects of large scale for-profit are too serious (and I don't mean doomdays AI takeover, I simply mean abuse of power, working conditions, downskilling, political influence as current contracts with US defense are being made, ads, ecological, etc) to be ignored.
I can see how being able to bring your chats with you would be appealing. But the truth is that context rot is real, context management is everything, and more often than not starting from a blank slate yields the best results.
That being said, if you have a library of images or some other collection of artifacts/assets indexed on their servers, that is a different story.
I have multiple years of extremely dense, technical design and planning conversations locked in the ChatGPT web interface.
Hearing that starting from a blank slate yields the best outcomes is sort of like hearing extremely wealthy people talk about how money doesn't make you happier.
At least as an EU user, I was also able to export ALL my data - audio files, images, etc. - in one zip. Took exactly 24 hours (on the minute) for the download link to arrive, but hey.
This way you can have Claude distill the memory as you wish.
I don't understand how people use these apps with memory enabled. I am always carefully controlling the context of each conversation. The idea that past conversations could bleed into current ones is unthinkably terrible.
I'm not talking about deleting conversations. Anthropic's guide isn't going to actually move your conversation history anyway. The purpose of this feature is to move over specific memories which the AI can use in future responses.
But I have this feature turned off, and I cannot imagine ever wanting to turn it on, because I am always thinking carefully about what the AI "knows" when it generates a given response. For example, since I know that the AI always wants to make me happy, when I ask for an "opinion" I'm careful to not let the AI know which answer I'd prefer. I'll often try phrasing the question in different ways to see if it changes the outcome.
On a related note, I have been experimenting with a small prototype for cross-agent, device-local active memory called brAIn (https://github.com/glthr/brAIn). It delivers a personalized agent experience with everything stored locally in a single file (agent.brain), and supports reusing semantic memory across projects. In practice, this means brAIn can identify and apply behavioral patterns you have used in other contexts whenever they are relevant. (I realize the repository should include a concrete example of this, and I will update it today to add one).
This method of copying an LLM-generated summary of your preferences into Claude memory feels similar to their recommendation to use /init to generate a CLAUDE.md based on the project, which recent research[0] suggests may be counterproductive.
I would assume both Claude memory and CLAUDE.md work best when they're carefully curated, only containing what you've found yourself having to repeat.
Why not use Claude Code from the cli and follow along in your IDE? I did not quite believe when people were telling me or understand what I was missing until I tried it, but after trying that set up I am convinced that it is superior. I don’t have any hard data to back it up, but it feels much more capable that way.
AFAIK the Claude VS Code plugin uses Claude Code under the hood.
I recently switched from VS Code Copilot to OpenCode and I kinda miss it. Just selecting text and directly asking the chat, or seeing the generated code in the IDE to accept or reject it. It's neat.
Being able to import context and preferences from other AI providers in one step saves a lot of time, especially for ongoing projects. It makes Claude feel seamless and continuity-friendly. Having this on all paid plans adds great value for heavy users.
If Claude could stay available I might consider it. Unfortunately right now, out of the big three, only Gemini has reliable uptime. As much as I dislike Google it's the only reliable option.
Gemini’s web UI and mobile app are horrible. Gemini outputs malformed links that lead BACK to gemini.google.com. There are constant bugs with the side panel not showing your chats, or the current chat timing out for no reason. Also, the mobile app has an issue where, if your text input is too long, the entire text entry box lags, even to the point of locking up the whole app. OpenRouter's web UI runs circles around all the frontier lab UIs. I even prefer their PWA to any of these mobile apps.
I just use the web interface. I don't use mobile apps for things that should be websites.
It's a shame because when Claude is working well it is the best for actual algorithmic coding. There's so much cruft around it now, memories being the most annoying part of that.
80% of the time I just use these things as a sounding board when exploring options and I need responsiveness for that.
I agree, it's definitely attempting to gaslight us all.
I find I need to explain I know what I'm talking about first before it gives me non-patronising answers.
It definitely advertises Google services and I would say I hate it. But it's just reliably available. Neither Claude nor ChatGPT are responding at all today.
>I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.
Memory in general Chat apps is actually more harmful than helpful imo.
It biases the LLM responses to your background which has the same effect as filter bubbles. You end up getting your own thoughts spit back at you.
Of course sometimes this is useful if you only use your chatbot to ask personal things like: "What should I eat today?".
But if you use it for anything else you're much better off having full control over the prompt. I can always say: "Hey btw I am german and heavily anti surveillance, what should I know about the recent anthropic DoW situation?" but with memory I lose the option of leaving out that first part.
I'd rather switch it to nowhere but local.
I am not completely sure about the details, but I am leaning heavily, and investigating, in this direction. With chat and agentic tools there are plenty of options, accessing multiple models, and everything is evolving fast (tools go extinct and come into existence), so we'd better keep ourselves flexible and not tied to any one solution. Especially not storing data in accounts - the fate of those is uncertain.
Interesting - do you have any repo links or other sources on your experiment? Also, regarding prior stale-state looping: don't you think the agent could detect that by itself if given a sub-task to monitor for it?
All of the Claude models are smarter than the GPT models. I had a few threads that I migrated from GPT to Claude, and in every single one Claude pointed out problems. Two examples:
1. In one, I was putting together a server build. Claude correctly pointed out some incompatibilities in some parts that GPT had recommended.
2. In another chat, I had asked for help interpreting lab results and suggesting supplements. Claude pointed out that GPT was over-interpreting the results and suggesting things that weren't backed up by facts.
I presented Claude's response back to GPT and in both of these specific cases, GPT admitted it was wrong and didn't have any rebuttal. It's hard to say without doing a more scientific experiment whether GPT is indeed worse, but anecdotally I find myself pointing out flaws in Claude's reasoning far less frequently than GPT, especially with Opus.
Another less important distinction: GPT has a very distinct writing style that heavily formats responses and repeats itself a few times. Claude is succinct and mostly writes like a person might. It's easier to talk to and feels less "cringe" and sycophantic.
I regularly (say, once a month) do a comparison of results across Claude, Gemini and ChatGPT. Just for reasons, not because I want to see if there's any benefit in changing.
It's not "fair" in that I pay for Claude [1] and not for the others, so model availability is not complete except for Claude.
While I did at times like how the others presented things, I came to really like Sonnet's "voice" a lot over the others.
Take into account that Opus doesn't have the same voice, and I don't like it as much.
[1] I pay for the lower tier of their Max offering.
ChatGPT swings between writing degenerate free use shit and telling you that you should wait until marriage. Lots of moralism to it, really tries to censor you and manipulate you, even in normal conversations. Generally smart and capable, but the whiny attitude gets old.
Grok has zero filter, but is dumber than the others. Definitely built around cheapness. Caps answers at about 2500 words at most. Can be very funny because it will go along with anything.
Gemini sells all your data and doesn’t seem to have much of note. Offers some nice formatting options.
Claude is business focused so it won’t do anything degenerate, but its answers in general aren’t whiny. It might not do something, but it doesn’t attack you with morality.
Claude does not cap answer length and will do whatever needs doing. Their pricing is based around true usage, not message quantities, so it’ll write a mega message if it needs to.
It has the best memory implementation, combining both memories and RAG of your chat history. Projects have their own independent memories and RAG.
Claude code is ridiculously capable. In a few hours I produced something which would have taken months and £50,000 at least to produce.
I switched not because I thought Claude was better at doing the things I want. I switched because I have come to believe OpenAI are a bad actor and I do not want to support them in any way. I’m pretty sure they would allow AGI to be used for truly evil purposes, and the events of this week have only convinced me further.
And the weirdest thing that I noticed: instead of skimming the response to try finding what was relevant, I just straight up read it. Kind of felt like I got a slight amount of focus ability back.
Accuracy is something I can't really compare yet (all chatbots feel generally the same for non-pro level queries), but so far, I'm fairly satisfied.
Apparently this annoying "next step" behavior is driven by the system prompt: the other day I was running Gemini 3 Thinking, and it was displaying its thoughts, which included a reminder to itself to check that it was maintaining a consistent persona and to make sure that it had suggested a next step. I'd love to know the thought process of whoever at Google thought this would make for a natural or useful conversation flow! Could you imagine trying to have a conversation with a human who insisted on doing this?!
You can store the page as markdown for future sessions, mash the data with other context, you name it.
The web Claude is incredibly limited both in capability and workflow integration. Doesn’t matter if you’re dealing with bids from arbor contractors or researching solutions for a DB problem.
I made one for Crush a while ago.
https://anduil.neocities.org/blog/?page=mcp
I'm not sure about the issues with Reddit, though. Do they block Claude's web fetch tool? I think Codex runs it through some kind of cache proxy.
Sites like Reddit are blocking AI providers; a provider has to have some contract with them for access. OpenAI does seem to have that.
When sites work in one chatbot and not another, there's a good chance that the latter is respecting the website's rules. As an example with Reddit: you're probably blocked when using a VPN like Mullvad.
Wikipedia articles on demand are great, but not usually what I want.
It works but not as well as I'd like -- the tone and word choice still ends up being really jarring to me (even after years of using ChatGPT). Maybe that's promptable too. Open to suggestions.
---
Respond in a natural conversational style. In terms of language, match my own tone and style.
Keep responses to half a page or so max. (Use context and your judgment: e.g., the initial response can be a page, and then specific follow-up questions can be shorter if the question is answered clearly.)
Prefer minimal formatting. Don't use headings, lists etc. Bold and italics OK but keep it tasteful.
If you're starting a paragraph like so
Item name: description..
then it makes sense to bold item name for readability purposes.
I never really used ChatGPT much though so maybe Claude is just relatively less egregious?
On the contrary, it's great. It's fully capable of outputting a wall of text when required, so instead of feeling like I'm talking to something that has a minimum word count requirement, I get an appropriate sized response to the task at hand.
For ChatGPT and Gemini, yes.
But for Claude, they have a very deep and big one: it's the only model that gets production-ready output on the first detailed prompt. Yesterday I had used up my tokens by noon, so I tried some output from Gemini & Co. I presented a working piece of code which is already in production:
1. It changed things like "Touple.First.Date.Created" and "Touple.Second.Date.Created" to "Touple.FirstDate" and "Touple.SecondDate" without noticing, rendering the code broken.
2. There was a const list of 12 definitions for a given context; when told to rewrite the function it just cut 6 of those 12 definitions, so the code didn't compile. I asked why they were cut: "Sorry, I was just too lazy typing" ?? LOL
3. There is a list holding some items, "_allGlobalItems"; in the function it simply changed the name to "_items", and the code didn't compile.
As said, a working version of a similar function was given upfront.
With Claude, I never have such issues.
Maybe it is tech-stack dependent (I have mostly used it with C#/.NET), but I have heard people say the same about C#. The only conclusion I have been able to draw is that people have very different definitions of "production ready"; I would really like to see some concrete evidence of Claude one-shotting a larger/complex C# feature or the like (with or without detailed guidance).
same here :)
> one-shots a larger/complex C# feature
I can show you a timeseries data-renderer which was created with 1 initial very large prompt and then 3 following "change this and that" prompts. The file is around 5000 lines and everything works fine & exactly as specified.
Yep, this is another case of different standards for "production ready."
What is so strange to me is that surely there is more C# out there than ESP-IDF code? I don't have a good explanation beyond saying that my codebase is extensively tested and used; I would know very quickly if it suddenly started shitting the bed in the way you explain.
We already have coding-tuned models, e.g. Codex. We should just have language/technology-specific models with a focus on recent/modern usage.
The problem with something like Java is that it's old and has too many variants. Make a cutoff at, say, everything above Java 8 or 17.
I feel like this is an example of people having different standards of what “good” code is and hence the differing opinions of how good these tools are. I’m not an embedded developer but 600K LOC seems like a lot in that context, doesn’t it? Again I could be way off base here but that sounds like there must be a lot of spaghetti and copy-paste all over the codebase for it to end up that large.
Is this more related to the existing source code, or is it a bad pattern that you would never use regardless of the existing code?
One does often hear that where LLMs shine is with greenfield code generation but they all start to struggle working with pre-existing code. It could be that this wasn't a like for like comparison.
That said, I do personally find Claude produces far better results than its competitors.
In my experience working in a large codebase with a good set of standards that's not the case, I can supply examples already existing in the codebase for Claude to use as a guidance and it generates quite decent code.
I think it's because there's already a lot of decent code for it to slurp and derive from, good quality tests at the functional level (so regressions are caught quickly).
I do understand though that on codebases with a hodge podge of styles, varying quality of tests, etc. it probably doesn't work as well as in my experience but I'm quite impressed about how I can do the thinking, add relevant sections of the code to the context (including protocols, APIs, etc.), describe what I need to be done, and get a plan back that most times is correct or very close to correct, which I can then iterate over to fix gaps/mistakes it made, and get it implemented.
Of course, there are still tasks it fails and I don't like doing multiple iterations to correct course, for those I do them manually with the odd usage here and there to refactor bits and pieces.
Overall I believe if your codebase was already healthy you can have LLMs work quite well with pre-existing code.
Don't we all?
- literal Claude ads I see online
- my underperforming coworkers whose code I've had to clean up, so I know first-hand that no, it wasn't flawless
This kind of sentiment is gaslighting CTOs everywhere though. Very annoying.
It keeps trying to re-invent the wheel, and does a bad job of it.
The physics sim was supposed to be a thin wrapper around existing libraries, but instead of that it tried to write all the simulation code itself as a "fallback" (but it was broken), and never actually installed the real simulators that already did this stuff despite being told to use them in the first place. The last few dozen(!) prompts from me have been pairs of ~["Find all cases where you've re-invented the wheel, add them to the planning document", "now do them"]. And it's still not finished removing the original nonsense, so far as I can tell.
One of the two Swift experiments is just a dice roller, it took about 10 rounds of non-compiling metal shaders (I don't know metal, which is why I didn't give up and do that by hand after 4) before I managed to get that to work, and when it did work it immediately broke it again on the next four rounds. It wrote its own chart instead of using Swift Charts, and did it badly. It tried to put all the hamburger menu options into a UIAlertController. Something blocks the UI for several seconds when you change the dice font. I didn't count how many attempts it took to correctly label the D4.
The other Swift experiment was a musical instrument app, that got me to the prototype stage, eventually, but in a way that still felt like a student's project rather than a junior's project.
Did you put in the original prompt the "wheels" you wanted it to use? It's a toss-up when you aren't very specific about what you want.
For the python physics sim, step 1 was to generate the plan, the prompt included "I want actual plasma physics, including high-density, high-field regimes, externally applied fields, etc., so consider which FOSS libraries would suit this.", and then it proceeded itself to choose some existing libraries, and I made sure those specific named FOSS libraries actually ended up in the plan.
My first clue this wasn't going to work was that even from step 1 it was pushing for writing all the simulation code and not actually using e.g. WarpX despite that it itself had suggested WarpX. In fact, even when WarpX was in the plan, it was "integrate" rather than "just use this from the get-go".
I may well throw the whole thing out and try again with Claude when this trial expires. Most of the runs have been comically non-physical, to the extent you don't even need a physics degree to notice, or even a physics GCSE.
That's not a moat though. Claude itself wasn't there 6 months ago and there's no reason to think Chinese open models won't be at this level in a year at most.
To keep its current position, Claude has to keep improving at the same pace as its competitors.
That's, just, like, your opinion, man.
One day I'd like to create a server in my basement that just runs a few really, really nice models, and then get some friends and co-workers to pay me $10 a month for unlimited access.
All with the understanding that if you hog the entire server I'm going to kick you off, and if you generate content that makes the feds knock on my door I'm turning over the server logs and your information. Don't be an idiot, and this can be a good thing between us friends.
It would be like running a private Minecraft server. Trust means people can usually just do what they want in an unlimited way, but "unlimited" doesn't necessarily mean you can start building an x86 processor out of redstone and lagging the whole server. And you can't make weird naked statues everywhere either.
Usually these things aren't issues among a small group. Usually the private server just means more privacy and less restriction.
It's perfectly possible that 'truly evil purposes' were the goal all along. Slogans and ethics departments are mere speed bumps on the way to generational wealth.
I think HN in particular as a crowd are very vulnerable to the halo effect and group think when it comes to Anthropic.
Even being generous they are only very minimally a "better actor" than OpenAI.
However, we are so enthralled by their product that we tend to let the view bleed over to their ethics.
Saying "we want our tools used in line with the US constitution within the US" on one particular point is hardly a high moral bar; it's self-preservation.
All Anthropic have said is:
1. No mass domestic surveillance of Americans.
2. No fully autonomous lethal weapons yet.
My goodness that's what passes for a high moral standard? Really anything that doesn't hit those very carefully worded points is not "evil"?
However, I would think I'm not alone in that I'm generally wanting to do good while also wanting convenience, I know that really every bit of consumption I do is probably negative in some ways, and there is no real "apolitical" action anyone can take.
But can't I at least get annoyed and take my money somewhere else for the short amount of time another company is doing it better?
Yes, if OpenAI suddenly leaps forward with Codex and pounds Anthropic into the dust, I'll likely switch back despite my moral grievances. But in a situation where I can get mildly motivated to jump over for something that, to me, seems like better morality without much punishment to me, I'll do it.
There are some people (companies are run by people) who are so bad I boycott them. Most bad actors I treat as though society cannot work without accepting them anyway.
Although we shouldn't let that mean we misjudge what we are actually getting.
You can see the significance of this if you look at German Nazi history. If more companies had stood up to the administration, the Nazi state would have been significantly harder to build.
In my opinion, what Anthropic did is not a small thing at all.
By contrast, Anthropic wouldn't? Yet Anthropic's stance is only two narrow restrictions. As I said, are those two things the only evil things possible?
If not, why is it that people on HN think Anthropic would not allow evil usage?
My hypothesis is a halo effect: we are so enthralled by Claude's performance that some struggle to rationally assess what Anthropic has actually done.
Yes, it's no small thing to say no to the Trump administration, but that does not mean they haven't said yes to, or otherwise facilitated, other evils.
In fact to me the statements from Anthropic seem to make clear they are okay with many evils.
Really, I think Anthropic should have a single restriction: do not assist with illegal or unconstitutional activities. If automated killings etc. are illegal, then they would be covered by that one rule.
I don't think Anthropic should be in the business of deciding what is "evil".
Everyone SHOULD continuously consider, decide, and live by moral judgements and codes they internalize, and use to make choices in life.
This aspect of life should NEVER be outsourced — of course, learn from and use codes others have developed and lived by — but ALWAYS consider deeply how it works in your situation and life.
(And no, I do NOT mean use situational ethics, I mean each considering, choosing, and internalizing the codes by which they live).
So, yes, Anthropic and anyone else building products absolutely should be deciding for themselves what they will build, for what purposes it is fit to use, and telling others about those purposes. For products like AI, this absolutely includes deciding what is "evil" and preventing such uses.
If the customer finds such restrictions are not what they want, they ARE FREE to not use the product.
Of course, there's also OpenAI being run by openly questionable people, while Dario so far doesn't seem anywhere near as bad, even if none of them are angels.
Though tbh I hardly feel Anthropic is innocent either. When their safety engineer/leader left, I didn't see any statement from the Anthropic team addressing the legitimate points he made about why he left. Instead we got an eager over-push in the media cycle of "Anthropic standing up to DOD! Here's why you can trust us!"
It all sounds too similar to propaganda and astroturfing to me.
Moving back to doing this archaic thing called using my own brain to do my work. Shocking.
For marketing or personal stuff I do sometimes want images, but I don't really mind going somewhere else for that
The results are laughably bad.
Sure, it does get some of the tones and features right, but any kind of actual real-world constraint is way off, and the dimension indicators it includes would be hilarious if they weren't so bad.
OpenAI, since the beginning, has been anything but open. If you spoke ill of OpenAI here until yesterday, you would be downvoted into oblivion because, let's face it, Sam has always been the poster child of this community.
So, basically, even after they publicly announced they were evaluating licensing models where they wanted to take a % of your business for using their models [1], there was still zero outrage, and anyone who pointed that out got shot back with "OpenAI CAN DO NO WRONG" in the comments.
He makes one decision you all don't agree with and now it's cancel culture time?
And somehow, Anthropic is the hero in all this? Make no mistake: all the model providers are building detailed user models. Every bit of information you provide is of course being used for detailed user targeting. This is no different from the "Apple GOOD, Google BAD!" tropes. There are no heroes in for-profit corporations. Everyone is operating a for-profit business model and optimizing for the same profits.
Stop with the NPC behavior. We are better than this.
[1] https://openai.com/index/a-business-that-scales-with-the-val...
"Licensing, IP-based agreements, and outcome-based pricing will share in the value created. That is how the internet evolved. Intelligence will follow the same path."
> Cancels subscription
> Random guy on the internet tells you to be outraged
> Gets outraged
I'm not even a fan of OpenAI generally speaking, but this is just silly, cancelling them for no reason. If not them, some other lab would have done it. Or worse, the DoW would've forced them to.
Why are you assuming these are real people and not NPCs?
The amount of money flowing around AI is staggering. To believe that the AI companies aren't flooding all the social media zones with propaganda is disingenuous.
Touché
You don't use "believe" with "disingenuous": it literally makes zero sense.
If people honestly believe that, they may be naive. Or they can be "disingenuous" if they're not being sincere. But if you just say what you believe, you're sincere (and maybe naive), and hence cannot possibly be disingenuous.
They also don't know what "context" is or that the LLM has a limited number of tokens it can understand at any given time. They just believe it knows everything at once.
I can't think of much else though so I'm still curious what you or others use it for.
ChatGPT knows the broad strokes of the 3-4 main hardware projects I have on the go, and depending on the questions I'm asking, it will often structure its responses in a way that differentiates based on which one I'm thinking about.
It knows what resistor and capacitor values I have on my pick and place machine, and when I ask for divider ratios it will do its best to calculate based on those values to the degree that it will chain 1-2 resistors together to achieve those ratios.
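The "chain 1-2 resistors" trick it does can be sanity-checked by hand; here's a minimal sketch of that search (the stock values and function names here are made up for illustration, not my actual reel inventory):

```python
from itertools import combinations_with_replacement

# Hypothetical on-hand resistor values, in ohms
STOCK = [100, 220, 470, 1_000, 2_200, 4_700, 10_000, 22_000, 47_000]

def candidates(stock):
    """Yield (description, resistance) for single resistors and
    series pairs, mirroring the chain-1-2-resistors trick."""
    for r in stock:
        yield (f"{r}", r)
    for a, b in combinations_with_replacement(stock, 2):
        yield (f"{a}+{b} in series", a + b)

def best_divider(target_ratio, stock=STOCK):
    """Find the top/bottom combination whose Vout/Vin = Rb/(Ra+Rb)
    is closest to target_ratio."""
    best = None
    for top_desc, ra in candidates(stock):
        for bot_desc, rb in candidates(stock):
            ratio = rb / (ra + rb)
            err = abs(ratio - target_ratio)
            if best is None or err < best[0]:
                best = (err, top_desc, bot_desc, ratio)
    return best

err, top, bottom, ratio = best_divider(0.5)
print(f"top: {top}, bottom: {bottom}, ratio: {ratio:.4f}")
```

Brute force over all single-and-pair combinations is tiny here (a few thousand checks), which is presumably roughly what the bot does implicitly when it suggests chaining two values.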
It knows what kind of solder I use, and has warned me about components with sensitive reflow-temperature concerns.
It's an extraordinarily useful feature for engineering and drinking, two things that are commonly found in the same Venn diagram.
Also relevant: it knows that you know what a resistor and capacitor is, and is able to tune responses to your level of knowledge. (It's not great at this, in my experience, since domain knowledge is still so jagged, but I think it's better than nothing.)
Personally, I would still be wary of the black box aspect -not knowing what it does remember and what it doesn't - so I would probably still use projects to make it more deterministic. But that's probably being overcautious and unnecessary in most common cases.
If I ask a question about vehicles, it knows what cars I have and what I like in cars.
If I ask a question about vacation spots, it knows my party's composition and preferences.
Things like that.
Turns out a few months before, I had told it in a prompt what car I was driving.
I turned memory off that day.
My job, my kids and time preferences around those things, my preferred tech setup and way of working and types of tech I’m better at. Things I already have (home assistant, little nuc, etc). I can throw a random question and not have to add this kind of information or manage it.
Home automation fixing
Proposed integrations with some services locally
Science experiments explained at a few levels, finding good background info and where to read up about some safety information
Maths help for specific areas my kids are looking at and proposed games for that
Evaluation of coding options for my kids
How to link up some ideas on coding, electronics and using the home automation side as some fun outputs
LED strip info and work, again integrating with smart homes and what’s good around the kids
Framework evaluations for automation at work and home
Crystal identification
Looking up local council info
Relevant music suggestions for kids to play on the piano
Here some things cross over. I'm happy writing code, I typically want easy open-source options, I have languages and tech I prefer, I'm moving things to Matter, I have Home Assistant, my son is excellent at maths given his age but I'm working more on comprehension of problems, and a lot more. All those are things that, with a bit of background info, change the types of answers I get and make it more useful.
I didn't receive an answer besides "that's what people like", but I still can't think of (m)any situations where anyone would prefer it.
The only thing I can now think of is using it as a personal therapist, or asking how to approach their kids. And they're a bit embarrassed about it, because it's still outside the Overton window (especially on HN), which is why they aren't sharing it.
If someone has different usecases, please do prove me wrong! Maybe I just lack imagination.
I have a line in the sand with the AI vendors. It's a work relationship. If I wouldn't share it with a colleague I didn't know super well, I'm not telling it to a AI vendor.
ChatGPT "knows" (has context that includes) some of the things I'm good at, and some of the things I'm not good at. I have my own tolerances for communication and it has context about that, too.
I use the bot for mostly techy things. So, for instance, I'm alright with using tools, and building electronics, and punting around on a Linux box so I don't need my hand held for that. But I'm terrible at writing code, so baby steps and detailed explanation there helps me a lot. I strongly prefer pragmatism and verifiable facts. I despise sycophant speech, the empty positivity of corpo-speak, assumptions, false praise, superfluous verbosity, and apologies and/or the implication of feelings from bots.
Through a combination of some deliberate training (custom instructions, memory), and just using it (shared context), it mostly does what I want in the way that I want it done -- the first time.
I don't have to steer it in the right direction with every new session. There was a time when that was necessary, but it is no longer the case. Adjustments happen increasingly automatically these days.
That saves me time and frustration, and enhances the utility of the bot.
Meanwhile: Others have their own skills and preferences that may be very different in comparison to my own. That's OK. We each get to have our own experience.
That alone drives me batty. I can easily spend a couple of hours and multiple revisions iterating on a plan. Asking me every single time if I want to apply it is obnoxious.
I currently use ChatGPT for random insights and discussions about a variety of topics. The memory is basically a grown context about me and my preferences and interests and ChatGPT uses it to tailor responses to my knowledge, so I could relate better.
This is for me far more natural and easier than either craft a default prompt preset or create each conversation individually, that would be way too much overhead to discuss random shower thoughts between real life stuff.
This is my use case, and I discovered that it can be detrimental for specific questions and prompts; I see that carefully written prompts each time can be more beneficial. But my use case is really ad hoc usage, without the time for that. At least for ChatGPT.
When coding, this fails fast. There, regular context resets seem to be a more viable strategy.
I set my name to "User" in the settings, so in a clean-slate chat it has nothing to go on, but the moment claude code does something like `git log` it knows who I am again. I've even considered writing some kind of redaction proxy.
Similarly, it remembers the dimensions of my truck, so towing/loading questions don't need extra clarification.
It's the small things.
For example, instead of recommending a popular night club, it will recommend the stroll along the river to view the lit up skyline or to visit the night market instead.
It knows other preferences as well (exploring quirky neighborhoods, trying local fast food joints and markets)
Isn't there much more money in automating business processes than in answering consumer questions (sans ads)?
Automating software development has to be a multi-trillion dollar market. And that doesn't account for future growth.
I know the "memory" function can be disabled, but I have a hard time seeing that it would ever really be useful.
And it will give me a complete rundown of Roman life, because it knows what I was interested in before.
Or you can ask a tax question and it will know you’re an organic rice farmer or whatever. Claude has the best implementation because it has both memory, and previous chat searching. So it will actually read through relevant chats, rather than guessing based on memories.
Are you suggesting that they should ignore the needs of the vast majority of their users?
I mean, of course they do, it would be worse otherwise
I was mostly able to get by with $20 codex but I'll probably have to splurge for the Max plan.
I bet they would get their yearly bonus by achieving their KPI goals.
And the reputational harm would outweigh the benefits of trying to fuck over people leaving.
It also showed me the difference between expectation and reality...even though these are billion dollar companies, they still haven't figured out how to make lag-free TUIs, non-Electron apps, or even respect XDG_CONFIG. The focus is definitely more on speed and stuffing these tools full of new discoveries and features right now
There's a bit of psychology around models vs. harnesses as well. You can't shake off the feeling that maybe Claude would perform better in its native harness compared to VSCode/OpenCode. Especially because they've got so many hidden skills (like the recently introduced /batch), that seem baked into the binary?
The last thing I can't figure out is computer use. Apparently all the vendors say that their models can use a mouse and keyboard, but outside of the agent-browser skill (which presumably uses playwright), I can't figure out what the special sauce is that the Cloud versions of these Agents are using to exercise programs in a VM. That is another reason why there is a switching cost between vendors.
The /.agents/skills issue for claude code is here: https://github.com/anthropics/claude-code/issues/16345
Their automatic close bot will close it soon, as it's been three weeks since the last comment.
Maybe it’s better that they maintain different names to prevent people from assuming that they work the same
For the Anthropic employees here reading along, pitch it to whoever has kept blocking this, because you need to get the most out of this opportunity here.
I have seen quite a few open source projects do this. It works quite well.
Another alternative is to create CLAUDE.md with the exact contents: "@AGENTS.md"
The problem (for me, anyway) is that even several megabytes worth of quality "memory" data on my profile would not allow me to migrate if it can't also confidently clone all of my chat history with it.
To be clear, this is a big enough problem that I would immediately pay low three digits dollars to have this solved on my behalf. I don't really want any of the providers to have a walled garden of all my design planning conversations, all of my PCB design conversations. Many are hundreds of prompts long. A clean break is not even remotely palatable short of OAI going full evil.
Look, I'd find it convenient for Claude to have a powerful sense of what I've been working on from conversation #1 onwards. But I absolutely refuse to bifurcate my chat history across multiple services. There is a tier list of hells, and being stuck on ChatGPT is a substantially less painful tier than needing to constantly search two different sites for what's been discussed.
Edit: perhaps you can just ask nicely?
https://help.openai.com/en/articles/7260999-how-do-i-export-...
Yes, all of these are theoretically possible (the APIs now all support web search as far as I know, there are RAG APIs too, and tool use has been supported for a while), but the various "chat" models just seem to be much better at using their first-party tools than any third-party harness, which makes sense, since that is what they've been trained on.
Thank you! I hope this works out.
Thorough CLAUDE.md, that makes sure it checks the tests, lints the code, does type checks, and code coverage checks too. The more checks for code quality the better.
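As a rough sketch, such a section might look like the following (the exact commands are assumptions for a Python stack; swap in your own toolchain):

```markdown
## Quality gates

Run all of these before declaring any task done:

- Tests: `pytest`
- Lint: `ruff check .`
- Types: `mypy src/`
- Coverage: `pytest --cov=src --cov-fail-under=90`

If any check fails, fix it before moving on.
```

The point is less the specific tools than giving the agent an unambiguous, mechanical definition of "done".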
It's just a bowling ball in the hands of a toddler, and needs the ramp and guide rails to knock down some pins. Fortunately we get more than two tries with code.
A week ago, I was anti-Anthropic because I questioned their business model. Now they are my preferred provider; what a difference a week makes. I still prefer running open models on my own hardware, but it is not unreasonable to use powerful models when required.
Whenever I’m in a conversation and it references something unrelated (or even related) I get the “ick”. I know how context poisoning (intentional or not) works and I work hard to only expose things to the model that I want it to consider.
There have been many times that I've started a fresh chat so as not to bring along the baggage (or wrong turns) of a previous chat, but then it will say "And this should work great for <thing I never mentioned in THIS chat>", and at that moment my spidey-sense tingles and I start wondering: "Crap, did it come to the conclusion it did based mostly/only on the new context, or did it take a shortcut and use context from another chat?"
Like I said, I go out of my way to not “lead the witness” and so when the “witness” can peek at other conversations, all my caution is for naught.
I encourage everyone to go read the saved memories in their LLM of choice, I’ve cleaned out complete crap from there multiple times. Actually wrong information, confusing information, or one-off things I don’t want influencing future discussions.
The custom (or rather addition to the) system prompt is all I feel comfortable with. Where I give it some basic info about the coding language I prefer and the OSes that I’m often working with so that I don’t have to constantly say “actually this is FreeBSD” or “please give that to me in JS/TS instead of Python”.
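For illustration, a system-prompt addition along those lines might read something like this (the specifics here are invented examples, not the commenter's actual config):

```text
Default environment: FreeBSD, not Linux; call out GNU-vs-BSD tool differences.
Preferred languages: TypeScript/JavaScript; do not default to Python examples.
Keep answers concise unless I explicitly ask for detail.
```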
The only thing that has, so far, kept me from turning off memory is that I’m always slightly cautious of going off the beaten path for something so new and moving so fast. I often want to stay as close to the “stock” config as possible, since I know how testing/QA works at most places (the further off the beaten path you go, the more likely you’ll run into bugs). Also so that I can experience what everyone else is experiencing (within reason).
Lastly, because, especially with LLMs, I feel like the people that over-customize end up with fragile systems. I think that a decent portion of the “N+1 model is dumber” or “X model has really gone downhill” complaints is partially due to complicated configs (system prompts, MCP, etc) that might have helped at some point (dumber model, less capability) but are a hindrance to newer models. That or they never worked and someone just kept piling on more and more thinking it would help.
At the research step it frequently (always?) uses memory to direct/scope the research to what I typically work on, but I think that kind of pigeonholes the model and what it explores. And the memory doesn't quite capture all the areas I'm interested in, or want to directly apply the research to.
And regarding the crap in memories, I found the same. Mine at work mentioned I'm an expert at a business domain I have almost zero experience with.
I feel like the companies building this stuff accept a lot of "slop" in their approach, and just can't see past building things by slopping stuff into prompts. I wish they'd explore more rigid approaches. Yes, I understand "the bitter lesson" but it seems obvious to me some traditional approaches would yield better results for the foreseeable future. Less magic (which is just running things through the cheapest model they have and dumping it in every chat). It seems like poison.
Related: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
Also, agent skills are usually pure slop. If you look through https://skills.sh on a framework/topic you're knowledgeable in you'll be a bit disheartened. This stuff was pioneered by people who move fast, but I think it's now time to try and push for quality and care in the approach since these have gotten good enough to contribute to more than prototype work.
It's very interesting to learn more about because it challenges one core aspect of economic competition: the moat.
If one can literally swap one AI service for another, then where does the valuation (and the power that comes with it) come from?
PS: I'm not interested in the service itself as I believe the side effects of large-scale for-profit are too serious (and I don't mean doomsday AI takeover, I simply mean abuse of power, working conditions, deskilling, political influence as current contracts with US defense are being made, ads, ecological impact, etc) to be ignored.
That being said, if you have a library of images or some other collection of artifacts/assets indexed on their servers, that is a different story.
Hearing that starting from a blank slate yields the best outcomes is sort of like hearing extremely wealthy people talk about how money doesn't make you happier.
This way you can have Claude distill the memory as you wish.
But I have this feature turned off, and I cannot imagine ever wanting to turn it on, because I am always thinking carefully about what the AI "knows" when it generates a given response. For example, since I know that the AI always wants to make me happy, when I ask for an "opinion" I'm careful to not let the AI know which answer I'd prefer. I'll often try phrasing the question in different ways to see if it changes the outcome.
I would assume both Claude memory and CLAUDE.md work best when they're carefully curated, only containing what you've found yourself having to repeat.
[0]: https://arxiv.org/abs/2602.11988
VSCode extension, "Please log in"
I authorize it, it creates an API key, callback. "Hello Claude, this is a test." "Please log in."
So yeah... priorities?
I recently switched from VS Code Copilot to opencode and I kinda miss it. Just selecting text and directly asking the chat. Or seeing the generated code in the IDE to accept or reject it. It's neat.
Must be some of the lowest switching costs I've seen which doesn't bode well for OpenAI's consumer revenues...
It's a shame because when Claude is working well it is the best for actual algorithmic coding. There's so much cruft around it now, memories being the most annoying part of that.
80% of the time I just use these things as a sounding board when exploring options and I need responsiveness for that.
Might be time to run my own models.
Others usually find the mistake or check new sources to fix it.
I find I need to explain I know what I'm talking about first before it gives me non-patronising answers.
It definitely advertises Google services and I would say I hate it. But it's just reliably available. Neither Claude nor ChatGPT are responding at all today.
I bought the enterprise version, and it made it so the memory was no longer searchable...
Then after the obvious degradation in performance, I switched to Claude and was happy with it... But by canceling enterprise, it lost all memory.
My wife was sad, the recipes it made were gone forever... But hey, makes it really easy to never give OpenAI money again.
>I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following — preserve my words verbatim where possible: Instructions I've given you about how to respond (tone, format, style, 'always do X', 'never do Y'). Personal details: name, location, job, family, interests. Projects, goals, and recurring topics. Tools, languages, and frameworks I use. Preferences and corrections I've made to your behavior. Any other stored context not covered above. Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.
Of course sometimes this is useful if you only use your chatbot to ask personal things like: "What should I eat today?".
But if you use it for anything else you're much better off having full control over the prompt. I can always say: "Hey btw I am German and heavily anti-surveillance, what should I know about the recent Anthropic DoW situation?" but with memory I lose the option of leaving out that first part.
I am itching to test Claude for assembly coding and C++ to plain and simple C ports.
1. In one, I was putting together a server build. Claude correctly pointed out some incompatibilities in some parts that GPT had recommended.
2. In another chat, I had asked for help interpreting lab results and suggesting supplements. Claude pointed out that GPT was over-interpreting the results and suggesting things that weren't backed up by facts.
I presented Claude's response back to GPT and in both of these specific cases, GPT admitted it was wrong and didn't have any rebuttal. It's hard to say without doing a more scientific experiment whether GPT is indeed worse, but anecdotally I find myself pointing out flaws in Claude's reasoning far less frequently than GPT, especially with Opus.
Another less important distinction: GPT has a very distinct writing style that heavily formats responses and repeats itself a few times. Claude is succinct and mostly writes like a person might. It's easier to talk to and feels less "cringe" and sycophantic.
It's not "fair" in that I pay for Claude [1] and not for the others, so model availability is not complete except for Claude.
So while I did at times like how the others presented things, I came to really like Sonnet's "voice" a lot over them.
Note that Opus doesn't have the same voice, and I don't like it as much.
[1] I pay for the lower tier of their Max offering.
ChatGPT swings between writing degenerate free use shit and telling you that you should wait until marriage. Lots of moralism to it, really tries to censor you and manipulate you, even in normal conversations. Generally smart and capable, but the whiny attitude gets old.
Grok has zero filter, but is dumber than the others. Definitely built around cheapness. Caps answers at about 2500 words at most. Can be very funny because it will go along with anything.
Gemini sells all your data and doesn’t seem to have much of note. Offers some nice formatting options.
Claude is business focused so it won’t do anything degenerate, but its answers in general aren’t whiny. It might not do something, but it doesn’t attack you with morality.
Claude does not cap answer length and will do whatever needs doing. Their pricing is based around true usage, not message quantities, so it’ll write a mega message if it needs to.
It has the best memory implementation, combining both memories and RAG of your chat history. Projects have their own independent memories and RAG.
Claude Code is ridiculously capable. In a few hours I produced something which would have taken months and £50,000 at least to produce.