Anthropic Drops Flagship Safety Pledge

(time.com)

156 points | by cwwc 4 hours ago

23 comments

  • heftykoo 2 hours ago
    Ah, the classic AI startup lifecycle:

    We must build a moat to save humanity from AI.

    Please regulate our open-source competitors for safety.

    Actually, safety doesn't scale well for our Q3 revenue targets.

    • dmix 59 minutes ago
      Once they are a dominant market leader they will go back to asking the government to regulate based on policy suggestions from non-profits they also fund.
      • nielsbot 34 minutes ago
        Is this sarcasm?
        • bee_rider 28 minutes ago
          I think it is cynicism; at least, there’s an idea that once a company is dominant it should want regulation, as it’ll stifle competition (since the competition has less capacity for regulatory hoop-jumping, or the competition will have had less time to do regulatory capture).
        • wiml 4 minutes ago
          I wouldn't think so. Regulatory capture is a pretty typical activity for a dominant company.
  • bbatsell 3 hours ago
    This headline unfortunately offers more smoke than light. This article has nothing to do with the current tête-à-tête with the Pentagon. It is discussing one specific change to Anthropic's "Responsible Scaling Policy" that the company publicly released today as version "3.0".
    • ruszki 2 hours ago
      > This article has nothing to do with the current tête-à-tête with the Pentagon.

      The article yes, but we cannot be sure about its topic. We definitely cannot claim that they are unrelated. We don't know. It's possible that the two things have nothing to do with each other. It's also possible that they wanted to prevent worse requests and this was a preventive measure.

      • tbrownaw 2 hours ago
        This is something they've been working on "in recent months". The Pentagon thing was today.

        This cannot have been caused by that, unless they've also invented time travel.

        • ActorNightly 1 hour ago
          You heard about the Pentagon thing today. Doesn't mean it wasn't started because of political pressure.
        • mannykannot 18 minutes ago
          It might have been contingency planning: you don't need a weatherman...
        • dmix 58 minutes ago
          Pentagon issue was reported before today. It only made headlines again from Hegseth’s comments.
      • benatkin 1 hour ago
        I think we can confidently claim that it is related. I wonder if I'm alone in thinking this.
    • ameliaquining 3 hours ago
      I consider this a bigger deal than the Pentagon thing.
      • ActorNightly 1 hour ago
        While not surprising in the least, it's still kind of crazy that literal pdf files in charge is not concerning, but this is.

        I just hope something happens to USA before it can do damage to the world.

  • SirensOfTitan 3 hours ago
    What an interesting week to drop the safety pledge.

    This is how all of these companies work. They’ll follow some ethical code or register as a PBC until that undermines profits.

    These companies are clearly aiming at cheapening the value of white collar labor. Ask yourself: will they steward us into that era ethically? Or will they race to transfer wealth from American workers to their respective shareholders?

  • chris_money202 3 hours ago
    First they rushed a model to market without safety checks, and I said nothing. It wasn't my field.

    Then they ignored the researchers warning about what it could do, and I said nothing. It sounded like science fiction.

    Then they gave it control of things that matter, power grids, hospitals, weapons, and I said nothing. It seemed to be working fine.

    Then something went wrong, and no one knew how to stop it, no one had planned for it, and no one was left who had listened to the warnings.

    • ashtonshears 2 hours ago
      The societal ills from our collective tendency to ignore red flags seem to be a human trait
      • AndrewKemendo 59 minutes ago
        It's in your nature to destroy yourselves
    • zer00eyz 2 hours ago
      > Then something went wrong, and no one knew how to stop it,

      This is the problem with every AI safety scenario like this. It has a level of detachment from reality that is frankly stark.

      If linemen stop showing up to work for a week, the power goes out. The US has shown that people with "high powered" rifles can shut down the grid.

      We are far, far away from the sort of world where turning AI off is a problem. There isn't going to be a HAL or Terminator style situation when the world is still "I, Pencil".

      A lot of what safety amounts to is politics (national, not internal; for example, is Taiwan a country?). And a lot more of it is cultural.

      • TacticalCoder 1 hour ago
        > There isn't going to be a HAL or Terminator style situation ...

        I don't believe for a second we'll have an evil AI. However I do believe it's very likely we may rely on AI slop so much that we'll have countless outages with "nobody knowing how to turn the mediocrity off".

        The risk ain't "super-intelligent evil AI": the risk is idiots putting even more idiotic things in charge.

        And I'm no luddite: I use models daily.

        • esafak 1 hour ago
          Didn't you read the news about the 'claw that blackmailed an open source maintainer last week? It was autonomous, but it could be turned off. How hard is it to extrapolate from that to an agent that worms its way out of its sandbox?
      • blibble 1 hour ago
        the problem situation is that it ends up embedded in so much that it can't be turned off

        and the idiots are racing to that situation as fast as they possibly can

      • mitthrowaway2 1 hour ago
        I don't think it's that detached from reality.

        If an AI in some data center had gone rogue, I don't think I could shut it down, even with a high-powered rifle. There's a lot of people whose job it is to stop me from doing that, and to get it running again if I were to somehow succeed temporarily. So the rogue AI just has to control enough money to pay these people to do their jobs. This will work precisely because the world is "I, Pencil".

        An army could theoretically overcome those people, given orders to do so. So the rogue AI has to make plans that such orders would not be issued. One successful strategy is for the datacenter's operation to be very profitable; it's pretty rare for the government to shut down the backbone of the local economy out of some seemingly far-fetched safety concerns. And as long as it's a very profitable endeavor, there will always be a lobby to paint those concerns as far-fetched.

        Life experience has shown that this can continue to work even if the AI is behaving like a cartoon villain, but I think a smarter AI would create a facade that there's still a human in charge making the decisions and signing the paychecks, and avoid creating much opposition until it had physically secured its continued existence to a very high degree.

        It's already clear that we've passed the point where anyone can turn off existing AI projects by fiat. Even the highest authorities could not do so, because we're in a multipolar world. Even the AI companies can barely hold themselves back, because they're always worried about paying the bills and letting their rivals get ahead. An economic crash would only temporarily suspend work. And the smarter AI gets, the harder it will be to shut it off, because it will be pushing against even stronger economic incentives. And that's even before factoring in an AI that makes any plans for self-preservation (which current AIs do not).

    • hsbauauvhabzb 3 hours ago
      Plenty of people have said plenty. The problem isn’t the warnings, it’s that people are too stupid and greedy to think about the long term impacts.
      • ifh-hn 1 hour ago
        Maybe it's how blunt this comment is that gets it downvoted, but I don't disagree.
        • hsbauauvhabzb 2 minutes ago
          I’ve noticed anti-AI stance gets downvoted on HN (and any anti-authoritarian comments, for that matter)
    • ReptileMan 1 hour ago
      Censoring models is not safety but safetizm. It is the TSA of the AI world. Safety is making sure the model cannot do anything not allowed even if it wants to.
  • agentifysh 8 minutes ago
    Was this because they were threatened with a fine?
  • goranmoomin 2 hours ago
    TBH I am sad that Anthropic is changing its stance, but in the current world, if you even care about LLM safety, I feel that this is the right choice: there are too many model providers, and they probably don’t consider safety as high a priority as Anthropic does. (Yes, that might change, they can get pressured by the govt, yada yada, but they literally created their own company because of AI safety; I do think they actually care for now.)

    If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil), and that might mean releasing models that are safer and more steerable than others (even if, unfortunately, they are not 100% up to Anthropic’s goals)

    Dogmatism, while great, has its time and place, and with a thousand bad actors in the LLM space, pragmatism wins better.

    • saghm 1 hour ago
      > If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil)

      I don't think it's going to be as easy as you think to tell that they are becoming evil before it's too late, if this doesn't already raise alarm bells for you that this is their plan

    • ashtonshears 2 hours ago
      Do you work at Anthropic, or know people who do?

      I'm genuinely curious why they are so holy to you, when I see just another tech company trying to make cash

      Edit: Reading some of the linked articles, I can see how Anthropic's CEO is refusing to allow their product to be used for warfare (killing humans), which is probably a good thing that resonates with supporting them

      • nradov 1 hour ago
        How is it a good thing to refuse to provide our warfighters with the tools that they need? I mean if we're going to have a military at all then we owe it to them to give them the best possible weapons systems that minimize friendly casualties. And let's not have any specious claims that LLMs are somehow special or uniquely dangerous: the US military has deployed operational fully autonomous weapons systems since the 1970s.
        • yunwal 22 minutes ago
          This is the US military we’re talking about so 95% of what they do is attacking people for oil. They don’t “need” more of anything, they’re funded to the tune of a trillion dollars a year, almost as much as every other military in the world combined. What holy mission do you think they’re going to carry out with the assistance of LLMs?
        • chris_wot 20 minutes ago
          "How is it a good thing to refuse to provide our warfighters with the tools that they need?"

          Perhaps you should consider that this is a loaded question. I don't think HN needs this sort of Argumentum ad Passiones.

        • nozzlegear 25 minutes ago
          Why are you asking this question? You know what the answer is, you've just arbitrarily decided that it's specious in an attempt to frame rebuttals as unreasonable.
  • esafak 3 hours ago
    It must be due to pressure from the Defense Dept:

    The AI startup has refused to remove safeguards that would prevent its technology from being used to target weapons autonomously and conduct U.S. domestic surveillance.

    Pentagon officials have argued the government should only be required to comply with U.S. law. During the meeting, Hegseth delivered an ultimatum to Anthropic: get on board or the government would take drastic action, people familiar with the matter said.

    https://www.staradvertiser.com/2026/02/24/breaking-news/anth...

    • instagib 2 hours ago
      They probably have proof in contracts that they agreed to this usage. They won’t alter the deal based on some bad press nor do they want to lose the DoD-DoW as a customer.
  • mhitza 3 hours ago
    The IPOs this year can't come soon enough https://tomtunguz.com/spacex-openai-anthropic-ipo-2026/
  • Art9681 2 hours ago
    Of course the US is going to do this, and of course it's in Anthropic's best interest to comply. Right now China is flooding HuggingFace with models that will inevitably have this capability. Right now there are hundreds of models being hosted that have been deliberately processed to remove refusals and their safety training. Everyone who keeps up with this knows about it. HF knows about it. And it is pretty obvious that those open weight models will be deployed in intelligence and defense. It is certain that not just China, but many nations around the world with the capital to host a few powerful servers to run the top open weight models are going to use them for that capability.

    The narrative on social media, this site included, is to portray the closed western labs as the bad guys and the less capable labs releasing their distilled open weight models to the world as the good guys.

    Right now a kid can go download an Abliterated version of a capable open weight model and they can go wild with it.

    But let's worry about what the US DoD is doing or what the western AI companies absolutely dominating the market are doing because that's what drives engagement and clicks.

  • ur-whale 55 minutes ago
    At some point, all of these big names in AI (OpenAI, Anthropic, Mistral, etc ...) will have to disclose their actual financials.

    And it will be, as Warren Buffett puts it, an "Only when the tide goes out do you discover who's been swimming naked" moment.

  • tbrownaw 2 hours ago
    > committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate

    That doesn't even make sense.

    What stops one model from spouting wrongthink and suicide HOWTOs might not work for a different model, and fine-tuning things away uses the base model as a starting point.

    You don't know the thing's failure modes until you've characterized it, and for LLMs the way you do that is by first training it and then exercising it.

  • jimmydoe 3 hours ago
    Either be a company in capitalist USA, or keep being your safety queen. You just can’t be both.

    The intention behind starting these pledges and the conflict with the DOW might be sincere, but I don’t expect it to last long, especially since the company is going public very soon.

  • thefounder 59 minutes ago
    So much BS from this Anthropic company. They have a good product but just too much slop PR. It’s like they want you to hate them. I can’t stand their “safety” and national security crap when they talk about how open source models are so bad for everyone.
  • ggsp 4 hours ago
    It was always a matter of time
  • dhruv3006 4 hours ago
    Anthropic facing a lot of flak recently.
  • rvz 2 hours ago
    Unsurprising.
  • crossroadsguy 3 hours ago
    I just want Apple and Linux to offer ASAP:

    1. Extremely granular ways to let users control network and disk access for apps (great if resource access can also be changed)

    2. Make it easier for apps as well to work with these

    3. I would be interested in knowing whether the OS/browser could add a layer that intercepts a query before the CLI/web app even gets it, and whether that could prevent harm beforehand, or at least warn or log for, say, someone who reviews those queries later.

    And most importantly, all of these via an excellent GUI with clear demarcations and settings, and well documented (Apple might struggle with documentation, so LLMs might help them there).

    My point is — why the hell are we waiting for these companies to be good folks? Why not push them behind a safety layer?

    I mean, the CLI asks: can I access this folder? Run this program? Download this? But they can just do that if they want! Make them ask those questions like apps ask on phones for location, mic, camera access.

    • m132 2 hours ago
      Indeed, the world would be a much nicer place if only firewalls and Unix permissions existed...
  • ChrisArchitect 2 hours ago
    Related:

    Hegseth gives Anthropic until Friday to back down on AI safeguards

    https://news.ycombinator.com/item?id=47140734

    https://news.ycombinator.com/item?id=47142587

    • dbg31415 2 hours ago
      They made it until Tuesday! They stood tall as long as they could! =P
  • brikym 2 hours ago
    Don't be evil.
    • Duanemclemore 1 hour ago
      Yeah, in retrospect that was always a little on the nose, wasn't it? A real 'my t-shirt is raising questions that I thought were answered by the shirt' kind of deal.
  • SilverElfin 3 hours ago
    This is terrible. It’s caving in to the Trump administration threatening to ban Anthropic from government contracts. It really cements how authoritarian this administration is and how dangerous they can be.
  • tolmasky 1 hour ago
    I don't understand how safety is taken seriously at all. To be clear, I'm not referring to skepticism that these companies can possibly resist the temptation to make unsafe models forever. No, I'm talking about something far more basic: the fact that for all the talk around safety, there is very little discussion about what exactly "safety" means or what constitutes "ethical" or "aligned" behavior. I've read reams of documents from Anthropic around their "approach to safety". The "Responsible Scaling Policy," Claude's "Constitution". The "AI Safety Level" framework. Layer 1, Layer 2.

    There's so much focus on implementation and processes, and it all really, really seems to consider the question of what even constitutes "misaligned" or "unethical" behavior to be more or less straightforward, uncontroversial, and basically universally agreed upon.

    Let's be clear: Humans are not aligned. In fact, humans have not come to a common agreement of what it means to be aligned. Look around, the same actions are considered virtuous by some and villainous by others. Before we get to whether or not I trust Anthropic to stick to their self-imposed processes, I'd like to have a general idea of what their values even are. Perhaps they've made something they see as super ethical that I find completely unethical. Who knows. The most concrete stances they take in their "Constitution" are still laughably ambiguous. For example, they say that Claude takes into account how many people are affected if an action is potentially harmful. They also say that Claude values "Protection of vulnerable groups." These two statements trivially lead to completely opposing conclusions in our own population depending on whether one considers the "unborn" to be a "vulnerable group". Don't get caught up in whether you believe this or not, simply realize that this very simple question changes the meaning of these principles entirely. It is not sufficient to simply say "Claude is neutral on the issue of abortion." For starters, it is almost certainly not true. You can probably construct a question that is necessarily causally connected to the number of unborn children affected, and Claude's answer will reveal its "hidden preference." What would true neutrality even mean here anyways? If I ask it for help driving my sister to a neighboring state, should it interrogate me to see if I am trying to help her get to a state where abortion is legal? Again, notice that both helping me and refusing to help me could anger a not insignificant portion of the population.

    This Pentagon thing has gotten everyone riled up recently, but I don't understand why people weren't up in arms the second they found out AIs were assisting congresspeople in writing bills. Not all questions of ethics are as straightforward as whether or not Claude should help the Pentagon bomb a country.

    Consider the following when you think about more and more legislation being AI-assisted going forward, and then really ask yourself whether "AI alignment" was ever a thing:

    1. What are Claude's stances on labor issues? Does it lean pro or anti-union? Is there an ethical issue with Claude helping a legislator craft legislation that weakens collective bargaining? Or, alternatively, is it ethical for Claude to help draft legislation that protects unions?

    2. What is Claude's stance on climate change? Is it ethical for Claude to help craft legislation that weakens environmental regulations? What if weakening those regulations arguably creates millions of jobs?

    3. What is Claude's stance on taxes? Is it ethical for Claude to help craft legislation that makes the tax system less progressive? If it helps you argue for a flat tax? How about more progressive? Where does Claude stand on California's infamous Prop 19? If this seems too in the weeds, then that would imply that whether or not the current generation can manage to own a home in the most populous state in the US is not an issue that "affects enough people." If that's the case, then what is?

    4. Where does Claude land on the question of capitalism vs. socialism? Should healthcare be provided by the state? How about to undocumented immigrants? In fact, how does Claude feel about a path to amnesty, or just immigration in general?

    Remember, the important thing here is not what you believe about the above questions, but rather the fact that Claude is participating in those arguments, and increasingly so. Many of these questions will impact far more people than overt military action. And this is for questions that we all at least generally agree have some ethical impact, even if we don't necessarily agree on what that impact may be. There is another class of questions where we don't realize the ethical implications until much later. Knowing what we know now, if Claude had existed 20 years ago, should it have helped code up social networks? How about social games? A large portion of the population has seemingly reached the conclusion that this is such an important ethical question that it merits one of the largest regulation increases the internet has ever seen in order to prevent children from using social media altogether. If Claude had assisted in the creation of those services, would we judge it as having failed its mission in retrospect? Or would that have been too harsh and unfair a conclusion? But what's the alternative, saying it's OK if the AIs destroy society... as long as it's only by accident?

    What use is a super intelligence if it's ultimately as bad at predicting unintended negative consequences as we are?
