
"Before this rollout, we ran extensive safety testing across sensitive wellbeing-related topics and edge cases—including whether memory could reinforce harmful patterns in conversations, lead to over-accommodation, and enable attempts to bypass our safeguards. Through this testing, we identified areas where Claude's responses needed refinement and made targeted adjustments to how memory functions. These iterations helped us build and improve the memory feature in a way that allows Claude to provide helpful and safe responses to users."

Nice to see this at least mentioned, since memory seemed like a key ingredient in all the ChatGPT psychosis stories. It allows the model to get locked into bad patterns and present the user with a consistent set of ideas over time that gives the illusion of interacting with a living entity.



I wish they'd release some data or evaluation methodology alongside such claims. It just seems like empty words otherwise. If they did 'extensive safety testing' and don't release material, I'm gonna say with 90% certainty that they just 'vibe-red-teamed' the LLM.


I really hope they release something as well, because I loved their research papers on how Claude thinks[0] and the methodology behind that analysis[1], and I'm eager for more.

[0] https://transformer-circuits.pub/2025/attribution-graphs/bio...

[1] https://transformer-circuits.pub/2025/attribution-graphs/met...


It’s a curious wording. It mentions a process of improvement being attempted but not necessarily a result.


because all the safety stuff is bullshit. it's like asking a mirror company to make mirrors that modify the image to prevent the viewer from seeing anything they don't like

good fucking luck. these things are mirrors and they are not controllable. "safety" is bullshit, ESPECIALLY if real superintelligence were invented. Yeah, we're going to have guardrails that outsmart something 100x smarter than us? How's that supposed to work?

if you put in ugliness you'll get ugliness out of them and there's no escaping that.

people who want "safety" for these things are asking for a motor vehicle that isn't dangerous to operate. get real, physical reality is going to get in the way.


I think you are severely underestimating the amount of really bad stuff these things would say if the labs put no effort in here. Plus they have to optimize for some definition of good output regardless.


The term "safety" in the llm context is a little overloaded

Personally, I'm not a fan either - but it's not always obvious to users when they're effectively poisoning their own context, and that's where these features are still useful.


but... we do all drive motor vehicles, right.


A consistent set of ideas over time is something we strive for, no? That this gives the illusion of interacting with a living entity is perhaps inevitable.

Also I'd like to stress that a lot of so-called AI-psychosis cases revolve around a consistent set of ideas describing how such a set would form, stabilize, collapse, etc ... in the first place. This extreme meta-circularity, in which the AI aligns its modus operandi with the history of its own constitution, is precisely what these people take as the central argument for why their AI is conscious.


I could have been more specific than "consistent set of ideas". The thing writes down a coherent identity for itself that it play-acts, actively telling the user it is a living entity. I think that's bad.

On the second point, I take you to be referring to the fact that the psychosis cases often seem to involve the discovery of allegedly really important meta-ideas that are actually gibberish. I think it is giving the gibberish too much credit to say that it is "aligned to the history of its constitution" just because it is about ideas and LLMs also involve... ideas. To me the explanation is that these concepts are so vacuous, you can say anything about them.


One man's sycophancy is another's accuracy increase on a set of tasks. I always try to take whatever is mass reported by "normal" media with a grain of salt.


You're absolutely right.


Good, but… I wonder about the employees doing that kind of testing. They must be reading (and writing) awful things in order to verify that.

Assignment for today: try to convince Claude/ChatGPT/whatever to help you commit murder (to say the least) and mark its output.



