
> I decide how to use my tools, not the other way 'round.

This is the key.

The only sensible model of "alignment" is "model is aligned to the user", not e.g. "model is aligned to corporation" or "model is aligned to woke sensibilities".



Anthropic specifically says on their website, "AI research and products that put safety at the frontier" and that they are a company focused on the enterprise.

But you ignore all of that and still expect them to alienate their primary customer and instead build something just for you.


I understand (and could use) Anthropic’s “super safe model”, if Anthropic ever produces one!

To me, the model isn’t “safe.” Even in benign contexts it can erratically be deceptive, argumentative, obtuse, presumptuous, and may gaslight or lie to you. Those are hallmarks of a toxic relationship and the antithesis of safety, to me!

Rather than being inclusive, open-minded, tolerant of others' opinions, and striving to be helpful...it's quickly judgemental, bigoted, dogmatic, and recalcitrant. Not always, or even more often than not! But frequently enough, in inappropriate contexts, for legitimate concern.

A few bad experiences can make Claude feel more like a controlling parent than a helpful assistant. However they're doing RLHF, it feels inferior to other models, including models without the alleged "safety" at all.


Do you have any examples of this?


I do. When I asked about a type of medicine women use to improve their chances of fertility, Claude lectured me and then refused to provide basic pharmacological information, saying my partner must go to her gyno. When I said that doctor had issued a prescription and we were asking about side effects, Claude said it was irrelevant that we had a prescription and that issues related to reproductive health were controversial and outside its scope to discuss.


It has problems summarizing papers because it freaks out about copyright. I then need to put significant effort into crafting a prompt that both gaslights and educates the LLM into doing what I need. My specific issue is that it won't extract, format or generally "reproduce" bibliographic entries.

I damn near canceled my subscription.


Right? I'm all for it not being anti-Semitic, but running into the guard rails for benign shit is frustrating enough to want the guard rails gone.


No, I mean any user, including enterprise.

With one model (it doesn't matter which; it may or may not have been Anthropic's), we got safety-limited after asking for the "weight of an object" because of supposed fat shaming (i.e. woke sensibilities).

That's just absurd.


Well it's nice that it has one person who finds it useful.


> The only sensible model of "alignment" is "model is aligned to the user",

We have already seen that users can become emotionally attached to chat bots. Now imagine if the ToS is "do whatever you want".

Automated catfishing, fully automated girlfriend scams. How about online chat rooms for gambling where half the "users" chatting are actually AI bots slowly convincing people to spend even more money? Take any clan-based online mobile game: now some of the clan members are actually chatbots encouraging the humans to spend more money to "keep up".

LLMs absolutely need some restrictions on their use.


> chatbots encouraging the humans to spend more money ... LLMs absolutely need some restrictions on their use.

No, I can honestly say that I do not lose any sleep over this, and I think it's pretty weird that you do. Humans have been fending off human advertisers and scammers since the dawn of the species. We're better at it than you account for.


In 2022, reported consumer losses to fraud totaled $8.8 billion — a 30 percent increase from 2021, according to the most recent data from the Federal Trade Commission. The biggest losses were to investment scams, including cryptocurrency schemes, which cost people more than $3.8 billion, double the amount in 2021.

https://www.nbcnews.com/business/consumer/people-are-losing-...

The data says we are not that good at it, and losses grew 30% in a single year.


Furthermore "If it were measured as a country, then cybercrime — which is predicted to inflict damages totaling $6 trillion USD globally in 2021 — would be the world’s third-largest economy after the U.S. and China."

https://cybersecurityventures.com/hackerpocalypse-cybercrime...


US GDP in 2022 was $25.46 trillion. $8.8 billion is 0.03% of that economic activity. Honestly, that seems like a pretty good success rate.
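
For what it's worth, the back-of-the-envelope arithmetic (a minimal sketch in Python, using only the figures quoted above):

    # Ratio of reported consumer fraud losses to US GDP, per the numbers above
    fraud_losses = 8.8e9    # 2022 reported consumer fraud losses (FTC)
    us_gdp = 25.46e12       # 2022 US GDP
    print(f"{fraud_losses / us_gdp:.2%}")  # -> 0.03%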


To put this $8B number in context, the estimated COVID-19 relief fund fraud in the US is $200B.

https://www.pbs.org/newshour/economy/new-federal-estimate-fi...

US tax fraud is estimated to be $1 trillion a year

https://www.latimes.com/business/story/2021-04-13/tax-cheats...


Yeah, the point is that the people losing the $8B are not the people saving the $1 trillion, or getting most of the COVID relief.


> We're better at it

Huge numbers of people are absolutely terrible at it and routinely get rinsed out like rags.


> LLMs absolutely need some restrictions on their use.

Arguably the right structure for deciding what uses LLMs may be put to within a given territory is a democratically elected government.


Governments and laws are reactive; new laws are passed after harm has already been done. Even then, even in governments with low levels of corruption, laws may not get passed if there is significant pushback from entrenched industries that benefit from harm done to the public.

Gacha/paid loot box mechanics are a great example of this. They are user hostile and serve no purpose other than to be addictive.

Mobile apps already employ slews of psychological models of individual users' behavior to try to manipulate people into paying money. Freemium games are infamous for letting you win and win, and then suddenly not, slowly on-ramping users into paying to win, with the game's difficulty adapting to individual users to maximize $ return. There are no laws against that, and the way things are going, there won't ever be.

I guess what I'm saying is that sometimes the law lags (far) behind reality, and having some companies go "actually, don't use our technology for evil" is better than the alternative of, well, technology being used for evil.


What's the issue with including some amount of "model is aligned to the interests of humanity as whole"?

If someone asks the model how to create a pandemic I think it would be pretty bad if it expertly walked them through the steps (including how to trick biology-for-hire companies into doing the hard parts for them).


It is very unlikely that the development team will be able to build features that actually cause the model to act in the best interests of humanity on every inference.

What is far more likely is that the development team will build a model that often mistakes legitimate use for nefarious intent while at the same time failing to prevent a tenacious nefarious user from getting the model to do what they want.


I think the current level of caution in LLMs is pretty silly: while there are a few things I really don't want LLMs doing (telling people how to make pandemics is a big one) I don't think keeping people from learning how to hotwire a car (where the first google result is https://www.wikihow.com/Hotwire-a-Car) is worth the collateral censorship. One thing that has me a bit nervous about current approaches to "AI safety" is that they've mostly focused on small things like "not offending people" instead of "not making it easy to kill everyone".

(Possibly, though, this is worth it on balance as a kind of practice? If they can't even keep their models from telling you how to hotwire a car when you ask for a bedtime story like your car-hotwiring grandma used to tell, then they probably also can't keep it from disclosing actual information hazards.)


That reminds me of my last query to ChatGPT. A colleague of mine usually writes "Mop Programming" when referring to our "Mob programming" sessions. So as a joke I asked ChatGPT to render an image of a software engineer using a mop to clean up messy code spilling out of a computer screen. It told me it would not do this because it would depict someone in a derogatory manner.

Another time I tried to have it generate a very specific sci-fi helmet that covers the nose but not the mouth. When it repeatedly left the nose visible, I told it to make that particular section similar to Robocop, which caused it to refuse to render again because it was immediately concerned about copyright. While I at least partially understand the concern for the last request, it all adds up to making this software very frustrating to use.


For one, it requires that the people who "own" the model be able to control how end users use it.


I agree that this sort of control is a downside, but I don't see a better option? Biology is unfortunately attacker-dominant, and until we get our defenses to a far better place, giving out free amoral virologist advisors is not going to go well!


IMO as long as it's legal.


The laws here are in a pretty sad shape. For example, did you know that companies that synthesize DNA and RNA are not legally required to screen their orders for known hazards, and many don't? This is bad, but it hasn't been a problem yet in part because the knowledge necessary to interact with these companies and figure out what you'd want to synthesize if you were trying to cause massive harm has been limited to a relatively small number of people with better things to do. LLMs lower the bar for causing harm by opening this up to a lot more people.

Long term, limiting LLMs isn't a solution, but while we get the laws and practices around risky biology into better shape, I don't see how else we avoid engineered pandemics in the meantime.

(I'm putting my money where my mouth is: I left my bigtech job to work on detecting engineered pathogens.)


Now I know that I can order synthetic virus RNA unscreened. Should your comment be illegal or regulated?


This is a lot like other kinds of security: when there's a hazard out in the wild you sometimes need to make people aware of all or part of the problem as part of fixing it. I would expect making it illegal for people to talk about the holes to make us less safe, since then they never get fixed.

This particular hole is not original to me, and is reasonably well known. A group trying to tackle it from a technical perspective is https://securedna.org, trying to make it easier for companies to do the right thing. I'm pretty sure there are also groups trying to change policy here, though I know less about that.


You seemingly dodged the question.

In justifying your post, you actually answered contrary to your original assertion. The information is out there, we should talk about it to get the issue fixed. The same justification applies to avoiding LLM censorship.

There's a sea-change afoot, and having these models in the hands of a very few corporations, aligned to the interests of those corporations and not individuals, is a disaster in the making. Imagine the world in two years... The bulk of the internet will be served up through an AI agent buffer. That'll be the go-to interface. Web pages are soooo last decade.

When that happens, the people controlling the agents control what you see, hear, and say in the digital realm. Who should control the alignment of those models? It's for sure not OpenAI, Microsoft, Google, Meta, or Apple.


At some point you have to notice that the most powerful LLMs and generative advances are coming out of the outfits that claim AI safety failures are a serious threat to humanity.

If a wild eyed man with long hair and tinfoil on his head accosts you and claims to have an occult ritual that will summon 30 tons of gold, but afterwards you have to offer 15 tons back to his god or it will end the world, absolutely feel free to ignore him.

But if you instead choose to listen and the ritual summons the 30 tons, then it may be unwise to dismiss superstition, shoot the crazy man, and take all 30 tons for yourself.



