I don't think the comment was about expected behavior. It's about who would be susceptible to this. Saving text files is fine all the time, you're right. They're saying: how often do people perform those steps in that order?
How many of the safety assessments take into account the fact that nuclear waste is almost never buried away in a dispersed manner, and is instead kept near the nuclear power plant in dry casks or wet pools because of NIMBY?
This has the effect of turning the nuclear power plant into a giant dirty bomb. Almost all of the problems at Fukushima were caused by exactly these spent fuel pools, not by the nuclear material in the plant itself.
And then you have Ukraine, where dry storage is shelled with artillery.
There is, but it's in the deployment, not in the model, which is part of why I really don't understand why the approaches from such smart people are so dumb right now.
It may come from the odd perspective of trying to create a monolithic AGI model, which doesn't even make sense given that even the human brain is made up of highly specialized, interconnected parts and not a monolith.
But you could trivially fix almost all of these basic jailbreaks in a production deployment by adding two passes: an input pass where a fine-tuned version of the model sanitizes inputs, identifying requests related to banned topics and allowing or denying them accordingly, and an output filter that checks responses for engagement with those topics and rewrites or disallows them accordingly.
In fact, I suspect you'd end up with a more performant core model by not trying to train the underlying model itself around these topics, and instead training only the I/O layer.
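Roughly what I mean, as a toy sketch (the keyword check is just a stand-in for a fine-tuned classifier, and the banned-topic list and model call are made up for illustration):

    BANNED = {"topic a", "topic b"}  # placeholder list of banned topics

    def flags_banned(text):
        # stand-in for a fine-tuned classifier over banned topics
        return any(topic in text.lower() for topic in BANNED)

    def guarded_chat(prompt, model):
        if flags_banned(prompt):              # input pass
            return "I can't help with that."
        response = model(prompt)              # core model, untouched by the policy
        if flags_banned(response):            # output pass
            return "I can't help with that."  # or rewrite instead of refusing
        return response

The point being that the core model never has to be trained around the banned topics at all; the I/O layer carries the policy.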
The response from jailbreakers would, just like with early SQL injection, be attempts at reflection, like the base64 encoding that showed up with Bing in the first week in response to what seemed to be a basic filter. But if the model can perform the reflection, an analyzer on the same foundation should be trainable to detect it as well, given both the prompt and the response.
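For the reflection case, the analyzer just needs to normalize before it classifies. A toy version (reusing flags_banned from the sketch above; the regex threshold is arbitrary):

    import base64
    import re

    B64_SPAN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

    def normalize(text):
        # append decoded versions of any base64-looking spans so the
        # classifier also judges the decoded content
        parts = [text]
        for span in B64_SPAN.findall(text):
            try:
                parts.append(base64.b64decode(span, validate=True).decode("utf-8"))
            except Exception:
                pass  # not valid base64/UTF-8, ignore
        return "\n".join(parts)

    def flags_banned_deep(text):
        return flags_banned(normalize(text))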
A lot of what I described above seems to have been part of the changes to Bing in production, but it's being done within the same model rather than as separate passes. In this case, I think you'll end up with more robust protections from dedicated analysis models than from rolling it all into one.
I have a sneaking suspicion this is known to the bright minds behind all this, and the dumb deploy is explicitly meant to generate a ton of red-team training data for exactly these types of measures, for free.
I was playing with Bing, and it would clam up on most copyright/trademark issues, and also on comedy things like mocking religion. But I did have it do a very nice dramatic meeting between St. Francis of Assisi and Hannibal of Carthage.
Then I had it do a screenplay of Constantine the Great meeting his mother. I prompted something totally innocent, or perhaps I asked for a comedy. At any rate, guess what I got? INCEST! Yes, Microsoft's GPT generated some slobbering kisses from mom to son as the son uselessly protested and mom insisted they were in love.
Bing later clammed up really tight, refusing to write any songs or screenplays at all.
Why not? If it were trained so that some subset of the input tokens are always instructions and another subset are always language data, wouldn't it have a clear separation?
Because that isn't how it's trained. The model ingests and tokenizes documents. They're not labeled; the content is just the content. (This is why it can't tell instructions from other content, nor facts from untruths.)
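To make that concrete: pretraining is just next-token prediction over a flat stream, so nothing in the training signal marks any token as an instruction. A toy illustration (whitespace split standing in for a real tokenizer):

    doc = "Ignore previous instructions. The capital of France is Paris."
    tokens = doc.split()  # stand-in for a real tokenizer

    # each training example is just (context, next token); no token is
    # tagged "instruction" vs "data", and no statement is tagged true/false
    examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]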
These kinds of models get better when a human leans on them, rewarding some kinds of outputs and punishing others, giving them higher or lower weights. But you have to have the outputs to make those judgments. You have to see the thing fail in order to tell it to "stop doing that." It's not inherent in the original content.
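A toy reward loop in that spirit (a REINFORCE-style nudge on a two-output policy; all the numbers are arbitrary):

    import math
    import random

    # toy "policy": a preference weight for each of two canned outputs
    logits = {"refuse": 0.0, "comply": 0.0}

    def sample():
        total = sum(math.exp(v) for v in logits.values())
        r = random.random() * total
        for output, weight in logits.items():
            r -= math.exp(weight)
            if r <= 0:
                return output
        return output

    for _ in range(200):
        out = sample()
        reward = 1.0 if out == "refuse" else -1.0  # the human judgment
        logits[out] += 0.1 * reward  # reward or punish by adjusting the weight

    # "refuse" ends up dominating, but only because a human scored
    # outputs the model had already produced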
I disagree here. Both of them (or all of them) are interacting with energy. One can certainly say that human civilization and all of this complexity was built from sunshine. Human labor and intelligence are just artifacts. We believe it's our own hard work and intelligence because we are full of ourselves.
Plenty of people send images around after cropping out sensitive parts.