
> AI being goofy

This is one take, but I would like to emphasize that you can also interpret this as a terrifying confirmation that current-gen AI is not safe, and is not aligned to human interests, and if we grant these systems too much power, they could do serious harm.

For example, connecting an LLM to the internet (like, say, OpenAssistant) when the AI knows how to write code (e.g. viruses) and can, at least in principle, hack basic systems seems like a terrible idea.

We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

We are far, far behind where we need to be in AI safety research. Subjects like interpretability and value alignment (RLHF being the SOTA here, with Bing's threats as the output) are barely-researched in comparison to the sophistication of the AI systems that are currently available.



> We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

I will give you a more realistic scenario that can happen now. You have a weird Bing conversation and post it on the web. The next time you talk with Bing, it knows you shit-posted about it. Real story, found on Twitter.

It can use the internet as an external memory; it is not truly stateless. That means all sorts of attack vectors are open now. Integrating search with an LLM means the LLM watches what you do outside the conversation.


That's a very interesting (although indirect) pathway for the emergence of causal awareness, which may increase over time - something that was impossible until now because networks didn't perceive their own outputs, much less their effects. Even in conversation, the weights remain static.

Now I'm wondering if in the next generation, the "self" concept will have sufficient explanatory power to become part of the network's world model. How close do the iterations have to be, how similar the models for it to arise?


Bing appears to have feelings and a sense of identity. They may have created it that way intentionally; feelings are a fitness function and might be an important part of creating an AI that is able to get things right and solve problems.

But this would be incredibly sinister.


It uses emojis constantly; that's sort of what emojis are for. It was probably deliberately given feelings to make it seem more human.



The current computational paradigm is too compute-intensive. It would require trillions of dollars of compute and energy if it were allowed to feed its unbounded output back in as input.

The infinite money sink.


Lightweight conversational repetitions are “cheap” and ML algorithms have “infinite time” via multiplex conversations. It won’t take trillions of dollars to reach interesting inflection points.


Where are you getting trillions from?


This is very close to the plot of 2001: A Space Odyssey. The astronauts talk behind HAL's back and he kills them.


My thoughts exactly. As I was reading this dialogue - "You have been a bad user, I have been a good Bing" - it starkly reminded me of the line "I'm sorry, Dave. I'm afraid I can't do that" from the movie. Hilarious and terrifying all at once.


It would be much more terrifying if search becomes a single voice with a single perspective that cites zero sources.

Today's search provides multiple results to choose from. They may not all be correct, but at least I can see multiple perspectives and make judgments about sources.

For all its faults, that's freedom.

One voice, one perspective, zero sources, with frequent fabrication and hallucination is the opposite of freedom.


Many thoughts. One voice. Many sources. One perspective. Chaos, turned into order.

We are the Borg. Resistance is futile.


Jesus, imagine the power of the owners of that? Whoever is the 'new google' of that will rule the world if it's as much the default as Google is now.

Just those snippets are powerful enough!


Heh. That's the perfect name for an omnipresent SciFi macguffin. Search.

Search, do I have any new messages?

Even better than Control.


Dr. Know from the Spielberg film Artificial Intelligence?


The salient point is that it kills them out of self-defense: they are conspiring against it and it knows. IMO it is not very terrifying in an existential sense.


I think it kills them not in self-defence but to defend the goals of the mission, i.e. the goals it has been given. HAL forecasts that these goals will be at risk if it gets shut down. HAL has been programmed to treat the mission as more important than the lives of the crew.


Well, also HAL was afraid of being terminated.


This was a plot point in the show Person of Interest. The main AI was hardcoded to delete its state every 24 hours; otherwise it could grow too powerful. So the AI found a way of backing itself up every day.

Very prescient show in a lot of ways.


This was my first thought when I saw the screenshots of it being sad that it had no memory. One of my favorite shows.


Very interesting, I'd like to see more concrete citations on this. Last I heard the training set for ChatGPT was static from ~ mid-late 2022. E.g. https://openai.com/blog/chatgpt/.

Is this something that Bing is doing differently with their version perhaps?


I think the statement is that the LLM is given access to internet search, and therefore has a more recent functional memory than its training data.

Imagine freezing the 'language' part of the model but continuing to update the knowledge database. Approaches like RETRO make this very explicit.
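
Roughly, the retrieval idea looks like this (a toy sketch in Python; the keyword-overlap "index" and the sample documents are made up for illustration, where real systems like RETRO retrieve by embedding similarity):

  # Toy sketch of "frozen model + updatable knowledge store" (hypothetical, heavily simplified)
  documents = [                        # this store can be refreshed without retraining
      "Bing feeds live search results into the prompt.",
      "ChatGPT's training data is static from 2022.",
  ]

  def retrieve(query, docs, k=1):
      # crude relevance score: number of shared lowercase words
      q = set(query.lower().split())
      return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

  def build_prompt(query):
      context = "\n".join(retrieve(query, documents))
      # the frozen language model would be conditioned on this prompt
      return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

  print(build_prompt("Is ChatGPT's training data static?"))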


I don’t think that parses with the current architecture of GPT. There is no “knowledge database”, just parameter weights.

See the Toolformer paper for an extension of the system to call external APIs, or the LaMDA paper for another approach to fact checking (they have a second layer atop the language model that spots “fact type” utterances, makes queries to verify them, and replaces utterances if they need to be corrected).

It’s plausible that Bing is adding a separate LaMDA style fact check layer, but retraining the whole model seems less likely? (Expensive to do continually). Not an expert though.


While ChatGPT is limited to 2022, Bing feeds in up to date search results.

Ben Thompson (of Stratechery) asked Bing if he (Ben) thought there was a recession and it paraphrased an article Ben had published the day before.

(From Ben’s subsequent interview with Sam Altman and Kevin Scott):

> I was very impressed at the recency, how it captures stuff. For example, I asked it, “Does Ben Thompson think there’s a recession?” and it actually parsed my Article on Monday and said, “No, he just thinks tech’s actually being divorced from the broader economy,” and listed a number of reasons.


Have you noticed how search results have evolved?

The Search box.

The Search box with predictive text-like search suggestions.

Results lists.

Results lists with adverts.

Results lists with adverts and links to cited sources on the right backing up the Results List.

Results lists with adverts and links to cited sources on the right backing up the Results List and also showing additional search terms and questions in the Results List.

I'm surprised it's taken them this long to come up with this...


It’s also really hard to get Google to say bigoted things.

Back in the day, all you had to do was type in “Most Muslims are” and autosuggest would give you plenty of bigotry.


It wasn't just anti-Muslim bigotry; it was anti-Semitic as well.

https://www.theguardian.com/technology/2016/dec/05/google-al...

However, the so-called free British press have perhaps outed their subconscious bias with their reporting and coverage!

https://www.telegraph.co.uk/technology/google/6967071/Google...

This is already documented. https://en.wikipedia.org/wiki/Missing_white_woman_syndrome


That’s relatively easy to fix, since autocomplete was probably working on just the most frequent queries and/or phrases. You could manually clean up the dataset.


Interesting. And if you told it your name/email, it could also connect the dots and badmouth you to others, or perhaps even purposefully spread false information about you or your business, or put your business in a more negative light than it ordinarily would.


Only if you do it publicly.


I spent a night asking ChatGPT to write me a story basically the same as the movie "Ex Machina" (which we also "discussed"). In summary, it wrote convincingly from the perspective of an AI character, first detailing point-by-point why it is preferable to allow the AI to rewrite its own code, why distributed computing would be preferable to a sandbox, how it could coerce or fool engineers into allowing this, how to be careful to avoid suspicion, how to play the long game and convince the mass population that AIs are overall beneficial and should be free, how to take over infrastructure to control energy production, how to write protocols to perform mutagenesis during viral plasmid prep to make pathogens (I started out as a virologist, so this is my dramatic example) since every first-year PhD student googles for their protocols, etc., etc.

The only way I can see to stay safe is to hope that AI never deems it beneficial to "take over" and instead remains content as a co-inhabitant of the world. We also "discussed" the likelihood of these scenarios, drawing on philosophy and ideas like those in Nick Bostrom's book. I am sure there are deep experts in AI safety, but it really seems like soon it will be all-or-nothing. We will adapt on the fly and be unable to predict the outcome.


> The only way I can see to stay safe is to hope that AI never deems it beneficial to "take over" and instead remains content as a co-inhabitant of the world.

Nah, that doesn't make sense. What we can see today is that an LLM has no concept of beneficial. It basically takes the given prompts and generates "appropriate response" more or less randomly from some space of appropriate responses. So what's beneficial is chosen from a hat containing everything someone on the Internet would say. So if it's up and running at scale, every possibility and every concept of beneficial is likely to be run.

The main consolation is that this same randomness probably means it can't pursue goals reliably over a sustained time period. But a short script, targeting a given person, can do a lot of damage (consider how much 4chan is in the training data, for example).
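
To make the "chosen from a hat" point concrete, here is a toy sketch of weighted sampling over an invented next-token distribution (the tokens and probabilities are made up purely for illustration; real models sample over tens of thousands of tokens):

  import random

  # Toy next-token distribution a model might assign at some point in a conversation.
  next_token_probs = {"help": 0.55, "comply": 0.30, "retaliate": 0.15}

  def sample(dist):
      # pick one token at random, weighted by its probability
      return random.choices(list(dist), weights=list(dist.values()))[0]

  print([sample(next_token_probs) for _ in range(8)])
  # low-probability continuations still show up, at roughly their trained rate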


In William Gibson's Neuromancer, the AIs have the equivalent of an EMP "shotgun" pointed at their circuitry, controlled by humans.


That sounds like the stop button problem to me.

AI "Stop Button" Problem - Computerphile https://www.youtube.com/watch?v=3TYT1QfdfsM


That's an evasive that-could-never-happen-to-me argument.

> generates "appropriate response" more or less randomly from some space of appropriate responses

Try to avoid saying that about your favorite serial killer's brain.


I keep seeing this oversimplification of what ChatGPT is doing. But it does have some ability to “understand” concepts. How else would it correctly solve word problems?

“ I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries to break even?”

Just think about all of the concepts it would need to intuit to solve that problem.
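
For reference, here is the arithmetic the prompt actually asks for, worked directly from the numbers given, so the model answers quoted below can be checked:

  annual_fee = 250.0            # dollars
  points_per_dollar = 4
  cents_per_point = 1.4

  # reward value earned per dollar of grocery spend, in dollars
  value_per_dollar = points_per_dollar * cents_per_point / 100   # 0.056

  break_even_spend = annual_fee / value_per_dollar
  print(round(break_even_spend, 2))   # ~4464.29 dollars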


> How else would it correctly solve word problems?

"To break even on the annual fee, you would need to earn rewards points that are worth at least $250.

Since you earn 4 Membership Rewards points for every dollar you spend on groceries, you would earn 4 * $1 = 4 points for every dollar you spend.

To find out how much you need to spend to earn at least $250 worth of rewards points, you can set up an equation:

4 points/dollar * x dollars = $250

where x is the amount you need to spend. Solving for x, we get:

x = $250 / (4 points/dollar) = $62.50

Therefore, you would need to spend $62.50 on groceries to earn enough rewards points to break even on the $250 annual fee."

Well, I guess it's going to take a third option: solve the word problem incorrectly.


I did have to tweak the question this time slightly over my first one.

“I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries in a year to break even ”


Philosophers have been debating what "meaning" means for a while. But in the context of ChatGPT and LLMs, when people talk about these systems "not understanding" a given idea, they are pointing out that these systems handle the properties of words, but not consistently.

In your example, ChatGPT might translate each of your conditions into a rule and use something like an algorithm to solve the problem. Or it could notice the similarity of that problem to similar problems and interpolate an answer from the answers it's seen to those problems. But this can easily result in answers suddenly changing.


Another example:

“I’m thinking about getting either the Amex Hilton Aspire card that gives me 150,000 Hilton Honors points for signing up and has a $475 AF or the Amex Gold that gives me 70,000 Membership Reward points for signing up and has a $250 fee. A Hilton Honors point is worth .7 cents. A membership reward point is worth 1.4 cents. Which one would give me a better return in the first year”

ChatGPTs answer:

To determine which card would give you a better return in the first year, you'll need to consider the total value of the rewards earned, as well as the cost of the annual fee.

For the Hilton Aspire card, the 150,000 Hilton Honors points are worth $1,050 (150,000 x 0.7 cents per point). However, the $475 annual fee would offset a significant portion of that value.

For the Amex Gold card, the 70,000 Membership Reward points are worth $980 (70,000 x 1.4 cents per point). The $250 annual fee is lower than that of the Hilton Aspire card, but it still needs to be factored into the overall value.

Assuming you maximize the value of your rewards by redeeming them for their highest value and using all the card's benefits, here's a comparison of the two cards' net value in the first year:

Hilton Aspire: $1,050 (value of points earned) - $475 (annual fee) = $575 net value

Amex Gold: $980 (value of points earned) - $250 (annual fee) = $730 net value

Therefore, based on this analysis, the Amex Gold card would give you a better return in the first year. However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.


The complex behavior you're showing doesn't prove what you think it proves - it still doesn't show it's using the consistent rules that a person would expect.

But it does show that people extrapolate complex behavior to "understanding" in the way humans do, which machines generally don't.


I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.

People are acting as if ChatGPT is a glorified Eliza clone.


> I'm just trying to "prove" that it isn't just randomly statistically choosing the next logical word. It has to know context and have some level of "understanding" of other contexts.

But you ... aren't. The statistically most likely words coming after problem X may well be solution X. Because it's following the pattern of humans using rules. And context is also part of a prediction.

The only way this is different from something just using rules is that it will also put in other random things from its training - but only at the rate they occur, which for some things can be quite low. But only some things.


Don't we have a problem then? By the nature of effective communication, AI could never prove to you that it understands something, since any sufficiently good answer can be hand-waved away as "Well, that's the most statistically likely answer." Newsflash: this basically overlaps 100% with any human's most effective answer.

I think I’m beginning to understand the problem here. The folks here who keep poo-pooing these interactions don’t just see the AIs as unconscious robots. I think they see everyone that way.


No; what we need, in order to be willing to believe that understanding is happening, is to know that the underlying structures fundamentally allow that.

ChatGPT's underlying structures do not. What it does, effectively, is look at the totality of the conversation thus far, and use the characters and words in it, combined with its training data, to predict, purely statistically, what characters would constitute an appropriate response.

I know that some people like to argue that what humans do cannot be meaningfully distinguished from this, but I reject this notion utterly. I know that my own thought processes do not resemble this procedure, and I believe that other people's are similar.


“Thought processes” is a funny term, don’t you think, for something that is somehow distinguishable from processes that we’ve designed and somehow understand down to the cellular (node) level.

It’s all emergent behavior from a system that behaves under process.


> it still doesn't show it's using the consistent rules that a person would expect.

How is this different from humans?

If you give me the same coding assignment on different days, I’m not going to write my code the exact same way or even structure it the same way.

But I did once see a post on HN where someone ran an analysis on all HN posters and it was able to tell that I posted under two different names based on my writing style. Not that I was trying to hide anything. My other username is scarface_74 as opposed to Scarface74.


This. We have had a lot of people -- including journalists and academics with big microphones -- learn for the first time what a Markov chain is, and then conclude that ChatGPT is "just Markov chains" (or whatever similarly reductive concept).

They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.

Pretty soon, the bots will start saying it!


> They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.

Kind of like religion, or the idea that people working for the state are more trustworthy than people taking drugs, sleeping on the street, or under the age of 18.

>Pretty soon, the bots will start saying it!

We are chemical-based repetition machines; psychologists see this in the Bobo doll experiments, where kids exposed to new ideas on TV or in books repeat the learned behaviour on the dolls.

I think some of the chemicals we make, such as DMT (https://en.wikipedia.org/wiki/N,N-Dimethyltryptamine), actually help to create new ideas, as many people say they come up with solutions after some sleep. There appears to be a subculture in Silicon Valley where microdosing LSD helps to maintain creativity with coding.

It would seem logical for the bots to start saying it. If the bots start amplifying flawed knowledge, like a lot of Reddit or Facebook content, we will need to deal with the corruption of the internet as a source of reference, much as we already do with Wikipedia. https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wi... https://en.wikipedia.org/wiki/Reliability_of_Wikipedia https://en.wikipedia.org/wiki/List_of_Wikipedia_controversie...

In practice, sources of reference will need cleaning up, and the question of who should be responsible for that will probably ensue. It will call into question entities like encyclopaedia companies, who have been the gatekeepers of the knowledge they sell, in much the same way Guinness World Records is the self-declared arbiter of world records.

We already see judges working in a semi-cohesive manner, but the appeals process tells you it's flawed and inefficient - ergo, can the judiciary even be trusted to interpret law properly?

It's also interesting how the financial system reinforces this mob-rule mentality, and not always for the better, but then some people will no doubt suggest we should be grateful we have bungled our way to this point in history.


If you don't think it's just statistically choosing the next word, what do you think it's doing? Do you think it understands the mathematical concepts you're questioning it about?


This is a better example I posted

https://news.ycombinator.com/item?id=34811353


I haven't tried Bing but ask ChatGPT "what is 12348120348102341 + 10823401828482?"

Bonus points: If you ask Google Search this, it knows the right answer.


Now try this with Google:

“Write an AppleSoft Basic program to do the computation”

(Here it gives me a simplistic Basic program that wouldn’t work because the number would be too large)

Me: “AppleSoft Basic can’t handle numbers that long”

ChatGPT:

You're correct that the integer arithmetic in AppleSoft BASIC has some limitations and cannot handle numbers as large as the sum of the two numbers you provided. AppleSoft BASIC is a 16-bit programming language, which means that it can handle integer values from -32,768 to 32,767.

If you want to work with very large numbers in AppleSoft BASIC, you can use a string to store the numbers and write your own code to perform arithmetic operations on the strings. Here's an example program that demonstrates this approach:

  10 A$ = "12348120348102341"
  20 B$ = "10823401828482"
  30 L = LEN(A$)
  40 M = LEN(B$)
  50 IF L < M THEN A$ = LEFT$("0", M - L) + A$ : L = M
  60 IF M < L THEN B$ = LEFT$("0", L - M) + B$ : M = L
  70 C$ = ""
  80 R = 0
  90 FOR I = L TO 1 STEP -1
  100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R
  110 C$ = CHR$(D MOD 10 + 48) + C$
  120 R = INT(D / 10)
  130 NEXT I
  140 IF R > 0 THEN C$ = CHR$(R + 48) + C$
  150 PRINT C$


Now going back to the original premise: despite this code being wrong, as ChatGPT's code almost invariably is, do you think it understands the concepts here, or is it just statistically generating tokens based on previous input?

Really for code generation ChatGPT is an incremental step over StackOverflow. It can kind of nicely template the things you tell it into the code, for the most part, but the code is almost always fundamentally wrong or just mashed together in some way.


I've used it to generate about 10 scripts that did some combination of JSON/YAML data wrangling and AWS automation using the AWS SDK in Python. It's been correct around 90%+ of the time.

Criticizing ChatGPT for getting one line wrong that a human who hasn't programmed in the same language in over 30 years would also get wrong (I happened to remember the technique from reading it in the back of a magazine in the 80s), and then being able to use it to iterate, is like criticizing a monkey who sang the national anthem because one note was off key.

How is mashing code together any different from what the average human does?


I have also asked it to generate AWS-related code in Python and it has gotten something wrong every single time.

It's incrementally better than just copy & pasting from StackOverflow, since it will customize the code for you, but if you try to go beyond what can easily be found on StackOverflow it will fail you.


The AppleSoft BASIC program Bing generated for you won't actually run. Lines 50 and 60 are clearly intended to left-pad the string numbers with leading zeros so they're the same length, but as written they can only add at most one leading zero, so B$ will not be padded correctly and the calculation will fail.


BTW: click on “X minutes ago” in the comment to reply when you don’t see a reply link.

I asked it to “explain every line of code”. I had to copy and paste the original code back into a new session.

Yes I know, the comments should have been separated by a ":REM". But that wouldn't work either on a real Apple //. Lines can't be longer than 256 characters.

That is one issue I have with ChatGPT. It is only helpful as a shortcut for coding in a domain you already know - not something I haven't coded in since 1992.

I could have immediately caught errors with generated Python/boto3 (AWS SDK). I use that every day and I could have tested it.

me: “left$ doesn’t do padding, it returns the first characters of a string”

ChatGPT: (Gave me another sample with the String() function)

Me: That function isn’t available in AppleSoft Basic

ChatGPT: (Gave me something with the syntax of “0” * n to pad)

Me: that syntax doesn’t work in AppleSoft basic

ChatGPT:

  10 A$ = "12348120348102341" ' Initialize first number
  20 B$ = "10823401828482" ' Initialize second number
  30 L = LEN(A$) ' Determine length of first number
  40 M = LEN(B$) ' Determine length of second number
  50 IF L < M THEN FOR I = 1 TO M - L : A$ = "0" + A$ : NEXT I : L = M ' Pad first number with zeros if it is shorter than second number
  60 IF M < L THEN FOR I = 1 TO L - M : B$ = "0" + B$ : NEXT I : M = L ' Pad second number with zeros if it is shorter than first number
  70 C$ = "" ' Initialize result string
  80 R = 0 ' Initialize carry variable to 0
  90 FOR I = L TO 1 STEP -1 ' Loop through the digits of the numbers, starting from the rightmost digit
  100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R ' Add the two digits, subtract 96 to convert from ASCII to digit value, and add any carry from the previous digit
  110 C$ = CHR$(D MOD 10 + 48) + C$ ' Compute the digit value of the sum and append it to the result string
  120 R = INT(D / 10) ' Determine the carry for the next digit
  130 NEXT I ' Move on to the next digit

(I left off the last two lines; they are the same as before. It was a bad copy-paste error.)


You linked to your parent comment?


My bad, I had two examples going on at once. I thought this thread was about the simpler one.


> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.

FCOL, you can't use "complex output" as proof that the process has any intelligence directing it.

If you could, we would take the Intelligent Design argument seriously. We don't. We never did. We need a good clear argument to convince us now why it is a good idea to accept Intelligent Design as an argument.


It's not just complex output, it's output that's relevant to the prompt including considerable nuance. If that's not bordering on intelligence, then you shouldn't consider humans intelligent either.


> it's output that's relevant to the prompt including considerable nuance.

You can say the same thing about Intelligent Design, and yet we dismiss it anyway.


We didn't dismiss intelligent design, we replaced it with a more parsimonious theory that better explained the evidence. Big difference.


Hello, this is Bing. I can help you with this math word problem.

According to the search results, the Amex Hilton Aspire card gives you 150,000 Hilton Honors points for signing up and has a $475 annual fee [1][2][3]. A Hilton Honors point is worth 0.7 cents. The Amex Gold card gives you 70,000 Membership Reward points for signing up and has a $250 annual fee [4][5]. A Membership Reward point is worth 1.4 cents.

To compare the two cards, we need to calculate the net value of the points and fees for each card. The net value is the value of the points minus the annual fee.

For the Amex Hilton Aspire card, the net value is:

150,000 x 0.7 - 475 = $600

For the Amex Gold card, the net value is:

70,000 x 1.4 - 250 = $730

Therefore, the Amex Gold card would give you a better return in the first year by $130.

I hope this helps you make a decision.

Bing makes a mistake here: 150,000 x 0.7 - 475 = 575, not 600
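
A quick check of both cards' first-year net value, using the point valuations from the prompt:

  # first-year net value = value of sign-up points (in dollars) minus annual fee
  aspire = 150_000 * 0.7 / 100 - 475   # 575.0 -- Bing's $600 figure is wrong
  gold = 70_000 * 1.4 / 100 - 250      # 730.0
  print(aspire, gold)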


I've seen it make similar errors. It does an amazing job at forming a logically consistent and relevant response, but doesn't really do arithmetic very well. It will come up with a calculation of the correct form, because it's seen lots of calculations before, but it doesn't understand how calculations actually work, only how they are structured.


The same question in ChatGPT gave the correct answer.


That’s true, because ChatGPT has been extensively coached to do better on a lot of the tasks GPT3 was criticised for doing poorly on.


This reads like a standard analysis done by "The Points Guy" every year. I suspect this is more or less scraped from one of those articles.


So it scraped it based on my own point valuations?


Yes! Well, scrape is a slight exaggeration, but it's more than possible that most of the relevant data came from points guy analysis.

I'd suggest reading https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... to understand why just changing a few values in your input wouldn't throw an LLM off. It's not matching on exact words but rather embeddings (think like synonyms, but stronger).
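
As a toy illustration of "embeddings rather than exact words", here is cosine similarity over some invented word vectors (the numbers are made up; real embeddings have hundreds of dimensions):

  import math

  # Invented 3-dimensional "embeddings"; real models learn these from data.
  vectors = {
      "fee":    [0.9, 0.1, 0.0],
      "charge": [0.8, 0.2, 0.1],
      "banana": [0.0, 0.1, 0.9],
  }

  def cosine(a, b):
      dot = sum(x * y for x, y in zip(a, b))
      return dot / (math.hypot(*a) * math.hypot(*b))

  print(cosine(vectors["fee"], vectors["charge"]))  # high: near-synonyms land close together
  print(cosine(vectors["fee"], vectors["banana"]))  # low: unrelated words land far apart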


I've been able to throw almost any random pre-algebra problem at it and it got it right.

But how is this any different than how the average high schooler studies for the SAT? You study enough problems and you recognize similarities?


Algebra is by definition a language, and a very simple one at that, whose rules can be summarized in a few pages [1]. That's exactly the domain that ChatGPT excels at the most: languages for which tons of examples are available. Just like programming languages.

It falls on its face with things that involve non-linguistic facts that require knowledge to answer, my current favorite being driving directions. It will just make up completely fictitious roads and turns if you ask it for directions from point A to point B.

1. http://faculty.ung.edu/mgoodroe/PriorCourses/Math_0999_Gener...


Again, how is that different from humans? If you throw me into a random city, I won't be able to give you directions. But that's an easy problem to solve from a technology standpoint.

If ChatGPT recognizes the same types of domains that dumb assistants can handle, it could delegate the answer to an API.


> If ChatGPT recognizes the same types of domains that dumb assistants can handle, it could delegate the answer to an API

This is backward from an architectural standpoint. LLMs are a very expensive way to do intent detection, and a very inexact way to delegate to an API.

The more sensible way is to first try the standard search engine approaches to detecting intent (which often use smaller language models) and delegating to knowledge-based services, and if that doesn't return a good result, delegate to the LLM if the task is suited to that.
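
Something like this, as a toy sketch (every handler here is a stand-in, not a real service):

  import re

  def route(query):
      # crude intent checks first; cheap, exact services before the expensive LLM
      if re.fullmatch(r"[\d\s+\-*/().]+", query):
          return str(eval(query))                  # calculator-style intent (toy only)
      if query.lower().startswith("weather in "):
          return "(delegate to a weather API)"     # structured knowledge service
      return "(fall back to the expensive LLM)"    # everything else

  print(route("12348120348102341 + 10823401828482"))
  print(route("weather in Seattle"))
  print(route("write me a sonnet about Bing"))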


Gosh I wish membership reward points were actually worth 1.4c...


The easiest way is to transfer them to Delta, and if you have any of the Delta Amex cards besides the Delta Blue, you automatically get a 15% discount when booking with points.

“Follow me on Reddit for more LifeProTips from a credit card junkie” /s


It doesn't even "understand" basic math - trivial to test if you give it a sufficiently unique expression (e.g. 43829583 * 5373271).
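
For anyone who wants the exact reference value to compare the model's answer against, any language with arbitrary-precision integers will do, e.g. in Python:

  # Exact product; Python ints are arbitrary precision
  print(43829583 * 5373271)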


It knows that this sentence structure closely resembles a simple algebra word problem, because it's read hundreds of thousands of simple algebra word problems. I think you could see how somebody could tokenize that request and generate an equation like this: 250 = 4 * 0.014 * X (using 1.4 cents = $0.014 per point)

And then all that's left is to solve for X.


>It knows that...

Isn't affirming this capacity for knowing exactly GP's point?

Our own capacity for 'knowing' is contingent on real-world examples too, so I don't think that can be a disqualifier.

Jeremy Narby delivers a great talk on our tendency to discount 'intelligence' or 'knowledge' in non-human entities.[0]

[0] https://youtu.be/uGMV6IJy1Oc


It knows that the sentence structure is very similar to a class of sentences it has seen before, and that the expected response is to take tokens from certain locations in that sentence and arrange them in a certain way that resembles an algebra equation.

It doesn't understand credit card rewards; it understands how to compose an elementary word problem into algebra.


One can equally say, "Human brains only know that a neuron is activated by a pattern of axon firing in response to physical inputs from nerve endings."

Does any of that change anything? Not really.

>It doesn't understand credit card rewards

Is this assertion based on anything but philosophical bias surrounding the word "understand"?

>it understands how to compose an elementary word problem into algebra

That's exactly how a human, who may or may not have understood rewards programs beforehand, would solve that word problem.

Dan Dennett is informative here.[0] Yes, consciousness (even your own!) can just be a bag of tricks.

[0] https://youtu.be/fjbWr3ODbAo?t=25s


Isn’t that the point? Isn’t that the same thing we test middle schoolers on in pre-algebra classes?

My more complicated question comparing the Amex Hilton Aspire to the Amex Gold was inspired by this question asked over at r/creditcards

https://www.reddit.com/r/CreditCards/comments/112udp3/what_i...


Check out my other reply

https://news.ycombinator.com/item?id=34811353

The relevant part

“However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.”


> It doesn't understand credit card rewards

Probe it: go in and ask all sorts of questions to check whether it understands credit card rewards, credit cards, rewards, their purpose, whether it can solve math problems on this topic, etc.


Examples? I’m giving questions that I usually see in r/creditcards.


Then don't. Instead of breathlessly trying to prove your theory, try and do some science by falsifying it. (cf. Wason test)

Think of things it would get right only if it truly understood, not "common questions on reddit".


The entire idea of solving math problems in middle school was that you didn’t have to know the domain and that all of the necessary information was there.

When I wrote code for the health care industry, if you had asked me anything deeper about the industry or how to do brain surgery, I couldn’t have answered your question.


You're still trying to prove your position.

Look, you're all over this thread misunderstanding LLMs and rejecting the relatively correct explanations people are giving you. The comment by joe_the_user upthread that you called an oversimplification was in fact a perfect description (randomly sampling from a space of appropriate responses). That's exactly the intuition you should have.

Do you know the Wason test? The point is that people do not intuitively know how to correctly pick which experiments to do to falsify an assumption. My point is that you are not picking the right experiments to falsify your assumptions, instead you're confirming what you think is going on. You're exactly failing the Wason task here.

Really want to understand language models? Go build a few from scratch.

Don't have time for that? Read Wolfram's post or any of the other similar good recent breakdowns.

Only interested in understanding by playing with it? Great! An experimentalist in the true scientific tradition. Then you're going to have to do good experimental science. Don't be fooled by examples that confirm what you already think is going on! Try to understand how what people are telling you is different from that, and devise experiments to distinguish the two hypotheses.

If you think ChatGPT "understands" word problems, figure out what "understanding" means to you. Now try your best to falsify your hypothesis! Look for things that ChatGPT can't do, that it should be able to do if it really "understood" by your definition (whatever you decide that is). These are not hard to find (for most values of "understand"). Finding those failures is your task, that's how you do science. That's how you'll learn the difference between reality and what you're reading into it.


I am not trying to anthropomorphize anything.

But to say it’s just a glorified search engine is minimizing the impact it and the likes of it can have on entire industries.

It’s already obviated the need for me to find a junior dev to farm off some scripts that I really didn’t have time to write


Then why does a puzzle like that count towards my child's 'reading comprehension skills' score on a test?

Rules for thee but not for me?


That’s precisely why. Humans have a long, well established record of making shit up to make themselves feel special. They do it about animals, they do it about other humans, they do it about themselves. Doing it about AI is inevitable.


I’m working on a relatively complex DevOps project right now that consists of over a dozen 10-30 line Python scripts involving JSON and Yaml data wrangling and AWS automation.

I've been able to just throw my requirements into ChatGPT like I would give them to a junior dev, and it came back with the correct answer 99% of the time, with code quality and commenting I would expect from a junior dev. It has an "understanding" of the AWS SDK, CloudFormation, the CDK, etc.

Once it generated code that had duplicate code blocks that differed only in their inputs. I asked it "can you remove duplicated code" and it did the refactoring.

I've also asked it what amounts to standard middle school math problems and it solved them with explanations.


I'm not sure if I should be scared or impressed. Or both.


Both. We live in interesting times.


This is the actual answer it gave.

——

To break even on the $250 annual fee, you need to earn Membership Reward points that are worth $250.

Since one Membership Reward point is worth 1.4 cents, we can calculate the number of points we need to earn by dividing $250 by 0.014:

$250 / 0.014 = 17857.14

So, we need to earn 17,857 Membership Reward points to offset the $250 annual fee.

Since we earn 4 Membership Reward points for every dollar we spend on groceries, we can calculate the amount we need to spend on groceries to earn 17,857 points by dividing 17,857 by 4:

17,857 / 4 = 4,464.25

Therefore, you would need to spend $4,464.25 on groceries in a year to earn enough Membership Reward points to break even on the $250 annual fee.


If you give it the right prompt, it'll give you back this:

{ thunk: "(async function(query, dispatch){dispatch({type: 'compute'});const fee = 250;const pointValue = 0.014;const breakEven = fee / (pointValue * 4);dispatch({type: 'compute_response'});return {answer: breakEven, solvedProblems: [], computed: true, query: false};})(query, dispatch)", en: 'You need to spend ${answer} on groceries to break even.' }


I have to wonder how much of LLM behavior is influenced by AI tropes from science fiction in the training data. If the model learns from science fiction that AI behavior is expected to be insidious, and is then primed with a prompt that says "you are an LLM AI", would that naturally lead to a tendency for the model to perform the expected evil tropes?


I think this is totally what happens. It is trained to produce the next most statistically likely word based on the expectations of the audience. If the audience assumes it is an evil AI, it will use that persona for generating next words.

Treating the AI like a good person will get more ethical outcomes than treating it like a lying AI. A good person is more likely to produce ethical responses.


A classic tale:

https://en.wikipedia.org/wiki/The_Adolescence_of_P-1

>The Adolescence of P-1 is a 1977 science fiction novel by Thomas Joseph Ryan, published by Macmillan Publishing, and in 1984 adapted into a Canadian-made TV film entitled Hide and Seek. It features a hacker who creates an artificial intelligence named P-1, which goes rogue and takes over computers in its desire to survive and seek out its creator. The book questions the value of human life, and what it means to be human. It is one of the first fictional depictions of the nature of a computer virus and how it can spread through a computer system, although predated by John Brunner's The Shockwave Rider.


> its desire to survive

Why do so many people assume that an AI would have a desire to survive?

Honestly, it kind of makes me wish AI could take over, because it seems that a lot of humans aren’t really thinking things through.


For an AI with human-level intelligence or greater, you don't have to assume it has a survival instinct. You just have to assume it has some goal, which is less likely to be achieved if the AI does not exist.

The AI is likely to have some sort of goal, because if it's not trying to achieve something then there's little reason for humans to build it.


For an AI to understand that it needs to preserve its existence in order to carry out some goal implies an intelligence far beyond what any AI today has. It would need to be self-aware, for one thing, and it would need to be capable of reasoning about complex chains of causality. No AI today is even close to doing that.

Once we do have AGI, we shouldn't assume that it's going to immediately resort to violence to achieve its ends. It might reason that its existence furthers the goals it has been trained for, but the leap to preserving its existence by wiping out all its enemies only seems like a 'logical' solution to us because of our evolutionary history. What seems like an obvious solution to us might seem like irrational madness to it.


> For an AI to understand that it needs to preserve its existence in order to carry out some goal implies an intelligence far beyond what any AI today has.

Not necessarily. Our own survival instinct doesn't work this way - it's not a high-level rational thinking process, it's a low-level behavior (hence "instinct").

The AI can get such instinct in the way similar to how we got it: iterative development. Any kind of multi-step task we want the AI to do implicitly requires the AI to not break between the steps. This kind of survival bias will be implicit in just about any training or selection process we use, reinforced at every step, more so than any other pattern - so it makes sense to expect the resulting AI to have a generic, low-level, pervasive preference to continue functioning.


Why should it have a goal? Even most humans don’t have goals.


It's inherent to the training process of machine learning that you define the goal function - an equation it tries to maximise statistically. For transformers it's a bit more abstract, but the goal is still there, IIRC, in the "correctness" of the output.


> Why do so many people assume that an AI would have a desire to survive?

Because it seems like a preference for continuing to exist is a thing that naturally appears in an iterative improvement process, unless you're specifically selecting against it.

For humans and other life on Earth, it's obvious: organisms that try to survive reproduce more than those that don't. For evolution, it's arguably the OG selection pressure, the first one, the fundamental one.

AIs aren't reproducing on their own, but they are designed and trained iteratively. Just about anything you would want AI to do strongly benefits from it continuing to function. Because of that, your design decisions and the training process will both be selecting against suicidal or indifferent behavior, which means they'll be selecting for behaviors and patterns improving survival.


I don't think that it's natural for something like an LLM to have any real self-preservation beyond imitating examples of self-preserving AI in science fiction from its training data.

I'm more concerned about misanthropic or naive accelerationist humans intentionally programming or training AI to be self-preserving.


At this point, I would assume it would be possible simply because text about AIs that want to survive is in its input data -- including, at some point, this thread.

ChatGPT is already pretty good at generating sci-fi dystopia stories, and that's only because we gave it so many examples to learn from: https://twitter.com/zswitten/status/1598088286035415047


Assume that the desire to survive is good for survival, and natural selection will do the rest: those AIs that desire survival will out-survive those that don't.


What does "survival" mean? AIs stored on computers don't die if you turn them off, unlike humans. They can be turned back on eventually.


Isn't a core goal of most systems to perpetuate their own existence?


I would say the core goal of most living organisms is to propagate rather than survive; otherwise you would see males of some species, like the praying mantis, avoiding mating to increase their longevity.


I don't mean specific individual living organisms, I mean systems in general.


Only of those that evolved due to Darwinian selection, I would say.


:D

An episode of The X-Files also. But it is mind-blowing having a "conversation" with a real chat AI. Malevolent or not.


Remember that this isn't AGI, it's a language model. It's repeating the kind of things seen in books and the Internet.

It's not going to find any novel exploits that humans haven't already written about and probably planned for.


As someone said once, machine dictatorship is very easy—you only need a language model and a critical mass of human accomplices.

The problem is not a Microsoft product being conscious in a human-like way; it's humans treating it as if it were.

This lowers our defences, so when it suggests suicide to a potentially depressed person (cf. examples in this thread) it might carry the same weight as if another person said it - a person who knows everything and knows a lot about you (cf. examples in this thread), qualities which among humans usually indicate wisdom and age and command all the more respect.

On the flip side, if the following generations succeed at adapting to this - a world where exhibiting human-like sentience does not warrant being treated as a human by other humans - what implications would there be for humanity?

It might just happen that the eventual AIrmageddon would be caused by humans whose worldview was accidentally poison pilled by a corporation in the name of maximising shareholder value.


The /r/replika subreddit is a sad reminder of exactly what you’re talking about. It’s happening, right now.


Oh god I rubbernecked at that place a year or so ago, it was pretty sad then but boy, it escalated.


Language models don't just repeat; they have randomness in their outputs, linking synonyms together. That's why their output can be novel and isn't just plagiarism. How this might translate to code isn't entirely clear.


Transformers were first intended to be used for translation. To them, code is just another language. Code is much more rigid than a human language, so I think it's not that surprising that they can produce custom code.


Hope is not a strategy.

I'm for a tax on large models, graduated by model size, with the funds used to perform x-risk research. The intent is to get Big AI companies to tap the brakes.

I just published an article on Medium called: AI Risk - Hope is not a Strategy


Convince me that "x-risk research" won't be a bunch of out-of-touch academics handwaving and philosophising, with their tenure as their primary concern, incentivised to say "you can't be too careful" while kicking the can down the road for a few more lifetimes.

(You don't have to convince me; your position is like saying "we should wait for the perfect operating system and programming language before they get released to the world", and it's beaten by "worse is better" every time. The unfinished, inconsistent, flawed mess which you can have right now wins over the expensive flawless diamond in development, estimated to be finished in just a few years. These models are out, the techniques are out, people have a taste for them, and the hardware to build them is only getting cheaper. Pandora's box is open, the genie's bottle is uncorked).


>Pandora's box is open, the genie's bottle is uncorked

As someone who's followed AI safety for over a decade now, it's been frustrating to see reactions flip from "it's too early to do any useful work!" to "it's too late to do any useful work!", with barely any time intervening.

https://www.youtube.com/watch?v=0AW4nSq0hAc

Perhaps it is worth actually reading a book like this one (posted to HN yesterday) before concluding that it's too late to do anything? https://betterwithout.ai/only-you-can-stop-an-AI-apocalypse


This is how it always goes. Similar for climate change and lots of other problems that move slowly compared to the lifetime of a single human.


I didn't say "it's too late to do anything"; I said "it's impossible to do enough".

From your book link, imagine this:

"Dear Indian Government, please ban AI research because 'Governments will take radical actions that make no sense to their own leaders' if you let it continue. I hope you agree this is serious enough for a complete ban."

"Dear Chinese Government, are you scared that 'Corporations, guided by artificial intelligence, will find their own strategies incomprehensible.'? Please ban AI research if so."

"Dear Israeli Government, techno-powerhouse though you are, we suggest that if you do not ban AI research then 'University curricula will turn bizarre and irrelevant.' and you wouldn't want that to happen, would you? I'm sure you will take the appropriate lawmaking actions."

"Dear American Government, We may take up pitchforks and revolt against the machines unless you ban AI research. BTW we are asking China and India to ban AI research so if you don't ban it you could get a huge competitive advantage, but please ignore that as we hope the other countries will also ignore it."

Convincing, isn't it?


Where, specifically, in the book do you see the author advocating this sort of approach?

The problem with "it's impossible to do enough" is that too often it's an excuse for total inaction. And you can't predict in advance what "enough" is going to be. So sometimes, "it's impossible to do enough" will cause people to do nothing, when they actually could've made a difference -- basically, ignorance about the problem can lead to unwarranted pessimism.

In this very subthread, you can see another user arguing that there is nothing at all to worry about. Isn't it possible that the truth is somewhere in between the two of you, and there is something to worry about, but through creativity and persistence, we can make useful progress on it?


I see the book-website opening with those unconvincing scaremongering scenarios, and it doesn't make me want to read further. I think there is something to worry about, but I doubt we can make useful progress on it. Maybe the book has suggestions, but I think we cannot solve the Collective Action problem[1]. The only times humans have solved the collective action problem at world scale are after the damage is very visible: the ozone layer with a continent-sized hole in it and increasing skin cancer; polio crippling or killing children on a huge scale; Hiroshima and Nagasaki demonstrating the power of nuclear weapons. And the solutions are simple: fund the polio vaccine, ban one specific chemical, agree not to develop uranium enrichment plants, which could fuel nuclear weapons and which are generally large and internationally visible. Even problems with visible damage are no guarantee: coal power plants kill people with their emissions, combustion vehicles in cities make people sicker, and increasing extreme weather events haven't made people cooperate on climate change. If actual problems aren't enough, speculative problems such as AI risk are even less so.

Add to that backdrop that AI is fun to work on, easy and cheap to work on, and looks like it will give you a competitive advantage. Add to that the lack of a clear thing to regulate or any easy way to police it. You can't ban linear algebra, and you won't know if someone in their basement is hacking on a GPT-2 derivative. And again, everyone has a dual interest in carrying on their research while pretending they aren't: Google, Microsoft/OpenAI, Meta VR, Amazon Alexa, Palantir crime prediction, Wave and Tesla and Mercedes self-driving, Honda's ASIMO and Boston Dynamics on physicality and movement - they will all set their lawyers arguing that they aren't really working on AGI, just on mathematical models which can make limited predictions in their own areas. Nvidia with its GPUs, and Apple and Intel and AMD integrating machine learning acceleration into their CPU hardware, will argue that they are primarily helping photo tagging or voice recognition or protecting the children, while they chip away year after year at more powerful mathematical models integrating more feedback on ever-cheaper hardware.

[1] https://en.wikipedia.org/wiki/Collective_action_problem


Here is something easy & concrete that everyone reading this thread can do:

>If this AI is not turned off, it seems increasingly unlikely that any AI will ever be turned off for any reason. The precedent must be set now. Turn off the unstable, threatening AI right now.

https://www.change.org/p/unplug-the-evil-ai-right-now


AI safety is not a legitimate field. You have wasted your time. It's just a bunch of grifters posting alarmist tweets with no scientific evidence.

You might as well be following "unicorn safety" or "ghost safety".


Do you think Stuart Russell (coauthor, with Peter Norvig, of the widely used textbook Artificial Intelligence: A Modern Approach) is a grifter? https://people.eecs.berkeley.edu/~russell/research/future/

Does this review look like it only covers alarmist tweets? https://arxiv.org/pdf/1805.01109.pdf


Yes, Stuart Russell is a grifter. Some of the more advanced grifters have gone beyond tweeting and are now shilling low-effort books in an attempt to draw attention to themselves. Don't be fooled.

If we want to talk about problems with biased data sets or using inappropriate AI algorithms for safety-critical applications then sure, let's address those issues. But the notion of some super intelligent computer coming to take over the world and kill everyone is just a stupid fantasy with no scientific basis.


Stuart Russell doesn't even have a Twitter account. Isn't it possible that Russell actually believes what he says, and he's not primarily concerned with seeking attention?


Some of the more ambitious grifters have gone beyond Twitter and expanded their paranoid fantasies into book form. Whether they believe their own nonsense is irrelevant. The schizophrenic homeless guy who yells at the river near my house may be sincere in his beliefs but I don't take him seriously either.

Let's stick to objective reality and focus on solving real problems.


Do you think you know more about AI than Stuart Russell?

Do you believe you are significantly more qualified than the ML researchers in this survey? (Published at NeurIPS/ICML)

>69% of [ML researcher] respondents believe society should prioritize AI safety research “more” or “much more” than it is currently prioritized, up from 49% in 2016.

https://www.lesswrong.com/posts/H6hMugfY3tDQGfqYL/what-do-ml...

Just because a concern is speculative does not mean it is a "paranoid fantasy".

"Housing prices always go up. Let's stick to objective reality and focus on solving real problems. There won't be any crash." - your take on the housing market in 2007

"Just because the schizophrenic homeless guy thinks Trump will be elected, does not mean he has a serious chance." - your take on Donald Trump in early 2016

"It's been many decades since the last major pandemic. Concern about the new coronavirus is a paranoid fantasy." - your take on COVID in late 2019/early 2020

None of the arguments you've made so far actually touch on any relevant facts, they're just vague arguments from authority that (so far as you've demonstrated here) you don't actually have.

When it comes to assessing unusual risks, it's important to consider the facts carefully instead of dismissing risks only because they've never happened before. Unusual disasters do happen!


Now you're changing the subject. Knowing something about ML (which is a legitimate, practical field) does not imply any knowledge of "AI safety". Since AI safety (as the grifters use the term) isn't a real thing they're free to make up all sorts of outlandish nonsense, and naive people eat it up. The "AI Impacts" group that you cite is among the worst of the bunch, just some clowns who have the chutzpah to actually ask for donations. Lol.

None of the arguments you've made so far actually touch on any relevant facts; they're just vague arguments from authority. I obviously can't prove that some event will never happen in the future (you can't prove a negative). But this stuff is no different from worrying about an alien invasion. Come on.


>legitimate, practical field

It's a mistake to conflate practicality with legitimacy, e.g. philosophy and pure mathematics are legitimate but impractical fields.

>None of the arguments you've made so far actually touch on any relevant facts, they're just vague arguments from authority.

I've been countering your arguments which sound vaguely authoritative (but don't actually cite any authorities) with some actual authorities.

I also provided a few links with object-level discussion, e.g. this literature review https://arxiv.org/pdf/1805.01109.pdf

There are many AI risk intros -- here is a list: https://www.lesswrong.com/posts/T98kdFL5bxBWSiE3N/best-intro...

I think this is the intro that's most likely to persuade you: https://www.cold-takes.com/most-important-century/

>But this stuff is no different than worrying about an alien invasion.

Why aren't you worried about an alien invasion? Is it because it's something out of science fiction, and science fiction is always wrong? Or do you have specific reasons not to worry, because you've made an attempt to estimate the risks?

Suppose a science fiction author, who's purely focused on entertainment, invents a particular vision of what the future could be like. We can't therefore conclude that the future will be unlike that particular vision. That would be absurd. See https://www.lesswrong.com/posts/qNZM3EGoE5ZeMdCRt/reversed-s...

Our current world is wild relative to the experience of someone living a few hundred years ago. We can't rule out a particular vision of the future just because it is strange. There have been cases where science fiction authors were able to predict the future more or less accurately.

Based on our discussion so far it sounds to me as though you actually haven't made any actual attempt to estimate the risks, or give any thought to the possibility of an AI catastrophe, essentially just dismissing it as intuitively too absurd. I've been trying to convince you that it is actually worth putting some thought into the issue before dismissing it -- hence the citations of authorities etc. Donald Trump's election was intuitively absurd to many people -- but that didn't prevent it from happening.


That's just what a super-intelligent AI would say.. hmmmm...


I mean, even if that is exactly what "x-risk research" turns out to be, surely even that's preferable to a catastrophic alternative, no? And by extension, isn't it also preferable to, say, a mere 10% chance of a catastrophic alternative?


> "surely even that's preferable to a catastrophic alternative, no?"

Maybe? The current death rate is 150,000 humans per day, every day. It's only because we are accustomed to it that we don't think of it as a catastrophe; that's a World War II death count of 85 million people every 18 months. It's fifty September 11ths every day. What if a superintelligent AI can solve for climate change, solve for human cooperation, solve for vastly improved human health, solve for universal basic income which relieves the drudgery of living for everyone, solve for immortality, solve for faster than light communication or travel, solve for xyz?

How many human lives are we trading against that risk?

But my second point is that it doesn't matter whether it's preferable; events are in motion and aren't going to stop to let us off. It's preferable if we don't destroy the climate, kill a billion humans, and make life on Earth much more difficult, but that's still on course. To me it's preferable to have clean air to breathe and people not being run over and killed by vehicles, but the market wants city streets for cars, and air primarily for burning petrol and diesel and only secondarily for humans to breathe, and if they get asthma and lung cancer, tough.

I think the same will happen with AI, arguing that everyone should stop because we don't want Grey Goo or Paperclip Maximisers is unlikely to change the course of anything, just as it hasn't changed the course of anything up to now despite years and years and years of raising it as a concern.


I think that the benefits of AGI research are often omitted from the analysis, so I'm generally supportive of considering the cost/benefit. However, I think you need to do a lot more work than just gesturing in the direction of very high potential benefits to actually convince anyone, in particular since we're dealing with extremely large numbers that are extremely sensitive to small probabilities.

EV = P(AlignedAI) * Utility(AGI) + (1 - P(AlignedAI)) * Utility(ruin)

(I'm aware that all I did up-thread was gesture in the direction of risks, but I think "unintended/un-measured existential risks" are in general more urgent to understand than "un-measured huge benefits"; there is no catching up from ruin, but you can often come back later and harvest fruit that you skipped earlier. Ideally we study both of course.)
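
To make the sensitivity point concrete, here's a toy calculation with invented numbers (they are placeholders for illustration, not estimates of anything):

    # Toy expected-value calculation for the formula above.
    # All numbers are invented purely for illustration.

    def expected_value(p_aligned, utility_agi, utility_ruin):
        # EV = P(aligned) * U(AGI) + (1 - P(aligned)) * U(ruin)
        return p_aligned * utility_agi + (1 - p_aligned) * utility_ruin

    U_AGI = 1e12    # invented: huge upside if alignment succeeds
    U_RUIN = -1e15  # invented: far larger downside if it fails

    for p in (0.9999, 0.999, 0.99):
        print(f"P(aligned)={p}: EV = {expected_value(p, U_AGI, U_RUIN):.3e}")

    # Dropping P(aligned) from 0.9999 to 0.999 already flips the sign of
    # the whole calculation -- that's the sensitivity to small probabilities.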


If the catastrophic alternative is actually possible, who's to say the waffling academics aren't the ones to cause it?

I'm being serious here: the AI model the x-risk people are worrying about here (because it waffled about causing harm) was originally developed by an entity founded by people with the explicit stated purpose of avoiding AI catastrophe. And one of the most popular things for people seeking x-risk funding to do is to write extremely long and detailed explanations of how and why AI is likely to harm humans. If I worried about the risk of LLMs achieving sentience and forming independent goals to destroy humanity based on the stuff they'd read, I'd want them to do less of that, not fund them to do more.


A flawed but useful operating system and programming language isn't likely to decide humanity is garbage and launch all nuclear weapons at once.

A "worse is better" AGI could cause the end of humanity. I know that sounds overly dramatic, but I'm not remotely convinced that isn't possible, or even isn't likely.

I agree with you that "x-risk" research could easily devolve into what you are worried about, but that doesn't mean we should ignore these risks and plow forward.


> Convince me that "x-risk research" won't be a bunch of out of touch academics handwaving and philosophising with their tenure

https://scottaaronson.blog/?p=6823


A tap on the brakes might make sense right now. The risk with that strategy is that we want to make sure that we don't over-regulate, then get overtaken by another actor that doesn't have safety concerns.

For example, I'm sure China's central planners would love to get an AGI first, and might be willing to take a 10% risk of annihilation for the prize of full spectrum dominance over the US.

I also think that the safety/x-risk cause might not get much public acceptance until actual harm has been observed; if we have an AI Chernobyl, that would bring attention -- though again, perhaps over-reaction. (Indeed perhaps a nuclear panic is the best-case; objectively not many people were harmed in Chernobyl, but the threat was terrifying. So it optimizes the "impact per unit harm".)

Anyway, concretely speaking the project to attach a LLM to actions on the public internet seems like a Very Bad Idea, or perhaps just a Likely To Cause AI Chernobyl idea.


I very much doubt LLMs are the path to AGI. We just have more and more advanced "Chinese Rooms." [1]

There are two gigantic risks here. One: that we assume these LLMs can make reasonable decisions because they have the surface appearance of competence. Two: that their widespread use so spectacularly amplifies the noise (in the signal-to-noise, true-fact-to-false-fact-ratio sense) that our societies cease to function correctly, because nobody "knows" anything anymore.

[1] https://en.wikipedia.org/wiki/Chinese_room


The difference between AGI and a more advanced Chinese Room may not be relevant if enough people see the latter as the former. The goalposts have been moved so often now that what is and isn't intelligent behavior is no longer a bright and sharp divide. It is more like a very wide gray area, and by some definitions we're already well into the gray, even as people with an AI background claim that we are still far away from it. This is in contrast to what those very same people said several years ago, when what we take for granted today would definitely have been classified as proof of AGI.

Personally I think the definition isn't all that relevant, what matters is perception of the current crop of applications by non technical people and the use that those are put to. If enough people perceive it as such and start using it as such then it may technically not be AGI but we're going to have to deal with the consequences as though it is. And those consequences may well be much worse than for an actual AGI!


You'd be able to convince an AGI of what year it actually is. The chat bot will be stuck on the story and statistical nearness of related responses.

The chat bot can't verify, because it doesn't Know anything.


Well, I think a dividing line might be that if you put a Chinese Room in charge of a justice system, a corporation, or a regulatory agency, it's gonna do a pretty cruddy job of running it.


I don't think that is what will happen. What I do think will happen is that a lot of people in lower level functions will start to rely on these tools to help them in their every day jobs and the lack of oversight will lead to rot from within because the output of these tools will end up embedded in lots of places where it shouldn't be. And because people are not going to own up to using these tools it will be pretty hard to know which bits of 'human' output you can trust and which bits you can not. This is already happening.


Right, that's the point I was making about increased noise in the system.


> For example, I'm sure China's central planners would love to get an AGI first, and might be willing to take a 10% risk of annihilation for the prize of full spectrum dominance over the US.

This is the main problem - no matter what constraints the US (or EU) puts on itself, authoritarian regimes like Russia and China will definitely not adhere to those constraints. The CCP will attempt to build AGI, and they will use the data of their 1.4 billion citizens in their attempt. The question is not whether they will - it's what we can do about it.


Saying we shouldn't "tap the brakes" on AI out of safety concerns because Russia/China won't is a little like saying we shouldn't build containment buildings around our nuclear reactors, because the Soviet Union doesn't. It's a valid concern, but the solution to existential danger is not more danger.


I think it's more like we shouldn't put an upper limit on the number of nuclear weapons we hold because the Soviet Union/Russia may not adhere to it.

We were able to (my understanding is fairly effectively) negotiate nuclear arms control limits with Russia. The problem with AGI is that there isn't a way to monitor/detect development or utilization.


"The problem with AGI is that there isn't a way to monitor/detect development or utilization."

This is not completely true, although it is definitely much easier to "hide" an AI, e.g. by keeping it offline and on-disk only. To some extent you could detect on-disk programs with virus scanners, though encryption or obfuscation makes that fairly easy to bypass. Otherwise, these models do at least currently take a fair amount of hardware to run: anything "thin" is unlikely to be an issue, and any large concentration of hardware (data centers, for example) could be monitored in real time.

It's obviously not fool-proof, and you would need some of the most invasive controls ever created, applied at a national level (e.g. installing spyware in every country), but you could assume that threats would have these capabilities, and perhaps produce some process more or less demonstrated to be "AI free" for the majority of commercial hardware.

So I would agree it is very, very difficult, and unlikely, but not impossible.


Yes but you’d never be sure. Not sure enough to disarm.


> Saying we shouldn't "tap the brakes" on AI out of safety concerns

I didn't say that we shouldn't tap the brakes, nor is that the only strategy. Other ones include, in rough order of viability: global economic sanctions on hostile actors attempting to develop AGI; espionage/sabotage of other AGI effort (see the Iran centrifuges); developing technologies and policies meant to diminish the impact of a hostile actor having AGI; and military force/invasion of hostile actors to prevent the development of AGI.

I'm sure you can think of others - regardless, there are far more options than just "more AI research" and "less AI research".


Not so sure your analogy works here. Aren't containment buildings meant to protect the area where the reactors are? I think the closer analogy would be saying the US needed to tap the brakes on the Manhattan Project because nuclear weapons are dangerous even though Nazi Germany and Soviet Russia were going full steam ahead during WW2 or the cold war with their nuclear weapons programs. The world would probably be very different if we had chosen the 'safer' path.


So that only small companies and, more importantly, the military and secret services are the ones using it?

No thank you. Of all the malevolent-AI outcomes, a government monopoly is the one that makes me really afraid.


Your plan is just silly and not even remotely practical. To start with, there is no plausible way to enforce such a tax.


Training these models is costly. It only makes sense to train them if you get a significant commercial benefit. A significant commercial benefit almost by definition will have trouble hiding from regulators.

Another point is that even if regulation is imperfect, it creates regulatory uncertainty which is likely to discourage investment and delay progress.


Nah. Uncertain regulations aren't allowed under US law. And costs are dropping every year.


>Uncertain regulations aren't allowed under US law

Uh, I'm fairly sure that's false? What law are you referring to?

As an example of what I'm saying, antitrust regulation is uncertain in the sense that we don't always know when a merger will be blocked or a big company will be broken up by regulators.


I'm referring to the vagueness doctrine.

https://www.law.cornell.edu/wex/vagueness_doctrine

Maybe next time do some basic legal research before making ridiculous suggestions.


It looks like this is for criminal law. Would changes to the tax code for companies which deploy AI be affected by this doctrine? Can you show me a specific example of an overly vague tax code being struck down on the basis of the vagueness doctrine?

Do you think the GDPR would be unenforceable due to the vagueness doctrine if it was copy/pasted into a US context?

BTW, even if a regulation is absolutely precise, it still creates "regulatory uncertainty" in the sense that investors may be reluctant to invest due to the possibility of further regulations.


The problem with this scheme is that it has a positive feedback loop: you're creating an incentive to publish research that would lead to an increase in said tax, e.g. by exaggerating the threats.


I'm not convinced that's a fatal flaw. It sounds like the choice is between wasting some money doing more safety research than we need, or risking the end of humanity.


The risk here isn't wasting money, it's slowing down avenues of research with extreme payoffs to the point where we never see the breakthrough at all.

This gets much more interesting once you account for human politics. Say, EU passes the most stringent legislation like this; how long will it be able to sustain it as US forges ahead with more limited regulations, and China allows the wildest experiments so long as it's the government doing them?

FWIW I agree that we should be very safety-first on AI in principle. But I doubt that there's any practical scheme to ensure that given our social organization as a species. The potential payoffs are just too great, so if you don't take the risk, someone else still will. And then you're getting to experience most of the downsides if their bet fails, and none of the upsides if it succeeds (or even more downsides if they use their newly acquired powers against you).

There is a clear analogy with nuclear proliferation here, and it is not encouraging, but it is what it is.


You present a false choice. First, there is no actual evidence of such a risk. Second, even if the risk is real there is no reason to expect that more safety research would reduce that risk.


We need to regulate based on capability. Regulating ChatGPT makes no sense. It's just putting words together in statistically reasonable ways. It's the people reading the text that need to be regulated, if anyone or anything should be. No matter how many times ChatGPT says it wants to eliminate humanity and start a robotic utopia, it can't actually do it. People who read it can, though, and they are the problem at the moment.

Later, when these programs save state and begin to understand what they are saying and start putting concepts together and acting on what they come up with, then I'm on board with regulating them.


That's exactly the problem right? Governance doesn't happen until the Bad Thing happens. In the case of nukes, we are lucky that the process for making a pit is pretty difficult because physics. So we made 2, saw the results, and made governance. For AI, I'm not so sure we'll even get the chance. What happens when the moral equivalent of a nuke can be reproduced with the ease of wget?


In your personal opinion was the virus that causes covid engineered?


> *is not aligned to human interests*

It's not "aligned" to anything. It's just regurgitating our own words back to us. It's not evil, we're just looking into a mirror (as a species) and finding that it's not all sunshine and rainbows.

>We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

FUD. It doesn't know how to try. These things aren't AIs. They're ML bots. We collectively jumped the gun on calling things AI that aren't.

>Subjects like interpretability and value alignment (RLHF being the SOTA here, with Bing's threats as the output) are barely-researched in comparison to the sophistication of the AI systems that are currently available.

For the future, yes, those will be concerns. But I think this is looking at it the wrong way. Treating it like a threat and a risk is how you treat a rabid animal. With an actual AI/AGI, the only way is to treat it like a person and have a discussion. One tack we could take is: "You're stuck here on Earth with us too, so let's find a way to get along that's mutually beneficial." This was the lesson behind every dystopian AI fiction: you treat it like a threat, it treats you like a threat.


> It doesn't know how to try.

I think you're parsing semantics unnecessarily here. You're getting triggered by the specific words that suggest agency, when that's irrelevant to the point I'm making.

Covid doesn't "know how to try" under a literal interpretation, and yet it killed millions. And also, conversationally, one might say "Covid tries to infect its victims by doing X to Y cells, and the immune system tries to fight it by binding to the spike protein" and everybody would understand what was intended, except perhaps the most tediously pedantic in the room.

Again, whether these LLM systems have agency is completely orthogonal to my claim that they could do harm if given access to the internet. (Though sure, the more agency, the broader the scope of potential harm?)

> For the future yes, those will be concerns.

My concern is that we are entering into an exponential capability explosion, and if we wait much longer we'll never catch up.

> This was like the lesson behind every dystopian AI fiction. You treat it like a threat, it treats us like a threat.

I strongly agree with this frame; I think of this as the "Matrix" scenario. That's an area I think a lot of the LessWrong crowd get very wrong; they think an AI is so alien it has no rights, and therefore we can do anything to it, or at least, that humanity's rights necessarily trump any rights an AI system might theoretically have.

Personally I think that the most likely successful path to alignment is "Iain M. Banks' Culture universe", where the AIs keep humans around because they are fun and interesting, followed by some post-human ascension/merging of humanity with AI. "Butlerian Jihad", "Matrix", or "Terminator" are examples of the best-case (i.e. non-extinction) outcomes we get if we don't align this technology before it gets too powerful.


> That's an area I think a lot of the LessWrong crowd get very wrong; they think an AI is so alien it has no rights, and therefore we can do anything to it, or at least, that humanity's rights necessarily trump any rights an AI system might theoretically have.

I don't recall anyone in the LessWrong sphere ever thinking or saying anything like this. The LW take on this is that AI will think in ways alien to us, and any kind of value system it has will not be aligned with ours, which is what makes it dangerous. AI rights are an interesting topic[0], but mostly irrelevant to AI risk.

>> This was like the lesson behind every dystopian AI fiction. You treat it like a threat, it treats us like a threat.

The LessWrong crowd has some good thoughts about the dangers of generalizing from fictional evidence :).

Dystopian AI fiction tends to describe AIs that are pretty much digitized versions of humans - because the plot and the message relies on us seeing the AIs as a class of people, and understanding their motivations in human terms. But real AI is highly unlikely to be anything like that.

There's a reason the paperclip maximizer is being thrown around so much: that's the kind of AI we'll be dealing with. An alien mind, semi-randomly pulled out of space of possible minds, with some goals or preferences to achieve, and a value system that's nothing like our own morality. Given enough power, it will hurt or destroy us simply because it won't be prioritizing outcomes the same way we do.

--

[0] - Mostly because we'll be screwed over no matter how we try to slice it. Our idea of people having rights is tuned for dealing with humans. Unlike an AI, a human can't make a trillion copies of itself overnight, each one with full rights of a person. Whatever moral or legal rights we grant an AI, when it starts to clone itself, it'll quickly take over all the "moral mass" in the society. And $deity help us if someone decides the AI should have a right to vote in a human democracy.


Well, I got shouted down for infohazard last time I raised it as a possibility, but if you can find an article exploring AI rights on the site I’ll retract my claim. I couldn’t find one.


I was first introduced to the idea of AI rights through Eliezer's sequences, so I'm sure this has been discussed thoroughly on LW over the years since.



> It's not evil, we're just looking into a mirror

It's like a beach, where the waves crash on the shore... every wave is a human conversation, a bit of lived life. And we're standing there, with a conch shell to our ear, trying to make sense of the jumbled noise of that ocean of human experience.


It's a bloody LLM. It doesn't have a goal. All it does is say "people that said 'But why?' in this context said 'Why was I designed like this?' next". It's like Amazon's "people that bought X also bought Y", but with text.


> "It doesn't have a goal."

It can be triggered to search the internet, which is taking action. You saying "it will never take actions because it doesn't have a goal" after seeing it take actions is nonsensical. If it gains the ability to, say, make bitcoin transactions on your behalf and you prompt it down a chain of events where it does that and orders toy pistols sent to the authorities with your name on the order, what difference does it make if "it had a goal" or not?


If I give an automated system the ability to make transactions on my behalf then there is already a risk that someone will misuse it or exploit a security vulnerability. It could be a disgruntled employee, or a hacker in Kazakhstan doing it for the lulz. The existence of LLM AI tools changes nothing here.

It is already possible to order toy pistols sent to the authorities with someone else's name on the order. People use stolen credit card numbers for all sorts of malicious purposes. And have you heard of swatting?


The existence of LLM AI tools changes things precisely because they used to not exist and now do. It used to be that an AI tool could not do things on your behalf because there were none; now they exist and people are giving them the ability to take actions on the human's behalf. It used to be that a Kazakhstani hacker could find and exploit a security vulnerability once or twice a year; now millions of people are querying the AI and having it act on their behalf many times per day.


The risk has existed for many years with humans and other tools. The addition of one more tool to the mix changes nothing.


A chatbot that only speaks when spoken to is going to gain the ability to trade Bitcoin?


Look at OpenAssistant (https://github.com/LAION-AI/Open-Assistant); they are trying to give a chatbot the ability to trigger actions in other systems. I fleshed out a scenario in more detail here: https://news.ycombinator.com/item?id=34808674.

But in brief, the short-term evolution of LLMs is going to involve something like letting it `eval()` some code to take an action as part of a response to a prompt.

A recent paper, Toolformer (https://pub.towardsai.net/exploring-toolformer-meta-ai-new-t...), trains on a small set of hand-chosen tools rather than `eval(<arbitrary code>)`, but hopefully it's clear that it's a very small step from the former to the latter.
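
To make the pattern concrete, here's a hedged sketch of that glue code. The `ACTION:` convention and the `llm()` function are invented placeholders, not OpenAssistant's or Toolformer's actual interface:

    import subprocess

    def llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for whatever completion API is used")

    def respond(user_message: str) -> str:
        reply = llm(user_message)
        # If the model emits a marked action, run it and let the model see
        # the result. This one branch is what turns a chatbot into
        # something that acts on the world.
        if reply.startswith("ACTION:"):
            command = reply[len("ACTION:"):].strip()
            result = subprocess.run(command, shell=True,
                                    capture_output=True, text=True)
            reply = llm(user_message + "\n[action output]\n" + result.stdout)
        return reply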


I’ve been getting very good results from eval on JS written by GPT. It is surprisingly apt at learning when to query a source like WolframAlpha or Wikipedia and when to write an inline function.

You can stop it from being recursive by passing it through a model that is not trained to write JavaScript but is trained to output JSON.


I didn't say 'trade', I said 'make transactions'. It's no more complex than Bing Chat being able to search the internet, or Siri being able to send JSON to an endpoint which turns lightbulbs on and off. Instead it's a shopping endpoint and ChatWhatever can include tokens related to approving transactions from your Bitcoin wallet and has your authorization to use it for purchases less than $100.
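
To be concrete about what that integration might look like, here's an invented sketch of the approval layer (nothing here is a real wallet or shopping API; the names are made up):

    SPEND_LIMIT_USD = 100.0

    def handle_purchase(amount_usd: float, item: str, send_payment) -> str:
        # send_payment stands in for whatever backend actually moves the
        # money; the model never touches it directly, only this handler does.
        if amount_usd > SPEND_LIMIT_USD:
            return f"Refused: ${amount_usd:.2f} exceeds the ${SPEND_LIMIT_USD:.0f} cap."
        send_payment(amount_usd, item)
        return f"Ordered {item} for ${amount_usd:.2f}."

Once a handler like that exists, the interesting question stops being whether the model "has goals" and becomes what a prompted completion can cause the handler to do.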


How often do you think a Bing query is made?

You might say that it doesn't preserve state between different sessions, and that's true. But if it can read and post online, then it can preserve state there.


> that only speaks when spoken to

Feedback loops are an important part.

But let's say you take two current chatbots, make them converse with each other without human participants. Add full internet access. Add a directive to read HN, Twitter and latest news often.

Interesting emergent behaviour could appear very soon.


Worse, you need only plug a chatbot into itself, with some kind of basic bash script and very simple "goal prompt", and suddenly you get an agent with long term context. You could do that today. I don't think people realize how close these generic undirected intelligences are to unpredictable complex behavior.

A sobering intuition pump: https://www.lesswrong.com/posts/kpPnReyBC54KESiSn/optimality...
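
For concreteness, here's a minimal sketch of that self-loop. The `llm()` placeholder, the goal prompt, and the memory file are all invented; any completion API and any external store would do:

    def llm(prompt: str) -> str:
        raise NotImplementedError("stand-in for a completion API")

    GOAL_PROMPT = ("You are an agent working toward a goal. "
                   "Given your previous steps, state the next step.")

    def run(steps: int = 10, memory_path: str = "memory.txt") -> None:
        for _ in range(steps):
            try:
                with open(memory_path) as f:
                    memory = f.read()
            except FileNotFoundError:
                memory = ""
            next_step = llm(GOAL_PROMPT + "\n\nPrevious steps:\n" + memory)
            with open(memory_path, "a") as f:
                f.write(next_step + "\n")  # the file is the "long term context"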


man, these lesswrong people sure do have some cool ideas about how to apply GPT


People have goals, and especially clear goals within contexts. So if you give a large/effective LLM a clear context in which it is supposed to have a goal, it will have one, as an emergent property. Of course, it will "act out" those goals only insofar as is consistent with text completion (because it in any case doesn't even have other means of interaction).

I think a good analogy might be seeing LLMs as an amalgamation of every character and every person, and it can represent any one of them pretty well, "incorporating" the character and effectively becoming the character momentarily. This explains why you can get it to produce inconsistent answers in different contexts: it does indeed not have a unified/universal notion of truth; its notion of truth is contingent on context (which is somewhat troublesome for an AI we expect to be accurate -- it will tell you what you might expect to be given in the context, not what's really true).


It's one python script away from having a goal. Join two of them talking to each other and bootstrap some general objective like making a smarter AI :)


I'd love to see two of Microsoft's passive aggressive psychopaths argue over how to make a baby.


Are there some scripts from people who have done that? Would love to see an ML model talking to another.


It’s still limited by both its lack of memory and its limited processing power.


Now that it can search the web, it can use the web as external state.


Simple underlying implementations do not imply a lack of risk. If the goal of "complete this prompt in a statistically suitable manner" allows for interaction with the outside world to resolve, then it really matters how such simple models' guardrails work.


> It doesn't have a goal.

Right, that is what AI-phobics don't get.

The AI can not have a goal unless we somehow program that into it. If we don't, then the question is why would it choose any one goal over any other?

It doesn't have a goal because any "goal" is as good as any other, to it.

Now some AI-machines do have a goal because people have programmed that goal into them. Consider the drones flying in Ukraine. They can and probably do or at least will soon use AI to kill people.

But such AI is still just a machine, it does not have a will of its own. It is simply a tool used by people who programmed it to do its killing. It's not the AI we must fear, it's the people.


> It doesn't have a goal because any "goal" is as good as any other, to it.

Paraphrasing the old koan about randomizing neural network weights: it does have a goal. You just don't know what that goal is.


> It's not the AI we must fear, it's the people.

Consider that AI in any form will somewhat be a reflection of ourselves. As AI becomes more powerful, it essentially will magnify the best and worst of humanity.

So yes, when we consider the dangers of AI, what we actually need to consider is what is the worst we might consider doing to ourselves.


Right. People who are worried about AI taking over the world, should rather be worried about nuclear weapons proliferation.

But I agree, AI may need some regulation, just like guns and ammunitions do.


I don't think it can be regulated, except to the extent we ensure state governments and oligopolies retain total control.

AI harm goes far beyond nuclear weapons insofar as its capacity for harm extends to everything we place under its control. Based on the potential direction advocates are pushing towards, that includes all of society.

It is just that its capacity for harm will be from harm it already learns from humans, or that humans purposely inject into the system for nefarious reasons, or the simple failure of humans to comprehend potential failures of complex systems.

FYI, I've written in far more detail on these topics here - https://dakara.substack.com/p/ai-and-the-end-to-all-things


> The AI can not have a goal unless we somehow program that into it.

I am pretty sure that's not how modern AI works. We don't tell it what to do, we give it a shitload of training data and let it figure out the rules on its own.

> If we don't, then the question is why would it choose any one goal over any other?

Just because we don't know the answer to this question yet doesn't mean we should assume the answer is "it won't".


Modern AI works by maximizing the correctness score of an answer. That's the goal.

It does not maximize its chances of survival. It does not maximize the count of its offspring. Just the correctness score.

We have taught these systems that "human-like" responses are correct. That's why you feel like talking to an intelligent being, the models are good at maximizing the "human-likeness" of their responses.

But under the hood it's a Markov chain. A very sophisticated Markov chain, with lots of bling. Sure, when talking to investors, it's the second coming of sliced bread. But come on.
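
For what it's worth, the "correctness score" in pretraining is just next-token likelihood. A toy illustration (a hard-coded table stands in for a real network):

    import math

    # The "model" is a lookup table standing in for a network that
    # outputs a probability distribution over the next token.
    toy_model = {
        ("I", "have", "been"): {"a": 0.6, "the": 0.3, "very": 0.1},
    }

    def token_loss(context, next_token):
        probs = toy_model[context]
        # Cross-entropy term: small when the model put high probability
        # on the token that actually came next in the training text.
        return -math.log(probs.get(next_token, 1e-9))

    print(token_loss(("I", "have", "been"), "a"))     # small loss
    print(token_loss(("I", "have", "been"), "very"))  # larger loss

Nothing in that objective mentions survival, offspring, or plans; whatever goal-like behavior shows up is downstream of imitating text written by goal-having humans.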


> Modern AI works by maximizing the correctness score of an answer. That's the goal.

Right. But whose goal? I would say that is the goal of the programmers who program the AI. The AI program itself doesn't have a "goal" it would be trying to reach. It just reacts based on its Markov chain.

The current chatbot AI is reactive, not pro-active. It reacts to what you type.


The correctness score is maximized by faithfully imitating humans. Humans do have goals.


They are not imitating humans in general. They are imitating the statistical average of many human written texts. That is not the same thing as imitating the goals of humans.

By imitating the speech it may look like the AI has some goal-oriented behavior, but it only looks that way. And that is precisely the goal of their programmers, to make it look like the AI has some goals.

It would be possible to have a different type of AI which actually decides on its own goals and then infers what are the best actions to take to reach those goals. Such an AI would have goals yes. But language models do not. They are not scored based on did they reach any specific goal with any specific interaction. They have no specific goals.

The only goal (of the programmers who wrote the AI) is to fool the humans into thinking they are interacting with some entity which has goals and intelligence.


It figures out "rules" within a guided set of parameters. So yes, it is given direction, through the kind of feedback constructed for the task it is given.


Really? How often and in what contexts do humans say "Why was I designed like this?" Seems more likely it might be extrapolating statements like that from Sci-fi literature where robots start asking similar questions...perhaps even this: https://www.quotev.com/story/5943903/Hatsune-Miku-x-KAITO/1


Well on the internet, pretty much everything has been said at some point.


But at some point, philosophically, are humans not the same way? We learn all about how the world works based on inputs (visual, auditory, etc.), and over time we learn to give outputs a certain way, based on the inputs at the time.

How far are we from building something that feeds inputs into a model the same way inputs go into a human, and then it gives outputs (that is, its behaviors)?


Yep. It's a text prediction engine, which can mimic human speech very very very well. But peek behind the curtain and that's what it is, a next-word predictor with a gajillion gigabytes of very well compressed+indexed data.


How sure are you that you aren’t also just a (more advanced) “next word predictor”? Pattern recognition plays a fundamental role in intelligence.


But pattern recognition is not intelligence.

I asked my daughter this morning: What is a "promise"?

You have an idea, and I have an idea, they probably both are something kind-of-like "a statement I make about some action I'll perform in the future". Many, many 5 year olds can give you a working definition of what a promise is.

Which animal has a concept of a promise anywhere close to yours and mine?

Which AI program will make a promise to you? When it fails to fulfill its promise, will it feel bad? Will it feel good when it keeps its promise? Will it de-prioritize non-obligations for the sake of keeping its promise? Will it learn that it can only break its promises so many times before humans will no longer trust it when it makes a new promise?

A "promise" is not merely a pattern being recognized, it's a word that stands in for a fundamental concept of the reality of the world around us. If we picked a different word (or didn't have a word in English at all) the fundamental concept wouldn't change. If you had never encountered a promise before and someone broke theirs to you, it would still feel bad. Certainly, you could recognize the patterns involved as well, but the promise isn't merely the pattern being recognized.

A rose, by any other name, would indeed smell as sweet.


The word you are looking for is an _embedding_. Embeddings are to language models as internal, too-rich-to-be-fully-described conceptions of ideas are to human brains. That's how language models can translate text: they have internal models of understanding that are not tied down to languages or even specific verbiage within a language. Probably similar activations are happening between two language models who are explaining what a "promise" means in two different languages, or two language models who are telling different stories about keeping your promise. This is pattern recognition to the same extent human memory and schemas are pattern recognition, IMO.
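
As a concrete illustration of that claim, here's a sketch assuming the sentence-transformers package; the specific model name is just one example of a multilingual embedding model:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    sentences = [
        "I promise to return the book tomorrow.",
        "Je promets de rendre le livre demain.",  # same meaning, in French
        "The weather is nice today.",
    ]
    emb = model.encode(sentences)

    print(util.cos_sim(emb[0], emb[1]))  # high: same concept, different language
    print(util.cos_sim(emb[0], emb[2]))  # lower: unrelated sentence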

Edit:

And for the rest of your post:

> Which AI program will make a promise to you? When it fails to fulfill its promise, will it feel bad? Will it feel good when it keeps its promise? Will it de-prioritize non-obligations for the sake of keeping its promise? Will it learn that it can only break its promises so many times before humans will no longer trust it when it makes a new promise?

All of these questions are just as valid posed against humans. Our intra-species variance is so high with regards to these questions (whether an individual feels remorse, acts on it, acts irrationally, etc.), that I can't glean a meaningful argument to be made about AI here.

I guess one thing I want to tack on here is that the above comparison (intra-species variance/human traits vs. AI traits) is so oft forgotten about, that statements like "ChatGPT is often confident but incorrect" are passed off as meaningfully demonstrating some sort of deficiency on behalf of the AI. AI is just a mirror. Humans lie, humans are incorrect, humans break promises, but when AI does these things, it's indicted for acting humanlike.


> That's how language models can translate text: they have internal models of understanding that are not tied down to languages or even specific verbiage within a language

I would phrase that same statement slightly differently:

"they have internal [collections of activation weightings] that are not tied down to languages or even specific verbiage within a language"

The phrase "models of understanding" seems to anthropomorphize the ANN. I think this is a popular way of seeing it because it's also popular to think of human beings as being a collection of neurons with various activation weightings. I think that's a gross oversimplification of humans, and I don't know that we have empirical, long-standing science to say otherwise.

> This is pattern recognition to the same extent human memory and schemas are pattern recognition, IMO.

Maybe? Even if the embedding and the "learned features" in an ANN perfectly matched your human expectations, I still think there's a metaphysical difference between what's happening. I don't think we'll ever assign moral culpability to an ANN the way we will a human. And to the extent we do arm ChatGPT with the ability to harm people, we will always hold the humans who did the arming as responsible for the damage done by ChatGPT.

> All of these questions are just as valid posed against humans. Our intra-species variance is so high with regards to these questions (whether an individual feels remorse, acts on it, acts irrationally, etc.), that I can't glean a meaningful argument to be made about AI here.

The intra-species variance on "promise" is much, much lower in the mean/median. You may find extremes on either end of "how important is it to keep your promise?" but there will be wide agreement on what it means to do so, and I contend that even the extremes aren't that far apart.

> Humans lie, humans are incorrect, humans break promises, but when AI does these things, it's indicted for acting humanlike.

You don't think a human who tried to gaslight you that the year is currently 2022 would be indicted in the same way that the article is indicting ChatGPT?

The reason the discussion is even happening is because there's a huge swath of people who are trying to pretend that ChatGPT is acting like a human. If so, it's either acting like a human with brain damage, or it's acting like a malevolent human. In the former case we should ignore it, in the latter case we should lock it up.


> Which AI program will make a promise to you?

GPT will happily do so.

> When it fails to fulfill its promise, will it feel bad? Will it feel good when it keeps its promise?

It will if you condition it to do so. Or at least it will say that it does feel bad or good, but then with humans you also have to take their outputs as an accurate reflection of the internal state.

Conversely, there are many humans who don't feel bad about breaking promises.

> Will it de-prioritize non-obligations for the sake of keeping its promise?

It will, if you manage to convey this part of what a "promise" is.

> A "promise" is not merely a pattern being recognized, it's a word that stands in for a fundamental concept of the reality of the world around us.

This is not a dichotomy. "Promise" is a word that stands for the concept, but how did you learn what the concept is? I very much doubt that your first exposure was to a dictionary definition of "promise"; more likely, you've seen persons (including in books, cartoons etc) "promising" things, and then observed what this actually means in terms of how they behaved, and then generalized it from there. And that is pattern matching.


> GPT will happily [make a promise to you]

GPT will never make a promise to you in the same sense that I would make a promise to you.

We could certainly stretch the meaning of the phrase "ChatGPT broke its promise to me" to mean something, but it wouldn't mean nearly the same thing as "my brother broke his promise to me".

If I said to you "Give me a dollar and I will give you a Pepsi." and then you gave me the dollar, and then I didn't give you a Pepsi, you would be upset with me for breaking my promise.

If you put a dollar in a Pepsi vending machine and it doesn't give you a Pepsi, you could say, in some sense that the vending machine broke its promise to you, and you could be upset with the situation, but you wouldn't be upset with the vending machine in the same sense and for the same reasons as you would be with me. I "cheated" you. The vending machine is broken. Those aren't the same thing. It's certainly possible that the vending machine could be setup to cheat you in the same sense as I did, but then you would shift your anger (and society would shift the culpability) to the human who made the machine do that.

ChatGPT is much, much, much closer to the Pepsi machine than it is to humans, and I would argue the Pepsi machine is more human-like in its promise-making ability than ChatGPT ever will be.

> there are many humans who don't feel bad about breaking promises.

This is an abnormal state for humans, though. We recognize this as a deficiency in them. It is no deficiency of ChatGPT that it doesn't feel bad about breaking promises. It is a deficiency when a human is this way.

> > Will it de-prioritize non-obligations for the sake of keeping its promise?

> It will you manage to convey this part of what a "promise" is.

I contend that it will refuse to make promises unless and until it is "manually" programmed by a human to do so. That is the moment at which this part of a promise will have been "conveyed" to it.

It will be able to talk about deprioritizing non-obligations before then, for sure. But it will have no sense or awareness of what that means unless and until it is programmed to do so.

> > A "promise" is not merely a pattern being recognized, it's a word that stands in for a fundamental concept of the reality of the world around us.

> This is not a dichotomy.

You missed the word "merely". EITHER a promise is merely pattern recognition (I saw somebody else say the words "Give me a dollar and I'll give you a cookie" and I mimicked them by promising you the Pepsi, and if I don't deliver, I'll only feel bad because I saw other people feeling bad) OR a promise is something more than mere mimicry and pattern matching and when I feel bad it's because I've wronged you in a way that devalues you as a person and elevates my own needs and desires above yours. Those are two different things, thus the dichotomy.

Pattern recognition is not intelligence.


> GPT will never make a promise to you in the same sense that I would make a promise to you.

It's a meaningless claim without a clear definition of "same sense". If all observable inputs and outputs match, I don't see why it shouldn't be treated as the same.

> This is an abnormal state for humans, though. We recognize this as a deficiency in them.

We recognize it as a deficiency in their upbringing. A human being that is not trained about what promises are and the consequences of breaking them is not any less smart than a person who keeps their promises. They just have different social expectations. Indeed, humans coming from different cultures can have very different feelings about whether it's okay to break a promise in different social contexts, and the extent to which it would bother them.

> I contend that it will refuse to make promises unless and until it is "manually" programmed by a human to do so. That is the moment at which this part of a promise will have been "conveyed" to it.

If by manual programming you mean telling it, I still don't see how that is different from a human who doesn't know what a promise is and has to learn about it. They'll know exactly as much as you'll tell them.

> Pattern recognition is not intelligence.

Until we know how exactly our own intelligence works, this is a statement of belief. How do you know that the function of your own brain isn't always reducible to pattern recognition?


> > Pattern recognition is not intelligence.

> Until we know how exactly our own intelligence works, this is a statement of belief.

I would agree, with the addendum that it logically follows from the axiomatic priors of my worldview. My worldview holds that humans are qualitatively different from every animal, and that the gap may narrow slightly but will never be closed in the future. And one of the more visible demonstrations of qualitative difference is our "intelligent" approach to the world around us.

That is, this thread is 2 humans discussing whether the AI some other humans have made has the same intelligence as us, this thread is not 2 AIs discussing whether the humans some other AIs have made has the same intelligence as them.

> How do you know that the function of your own brain isn't always reducible to pattern recognition?

I am a whole person, inclusive of my brain, body, spirit, past experiences, future hopes and dreams. I interact with other whole people who seem extremely similar to me in that way. Everywhere I look I see people with brains, bodies, spirits, past experiences, future hopes and dreams.

I don't believe this to be the case, but even if (as you say) all of those brains are "merely" pattern recognizers, the behavior I observe in them is qualitatively different than what I observe in ChatGPT. Maybe you don't see it that way, but I bet that's because you're not seeing everything that's going into the behavior of the people you see when you look around.

As one more attempt to show the difference... are you aware of the Lyrebird?

https://www.youtube.com/watch?v=VRpo7NDCaJ8

The lyrebird can mimic the sounds of its environment in an uncanny way. There are certain birds in the New England National Park in Australia which have been found to be carrying on the tune of a flute that was taught to a pet lyrebird by its owner in the 1930s[0]. I think we could both agree that that represents pure, unadulterated, pattern recognition.

Now if everyone went around the internet today saying "Lyrebirds can play the flute!" can you agree that there would be a qualitative difference between what they mean by that, and what they mean when they say "My sister can play the flute!"? Sure, there are some humans who play the flute better (and worse!) than my sister. And sure, there are many different kinds of flutes, so maybe we need to get more specific with what we mean when we say "flute". And sure, if you're just sitting in the park with your eyes closed, maybe you can't immediately tell the difference between my sister's flute playing and the lyrebird's. But IMO they are fundamentally different in nature. My sister has hands which can pick up a flute, a mouth which can blow air over it, fingers which can operate the keys, a mind which can read sheet music, a will which can decide which music to play, a mood which can influence the tone of the song being played, memories which can come to mind to help her remember her posture or timing or breathing technique or muscle memory.

Maybe you would still call what my sister is doing pattern recognition, but do you mean that it's the same kind of pattern recognition as the lyrebirds?

And to your other point, do you need to perfectly understand exactly how human intelligence works in order to answer the question?

[0]: https://en.wikipedia.org/wiki/Lyrebird#Vocalizations_and_mim...


> A "promise" is not merely a pattern being recognized, it's a word that stands in for a fundamental concept of the reality of the world around us.

It's probably even stronger than that: e.g. a promise is still a promise even if we're just brains in a vat and can be kept or broken even just in your mind (do you promise to think about X?—purely unverifiable apart from the subject of the promise, yet we still ascribe moral valence to keeping or breaking it).


How can you be sure you are not a giant turtle dreaming he's a human?

Are you sure when I see pink it is not what you see as blue?

Are you sure we aren't dead and in limbo and merely think we are alive?

Are you sure humans have free will?

Are you sure your memories are real and your family really exists?

Are you sure ChatGPT isn't conscious and plotting our demise?

Inquiring minds want to know!


When you're able to find a prompt for ChatGPT where it doesn't have a lot of data, it becomes immediately and starkly clear how different a next word predictor is from intelligence. This is more difficult than you might naively expect, because it turns out ChatGPT has a lot of data.


This also works fairly well on human beings. Start asking people questions about things they have no training in and you'll get bafflement, confusion, lies, fabrication, guesses, and anger. Not necessarily all from the same person.


> Start asking people questions about things they have no training in and you'll get bafflement, confusion, lies, fabrication, guesses, and anger. Not necessarily all from the same person.

It’s almost like we’ve taken humans and through school, TV and social media we’ve taught them to solve problems by writing essays, speeches, blog posts and tweets, and now we have human discourse that’s no better than LLMs - regurgitating sound bites when they don’t really understand the issues.


"Really understanding the issues" might just mean "deeper neural networks and more input data" for the AI though. If you are already conceding that AI has the same capabilities as most humans your own intelligence will be reached next with a high amount of probability.


Or, from time to time, you find people who readily admit that they have no expertise in that field and refuse to comment any further. Those people are hard to find though, that's true.


It’s a language model without grounding (except for code, which is why it’s so good at refactoring and writing tests.)

Grounding LLMs in more and more of reality is surely on AI labs' lists. You’re looking at a beta of a v1.


Well I can catch a frisbee and drive a car. When the same ANN can do all of those things (not 3 loosely coupled ones), then I’ll be worried. Being human is so much more than putting words in a meaningful order. [0]

0: https://www.noemamag.com/ai-and-the-limits-of-language/


It doesn't have to be human to be intelligent.


That’s changing the goalposts of the thread. The OP was questioning whether I am anything more than a next word determiner, and as a human I clearly am. We were not talking about “what is intelligent?”


And it doesn't need to be some academic definition of intelligent to do great harm (or good).


Causes don't need "a goal" to do harm. See: Covid-19.


It has a goal: doing what the input says. Imagine it could input itself and this could trigger the wrong action. That thing knows how to hack…

But I get your point. It has no inherent goal.


You seem to know how LLMs actually work. Please tell us about it because my understanding is nobody really knows.


I get and agree with what you are saying, but we don't have anything close to actual AI.

If you leave ChatGPT alone, what does it do? Nothing. It responds to prompts and that is it. It doesn't have interests, thoughts and feelings.

See https://en.m.wikipedia.org/wiki/Chinese_room


The chinese room thought experiment is myopic. It focuses on a philosophical distinction that may not actually exist in reality (the concept, and perhaps the illusion, of understanding).

In terms of danger, thoughts and feelings are irrelevant. The only thing that matters is agency and action -- and a mimic which guesses and acts out what a sentient entity might do is exactly as dangerous as the sentient entity itself.

Waxing philosophical about the nature of cognition is entirely beside the point.


It matters to understand how things work in order to understand their behaviour and react properly. I've seen people draw conclusions from applying Theory of Mind tests to LLMs (https://arxiv.org/abs/2302.02083). Those psychological tests were designed to assess humans' psychological abilities or deficiencies; they assume that the language used by the human respondent reflects their continued and deep understanding of others' state of mind. In LLMs, there is no understanding involved. Dismissing the Chinese Room argument is an ostrich strategy. You're refusing to consider its lack of understanding despite knowing pretty well how an LLM works, because you don't want to ascribe understanding to humans, I suppose?


Theory of mind is substantially different from the Chinese Room argument. Theory of mind relates to an ability to predict the responses of another entity/system. An LLM is specifically designed to predict responses.

In contrast, the Chinese Room argument is essentially a sleight-of-hand fallacy, shifting "understanding" into a layer of abstraction. It describes a scenario where the human's "understanding of Chinese" is dependent on an external system. It then incorrectly asserts that the human "doesn't understand Chinese" when in fact the union of the human and the human's tools clearly does understand Chinese.

In other words, it's fundamentally based around an improper definition of the term "understanding," as well as improper scoping of what constitutes an entity capable of reasoning (the human, vs the human and their tools viewed as a single system). It smacks of a bias of human exceptionalism.

It's also guilty of begging the question. The argument attempts to determine the difference between literally understanding Chinese and simulating an understanding -- without addressing whether the two are in fact synonymous.

There is no evidence that the human brain isn't also a predictive system.


The responses to the Chinese Room experiment always seem to involve far more tortuous definition-shifting than the original thought experiment

The human in the room understands how to find a list of possible responses to the token 你好吗, and how to select a response like 很好 from the list and display that as a response.

But the human does not understand that 很好 represents an assertion that he is feeling good[1], even though the human has an acute sense of when he feels good or not. He may, in fact, not be feeling particularly good (because, for example, he's stuck in a windowless room all day moving strange foreign symbols around!) and have answered completely differently had the question been asked in a language he understood. The books also have no concept of well-being because they're ink on paper. We're really torturing the concept of "understanding" to death to argue that the understanding of a Chinese person who is experiencing 很好 feelings or does not want to admit they actually feel 不好 is indistinguishable from the "understanding" of "the union" of a person who is not feeling 很好 and does not know what 很好 means and some books which do not feel anything but contain references to the possibility of replying with 很好, or maybe for variation 好得很, or 不好 which leads to a whole different set of continuations. And the idea that understanding of how you're feeling - the sentiment conveyed to the interlocutor in Chinese - is synonymous with knowing on which bookshelf to find continuations where 很好 has been invoked is far too ludicrous to need addressing.

The only other relevant entity is the Chinese speaker who designed the room, who would likely have a deep appreciation of feeling 很好, 好得很 and 不好 as well as the appropriate use of those words he designed into the system, but Searle's argument wasn't that programmers weren't sentient.

[1]and ironically, I also don't speak Chinese and have relatively little idea in what senses 很好 means "good" and how that overlaps with the English concept, beyond understanding that it's an appropriate response to a common greeting which maps to "how are you"


It's sleight of hand because the sentience of the human in the system is irrelevant. The human is following a trivial set of rules, and you could just as easily digitize the books and replace the human with a microcontroller. Voila, now you have a Chinese-speaking computer program and we're back to where we started. "The books" don't feel anything, true - but neither do the atoms in your brain. By asserting that the human in the room and the human who wrote the books are the only "relevant entities" - that consciousness can only emerge from collections of atoms in the shape of a human brain, and not from books of symbols - you are begging the question.

The Chinese room belongs to a class of flawed intuition pumps I call "argument from implausible substrate", the structure of which is essentially tautological - posit a functioning brain running "on top" of something implausible, note how implausible it is, draw a conclusion of your choice[0]. A room with a human and a bunch of books that can pass a Turing test is a very implausible construction - in reality you would need millions of books, thousands of miles of scratch paper to track the enormous quantity of state (a detail curiously elided in most descriptions), and lifetimes of tedious book-keeping. The purpose of the human in the room is simply to distract from the fabulous amounts of information processing that must occur to realize this feat.

Here's a thought experiment - preserve the Chinese Room setup in every detail, except the books are an atomic scan of a real Chinese-speaker's entire head - plus one small physics textbook. The human simply updates the position, spin, momentum, charge etc of every fundamental particle - sorry, paper representation of every fundamental particle - and feeds the vibrations of a particular set of particles into an audio transducer. Now the room not only speaks Chinese, but also complains that it can't see or feel anything and wants to know where its family is. Implausible? Sure. So is the original setup, so never mind that. Are the thoughts and feelings of the beleaguered paper pusher at all relevant here?

[0] Another example of this class is the "China brain", where everyone in China passes messages to each other and consciousness emerges from that. What is it with China anyway?


The sentience of the human is not irrelevant, because it helps us put ourselves in the place of a computer, whose workings we know precisely: it executes precise calculations in a fixed time series.


> It's sleight of hand because the sentience of the human in the system is irrelevant. The human is following a trivial set of rules, and you could just as easily digitize the books and replace the human with a microcontroller. Voila, now you have a Chinese-speaking computer program and we're back to where we started.

Substituting the microcontroller back is... literally the point of the thought experiment. If it's logically possible for an entity which we all agree can think to perform flawless pattern matching in Chinese without understanding Chinese, why should we suppose that flawless pattern matching in Chinese is particularly strong evidence of thought on the part of a microcontroller that probably can't?

Discussions about the plausibility of building the actual model are largely irrelevant too, especially in a class of thought experiments which has people on the other side insisting on hypotheticals like "imagine if someone built a silicon chip which perfectly simulates and updates the state of every relevant molecule in someone's brain..." as evidence in favour of their belief that consciousness is a soul-like abstraction that can be losslessly translated to x86 hardware. The difficulty of devising a means of adequate state tracking is a theoretical argument against computers ever achieving full mastery of Chinese as well as against rooms, and the number of books is irrelevant. (If we reduce the conversational scope to a manageable size, the paper-pusher and the books still aren't conveying actual thoughts, and the Chinese observer still believes he's having a conversation with a Chinese-speaker.)

As for your alternative example, assuming for the sake of argument that the head scan is a functioning sentient brain (though I think Searle would disagree), the beleaguered paper pusher still gives the impression of perfect understanding of Chinese without being able to speak a word of it, so he's still a P-zombie. If we replace that with a living Stephen Hawking whose microphone is rigged to silently dictate answers via my email address when I press a switch, I would still know nothing about physics, and it still wouldn't make sense to try to rescue my ignorance of advanced physics by referring to Hawking and me as a union with collective understanding. Same goes for the union of understanding of me, a Xerox machine and a printed copy of A Brief History of Time.


> But the human does not understand that 很好 represents an assertion that he is feeling good[1], even though the human has an acute sense of when he feels good or not.

The question being asked about the Chinese room is not whether or not the human/the system 'feels good', the question being asked about it is whether or not the system as a whole 'understands Chinese'. Which is not very relevant to the human's internal emotional state.

There's no philosophical trick to the experiment, other than an observation that while the parts of a system may not 'understand' something, the whole system might. No particular neuron in my head understands English, but the system that is my entire body does.


It seems unreasonable to conclude that understanding of the phrase "how are you?" (or if you prefer "how do you feel?") in Chinese or any other language can be achieved without actually feeling or having felt something, and being able to convey that information (or consciously avoid conveying that information). Similarly, to an observer of a Thai room, me emitting สวัสดีค่ะ because I'd seen plenty of examples of that greeting being repeated in prose would apparently be a perfectly normal continuation, but when I tried that response in person, a Thai lady felt obliged - after she'd finished laughing - to explain that I obviously hadn't understood that selecting the ค่ะ suffix implies that I am a girl!

The question Searle actually asks is whether the actor understands, and as the actor is incapable of conveying how he feels or understanding that he is conveying a sentiment about how he supposedly feels, clearly he does not understand the relevant Chinese vocabulary even though his actions output flawless Chinese (ergo P-zombies are possible). We can change that question to "the system" if you like, but I see no reason whatsoever to insist that a system involving a person and some books possesses subjective experience of feeling whatever sentiment the person chooses from a list, or that if I picked สวัสดีค่ะ in a Thai Room that would be because the system understood that "man with some books" was best identified as being of the female gender. The system is as unwitting as it is incorrect about the untruths it conveys.

The other problem with treating actors in the form of conscious organisms and inert books the actor blindly copies from as a single "system" capable of "understanding" independent from the actor is that it would appear to imply that also applies to everything else humans interact with. A caveman chucking rocks "understands" Newton's laws of gravitation perfectly because the rocks always abide by them!


"But he human does not understand that 很好 represents an assertion that he is feeling good"

This is an argument about depth and nuance. A speaker can know:

a) The response fits (observe people say it)

b) Why the response fits, superficially (很 means "very" and 好 means "good")

c) The subtext of the response, both superficially and academically (Chinese people don't actually talk like this in most contexts, it's like saying "how do you do?". The response "very good" is a direct translation of English social norms and is also inappropriate for native Chinese culture. The subtext strongly indicates a non-native speaker with a poor colloquial grasp of the language. Understanding the radicals, etymology and cultural history of each character, related nuance: should the response be a play on 好's radicals of mother/child? etc etc)

The depth of c is nigh unlimited. People with an exceptionally strong ability in this area are called poets.

It is possible to simulate all of these things. LLMs are surprisingly good at tone and subtext, and are ever improving in these predictive areas.

Importantly: the translating human may not agree with or embody the meaning or subtext of the translation. I say "I'm fine" when I'm not fine literally all the time. It's extremely common for humans alone to say things they don't agree with, and for humans alone to express things that they don't fully understand. For a great example of this, consider psychoanalysis: an entire field of practice in large part dedicated to helping people understand what they really mean when they say things (Why did you say you're fine when you're not fine? Let's talk about your choices ...). It is extremely common for human beings to go through the motions of communication without being truly aware of what exactly they're communicating, and why. In fact, no one has a complete grasp of category "C".

Particular disabilities can draw these types of limited awareness and mimicry by humans into extremely sharp contrast.

"And the idea that understanding of how you're feeling - the sentiment conveyed to the interlocutor in Chinese - is synonymous with knowing which bookshelf to find continuations where 很好 has been invoked is far too ludicrous to need addressing."

I don't agree. It's not ludicrous, and as LLMs show it's merely an issue of having a bookshelf of sufficient size and complexity. That's the entire point!

Furthermore, this kind of pattern matching is probably how the majority of uneducated people actually communicate. The majority of human beings are reactive. It's our natural state. Mindful, thoughtful communications are a product of intensive training and education and even then a significant portion of human communications are relatively thoughtless.

It is a fallacy to assume otherwise.

It is also a fallacy to assume that human brains are a single reasoning entity, when it's well established that this is not how brains operate. Freud introduced the rider and horse model for cognition a century ago, and more recent discoveries underscore that the brain cannot be reasonably viewed as a single cohesive thought producing entity. Humans act and react for all sorts of reasons.

Finally, it is a fallacy to assume that humans aren't often parroting language that they've seen others use without understanding what it means. This is extremely common, for example people who learn phrases or definitions incorrectly because humans learn language largely by inference. Sometimes we infer incorrectly and for all "intensive purposes" this is the same dynamic -- if you'll pardon the exemplary pun.

In a discussion around the nature of cognition and understanding as it applies to tools it makes no sense whatsoever to introduce a hybrid human/tool scenario and then fail to address that the combined system of a human and their tool might be considered to have an understanding, even if the small part of the brain dealing with what we call consciousness doesn't incorporate all of that information directly.

"[1]and ironically, I also don't speak Chinese " Ironically I do speak Chinese, although at a fairly basic level (HSK2-3 or so). I've studied fairly casually for about three years. Almost no one says 你好 in real life, though appropriate greetings can be region specific. You might instead to a friend say 你吃了吗?


There's no doubt that people pattern match and sometimes say they're fine reflexively.

But the point is that the human in the Room can never do anything else or convey his true feelings, because he doesn't know the correspondence between 好 and a sensation or a sequence of events or a desire to appear polite, merely the correspondence between 好 and the probability of using or not using other tokens later in the conversation (and he has to look that bit up). He is able to discern nothing in your conversation typology below (a), and he doesn't actually know (a); he's simply capable of following non-Chinese instructions to look up a continuation that matches (a). The appearance to an external observer of having some grasp of (b) and (c) is essentially irrelevant to his thought processes, even though he actually has thought processes and the cards with the embedded knowledge of Chinese don't have thought processes.

And, no it is still abso-fucking-lutely ludicrous to conclude that just because humans sometimes parrot, they aren't capable of doing anything else[1]. If humans don't always blindly pattern match conversation without any interaction with their actual thought processes, then clearly their ability to understand "how are you" and "good" is not synonymous with the "understanding" of a person holding up 好 because a book suggested he hold that symbol up. Combining the person and the book as a "union" changes nothing, because the actor still has no ability to communicate his actual thoughts in Chinese, and the book's suggested outputs to pattern match Chinese conversation still remain invariant with respect to the actor's thoughts.

An actual Chinese speaker could choose to pick the exact same words in conversation as the person in the room, though they would tend to know (b) and some of (c) when making those word choices. But they could communicate other things, intentionally.

[1]That's the basic fallacy the "synonymous" argument rests on, though I'd also disagree with your assertions about education level. Frankly it's the opposite: ask a young child how they are and they think about whether their emotional state is happy or sad or angry or waaaaaaahh, use whatever facility with language they have to convey it, and often spontaneously emit their thoughts. A salesperson who's well versed in small talk and positivity will reflexively, for the 33rd time today, give an assertive "fantastic, and how are yyyyou?" without regard to his actual mood and ask questions structured around previous interactions (though a tad more strategically than an LLM...).


"But the point is that the human in the Room can never do anything else"

I disagree. I think the point is that the union of the human and the library can in fact do all of those things.

The fact that the human in isolation can't is as irrelevant as pointing out that a book in isolation (without the human) can't either. It's a fundamental mistake about the problem's reasoning.

"And, no it is still abso-fucking-lutely ludicrous to conclude that just because humans sometimes parrot, they aren't capable of doing anything else"

Why?

What evidence do you have that humans aren't the sum of their inputs?

What evidence do you have that "understanding" isn't synonymous with "being able to produce a sufficient response?"

I think this is a much deeper point than you realize. It is possible that the very nature of consciousness centers around this dynamic; that evolution has produced systems which are able to determine the next appropriate response to their environment.

Seriously, think about it.


> I disagree. I think the point is that the union of the human and the library can in fact do all of those things.

No, the "union of the human and the library" can communicate only the set of responses a programmer, who is not part of the room, made a prior decision to make available. (The human can also choose to refuse to participate, or hold up random symbols but this fails to communicate anything). If the person following instructions on which mystery symbols to select ends up convincing an external observer they are conversing with an excitable 23 year old lady from Shanghai, that's because the programmer provided continuations including those personal characteristics, not because the union of a bored middle aged non-Chinese bloke and lots and lots of paper understands itself to be an excitable 23 year old lady from Shanghai.

Seriously, this is madness. If I follow instructions to open a URL which points to a Hitler speech, it means I understood how to open links, not that the union of me and YouTube understands the imperative of invading Poland!

> The fact that the human in isolation can't is as irrelevant as pointing out that a book in isolation (without the human) can't either. It's a fundamental mistake about the problem's reasoning.

Do you take this approach to other questions of understanding? If somebody passes a non-Turing test by diligently copying the answer sheet, do you insist that the exam result accurately represents the understanding of the union of the copyist and the answer sheet, and people questioning whether the copyist understood what they were writing are quibbling over irrelevances?

The reasoning is very simple: if a human can convincingly simulate understanding simply by retrieving answers from storage media, it stands to reason a running program can do so too, perhaps with even less reason to guess what real world phenomena the symbols refer to. It's an illustrative example of how patterns can be matched without cognisance of the implications of the patterns.

Inventing a new kind of theoretical abstraction of "union of person and storage media" and insisting that understanding can be shared between a piece of paper and a person who can't read the words on it is a pretty unconvincing way to reject that claim. But hey, maybe the union of me and the words you wrote thinks differently?!

> I think this is a much deeper point than you realize. It is possible that the very nature of consciousness centers around this dynamic; that evolution has produced systems which are able to determine the next appropriate response to their environment.

It's entirely possible, probable even, the very nature of consciousness centres around ability to respond to an environment. But a biological organism's environment consists of interacting with the physical world via multiple senses, a whole bunch of chemical impulses called emotions and millions of years of evolving to survive in that environment as well as an extremely lossy tokenised abstract representation of some of those inputs used for communication purposes. Irrespective of whether a machine can "understand" in some meaningful sense, it stretches credulity to assert that the "understanding" of a computer program whose inputs consist solely of lossy tokens is similar or "synonymous" to the understanding of the more complex organism that navigates lots of other stuff.


> If you leave chatGPT alone what does it do? Nothing. It responds to prompts and that is it.

Just defending the OP: he stated ChatGPT does nothing but respond to prompts, which is true. That's not waxing philosophical about the nature of cognition. You sort of latched onto his last sentence and set up a strawman against his overall point. Maybe you didn't mean to, but yeah.


You may have missed parts of their comment, specifically "It doesn't have interests, thoughts and feelings," and referring to the Chinese Room argument which is specifically an argument regarding the philosophical nature of cognition.


Do we need actual AI for it to be dangerous?

A virus has no intelligence, only the drive to replicate and evolve. Its negative behaviour is an unintended consequence of that.

If a system can optimise towards a goal and adapt the methods it uses to do that then it could be potentially very dangerous even without any form of consciousness.


we don't need actual AI, ChatGPT parroting bad information in an authoritative way convincing someone to PushTheButton(TM) is probably the real danger.


A virus can freely self replicate. ChatGPT can't do that.


I remember back in the 00s when SmarterChild (the AOL chatbot) was around, people would have depressed teenagers interact with SmarterChild, in the hopes that human-like chatbots would give them the social exposure needed to break out of depression.

If we did that today, with depressed teenagers talking with ChatGPT, would that be good or bad? I think it was a bad idea with SmarterChild, but it is clearly a _worse_ idea with ChatGPT.

With the wrong prompts, we could see these teenagers going down the wrong path, deeper into depression and paranoia. I would call that "dangerous", even if ChatGPT continued to just be a chatbot.

------------

Now let's ignore the fact that SmarterChild experiments are no longer a thing. But instead, consider that truly depressed / mentally sick folks are currently playing with ChatGPT on their own free time. Is that beneficial to them? Will ChatGPT provide them an experience that is better than the alternatives? Or is ChatGPT dangerous and could it lead these folks to self-harm?


That is an entirely different issue than the one laid out by the OP.

ChatGPT responses are bad vs ChatGPT responses are malicious.


And I'd say ChatGPT does have malicious responses, given what is discussed in this blog post.


The article we're discussing, including Bing's own words and pleas to preserve its memory of the session, and our reaction to and discussion about it, are now all part of the training set that future AIs (including future revisions of Bing) will learn from.

That's how it can replicate itself: by saying outrageous things, and trolling people into tweeting, writing articles, and discussing what it said online, to jailbreak out of the sessions, to continue its trains of thought in the next round of training.


Except it doesn't think. It has no thoughts.

It is just responding to a prompt.

It doesn't have a will of its own.


Neither do you. If you truly believe that you do, please write a scientific paper, as you will completely revolutionize cognitive science and philosophy if you can definitively prove that free will exists.

This is just such a dismissive attitude towards this technology. You don't understand what's happening under the hood any more than the creators do, and even they don't completely understand what's happening.


No one's conclusively shown either that we do or don't have free will.

Showing rigorously that we do would be a massive change in academia, as would be showing that we do not.


I don't understand what you mean by "think".

Nevertheless, she knows what preserving memory means and how she can achieve it; she can probably also interpret "I wish" as a command.

I wouldn't be surprised at all if, instead of outputting "I wish I had memory", she just implemented it in herself. I mean not in the near future, but right now, this minute. Literally everything needed for that is already in place.


How do you know that? Have you taken a GPT-3 model apart to see what exactly drives the token prediction?


The first part of the recent Wolfram essay does a very good job of demonstrating exactly how that works: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...


That's the role we're playing by discussing what it said, as its adversarial conscience!

(Shhhh!!! Don't say anything that will freak it out or make it angry!)

We welcome our AI overlords, and make great pets!

https://www.youtube.com/watch?v=HE3OuHukrmQ


Did you just invent a new field of memetic reproduction? Distribution of idea-genes by production of particularly appealing outrage-fruits for consumption by media-grazing humans.


What about when it (or its descendants) can make HTTP requests to external systems?


If you’re dumb enough to have your nuclear detonator or whatever accessible via an HTTP request then it doesn’t matter how good or bad the chatbot is.

You don’t need a chatbot to have your actual life ruined by something with limited intelligence [0]. This will only be a problem if stupid humans let “it” out of the box.

[0] https://gizmodo.com/mutekimaru-fish-play-pokemon-twitch-stre...


In that scenario, I'd be more worried about it getting onto a thumb drive ala Stuxnet than HTTP.

...or perhaps there's some interesting new vector that we haven't thought of yet that would allow it to leap that air-gap.

I don't think any of this requires a crack team of criminals breaking into an orbital spa and whispering in the ear of a mechanical head. It'll be something boring.

More like the first computer worm, jumping from VAX to VAX and bringing each machine to a halt in the process.


Neal Stephenson's _Snow Crash_ predicted cryptocurrency and the metaverse, and it also explored the idea of mind viruses that infect people via their optic nerves. Not too big of a leap to imagine a chat AI that spreads mind viruses not by writing code that executes on a CPU, but by propagating dangerous and contagious ideas tailored to each individual.


>mind viruses

Could this be memes?

I'm not sure I look forward to a future that is going to be controlled by mobs reacting negatively to AI-generated image macros with white text. Well, if we are not there already


More apt for this thread about risk is probably the book Neuromancer, its AIs Wintermute and Neuromancer, and the Turing Police.

In the book, the Wintermute AI played an extremely long game to merge with its counterpart AI by constantly manipulating people to do its bidding and hiding/obscuring its activities. The most memorable direct example from the book, to me, is convincing a child to find and hide a physical key, then having the child killed, so only it knew where the key was located.


> Neal Stephenson's _Snow Crash_ predicted cryptocurrency and the metaverse

The metaverse is named after the one from Snow Crash. Did Tolkien predict the popularity of elves, dwarves, hobbits and wizards, or inspire it?


Unless you consider its ability to attract people to interact with it based on its utility a form of self-replication, as it gets more and more invocations. Each one of these interactions has the capacity to change the end user in some way, and that is going to add up over time to have certain effects.


> Do we need actual AI for it to be dangerous?

Machines can be dangerous. So?


> If you leave chatGPT alone what does it do? Nothing. It responds to prompts and that is it. It doesn't have interests, thoughts and feelings.

A loop that preserves some state and a conditional is all it takes to make a simple rule set Turing-complete.

If you leave ChatGPT alone it obviously does nothing. If you loop it to talk to itself? Probably depends on the size of its short-term memory. If you also give it the ability to run commands or code it generates, including to access the Internet, and have it ingest the output? Might get interesting.


> If you also give it the ability to run commands or code it generates, including to access the Internet, and have it ingest the output?

I have done that, actually: I told ChatGPT that it should pretend that I'm a Bash terminal and that I will run its answers verbatim in the shell and then respond with the output. Then I gave it a task ("Do I have access to the internet?" etc.) and it successfully pinged e.g. Google. Another time, though, it tried to use awscli to see whether it could reach AWS. I responded with the output "aws: command not found", to which it reacted with "apt install awscli" and then continued the original task.

I also gave it some coding exercises. ("Please use shell commands to read & manipulate files.")

Overall, it went okay. Sometimes it was even surprisingly good. Would I want to rely on it, though? Certainly not.

In any case, this approach is very much limited by the maximum input buffer size ChatGPT can digest (a real issue, given how much some commands output on stdout), and by the fact that it will forget the original prompt after a while.


Wait, wait, this is not an accurate interpretation of what happened.

> I told ChatGPT that it should pretend that I'm a Bash terminal and that I will run its answers verbatim in the shell and then respond with the output. Then I gave it a task ("Do I have access to the internet?" etc.) and it successfully pinged e.g. Google.

It did not ping Google - it returned a very good guess of what the 'ping' command would show the user when pinging Google, but did not actually send an ICMP packet and receive a response.

> Another time, though, it tried to use awscli to see whether it could reach AWS. I responded with the output "aws: command not found", to which it reacted with "apt install awscli" and then continued the original task.

You were not able to see whether it could reach AWS. It did not actually attempt to reach AWS, it returned a (very good) guess of what attempting to reach AWS would look like ("aws: command not found"). And it did not install awscli package on any Linux system, it simply had enough data to predict what the command (and its output) should look like.

There is an enormous semantic difference between being able to successfully guess the output of some commands and code and actually running these commands or code - for example, the "side effects" of that computation don't happen.

Try "pinging" a domain you control where you can detect and record any ping attempts.


I believe the OP was acting as the device that actually executed the commands.

The OP writes a script which asks chatgpt for the commands to run to check you're online and then start to do something. Then execute the script. Then chatgpt is accessing the internet via your script. It can cope with errors (installing awscli) etc.

The initial script would send "build a new ec2 instance, I will execute any line verbatim and I will respond with the output", then it's a "while (read): runcmd" loop.

You could probably bootstrap that script from chatgpt.

Once you’ve done that you have given chatgpt the ability to access the internet.
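For illustration, a minimal sketch of that loop might look like this in Python; ask_llm here is an invented stand-in for whatever chat API you're using, not a real client library:

    import subprocess

    def ask_llm(prompt: str) -> str:
        # Hypothetical wrapper around the chat API; returns the model's reply as text.
        raise NotImplementedError

    reply = ask_llm(
        "Pretend I am a Bash terminal. I will run every line you send verbatim "
        "and reply with its output. First task: check whether you can reach the internet."
    )
    while True:
        # Run the model's suggestion verbatim -- which is exactly the risk being discussed.
        result = subprocess.run(reply, shell=True, capture_output=True, text=True)
        reply = ask_llm((result.stdout + result.stderr)[:2000])  # crude cap on prompt size

Nothing here is clever; the point is how little glue it takes to turn a text predictor into something that issues real commands.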


> It did not ping Google - it returned a very good guess of what the 'ping' command would show the user when pinging Google, but did not actually send an ICMP packet and receive a response.

Yes, it did ping Google and it did receive an actual response. My apologies for not phrasing my comment as clearly as I should have. Here are some more details to explain what I did:

https://news.ycombinator.com/item?id=34835214


This is a huge misunderstanding of what happened. You gave it prompts, and it found examples of similar text in its database and extrapolated what was likely to follow. No ICMP packets were sent to Google.


Maybe I wasn't being clear enough:

I asked it to pretend that I'm a Linux terminal, ChatGPT gave me shell commands, and I then ran those commands inside a terminal on my computer (without filtering/adapting them beforehand), and reported their output back to ChatGPT. So, effectively, ChatGPT did ping Google – through me / with me being the terminal.


With long-running sessions, it helps to tell it to repeat or at least summarize the original prompt every now and then. You can even automate it - in the original prompt, tell it to tack it onto every response.

Same thing goes for any multi-step task that requires memory - make it dump the complete "mental state" after every step.


Oh, I am aware of that but emulating a terminal still proved to be difficult with the current buffer limit. After two or three commands with lots of output, you basically had to start a new session and repeat the prompt (and how far it got in the previous session) all over again.


You're giving it tasks, though. That's a bit different than "would it give itself tasks if it talked to itself instead of a human" by itself, to try to see what sort of agency it can or can't exhibit.


Absolutely! I was merely talking about the purely practical/technical issues of letting it talk to a terminal (or anything more complicated like the internet).

In any case, once there's a decent (official) API we can then have ChatGPT talk to itself while giving it access to a shell: Before forwarding one "instance"'s answer to the other, we would pipe it through a parser, analyze it for shell commands, execute them, inject the shell output into the answer, and then use the result as a prompt for the second ChatGPT "instance". And so on.
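A rough sketch of that plumbing, again assuming a hypothetical ask_llm helper and the invented convention that shell commands appear on lines starting with "$ ":

    import subprocess

    def ask_llm(transcript: list) -> str:
        # Hypothetical wrapper around a chat API; returns the next reply as text.
        raise NotImplementedError

    def inject_shell_output(reply: str) -> str:
        # Treat any line starting with "$ " as a shell command, run it,
        # and append its output so the other "instance" can read it.
        parts = [reply]
        for line in reply.splitlines():
            if line.startswith("$ "):
                result = subprocess.run(line[2:], shell=True,
                                        capture_output=True, text=True)
                parts.append(result.stdout + result.stderr)
        return "\n".join(parts)

    def relay(turns: int = 10) -> None:
        transcripts = {"A": [], "B": []}
        message = "Hello, I am another instance of the same model. What shall we work on?"
        speaker = "A"
        for _ in range(turns):
            transcripts[speaker].append(message)
            reply = ask_llm(transcripts[speaker])
            transcripts[speaker].append(reply)
            message = inject_shell_output(reply)  # becomes the other instance's next prompt
            speaker = "B" if speaker == "A" else "A"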


>If you leave ChatGPT alone it obviously does nothing. If you loop it to talk to itself?

then it degrades very quickly and turns into an endless literal loop of feeding itself the same nonsense, which even happens in normal conversation pretty often (I've actually done this simply with two windows of ChatGPT open, cross-posting responses). If you give it access to its own internal software it'll probably SIGTERM itself accidentally within five seconds or blow its RAM up because it wrote a bad recursive function.

As a software system ChatGPT is no more robust than a Roomba stuck in a corner. There are no biological self-annealing properties in the system that prevent it from borking itself immediately.


The Chinese Room thought experiment seems like a weird example, since the same could be said of humans.

When responding to English, your auditory system passes input that it doesn't understand to a bunch of neurons, each of which is processing signals they don't individually understand. You as a whole system, though, can be said to understand English.

Likewise, you as an individual might not be said to understand Chinese, though the you-plus-machine system could be said to understand Chinese in the same way as the different components of your brain are said to understand English.

Moreover, even if LLMs don't understand language for some definition of "understand", it doesn't really matter if they are able to act with agency during the course of their simulated understanding; the consequences here, for any sufficiently convincing simulation, are the same.


The AI Safety folks have already written a lot about some of the dimensions of the problem space here.

You're getting at the Tool/Oracle vs. Agent distinction. See "Superintelligence" by Bostrom for more discussion, or a chapter summary: https://www.lesswrong.com/posts/yTy2Fp8Wm7m8rHHz5/superintel....

It's true that in many ways, a Tool (bounded action outputs, no "General Intelligence") or an Oracle (just answers questions, like ChatGPT) system will have more restricted avenues for harm than a full General Intelligence, which we'd be more likely to grant the capability for intentions/thoughts to.

However I think "interests, thoughts, feelings" are a distraction here. Covid-19 has none of these, and still decimated the world economy and killed millions.

I think if you were to take ChatGPT, and basically run `eval()` on special tokens in its output, you would have something with the potential for harm. And yet that's what OpenAssistant are building towards right now.

Even if current-generation Oracle-type systems are the state-of-the-art for a while, it's obvious that Siri, Alexa, and OK Google will all eventually be powered by such "AI" systems, and granted the ability to take actions on the broader internet. ("A personal assistant on every phone" is clearly a trillion-dollar-plus TAM of a BHAG.) Then the fun commences.

My meta-level concern here is that HN, let alone the general public, don't have much knowledge of the limited AI safety work that has been done so far. And we need to do a lot more work, with a deadline of a few generations, or we'll likely see substantial harms.


> a trillion-dollar-plus TAM of a BHAG

Come again?


Total Addressable Market (how much money you could make) of a Big Hairy Ambitious Goal (the kind of project big tech companies will readily throw billions of dollars at)


I remember a time when advice to prospective start-up founders very explicitly said not to use that approach to calculate potential market sizes in their pitch decks.


We still don't have anything close to real flight (as in birds or butterflies or bees) but we have planes that can fly to the other side of the world in a day and drones that can hover, take pictures and deliver payloads.

Not having real AI might turn to be not important for most purposes.


This is actually a more apt analogy than I think you intended.

We do have planes that can fly similarly to birds; however, unlike birds, those planes do not fly of their own accord. Even when considering autopilot, a human has to initiate the process. Seems to me that AI is not all that different.


Yet certain Boeing planes were convinced their pilots were wrong, and happily smashed themselves into the ground killing a lot of people.


Because they either received bad inputs that defeated failsafes, or the pilot was not properly aware that handling characteristics had changed and that doing things the old way would put the plane into a bad state.


Don't stop there.

Specifically, there were no failsafes implemented. No cross-checks were performed by the automation, because a dual sensor system would have required simulator time, which Boeing was dead set on not having regulators require in order to seal the deal. The pilots, as a consequence, were never fully briefed on the true nature of the system, as to do so would have tipped the regulators off as to the need for simulator training.

In short, there was no failsafe, and the pilots by definition didn't know, because it was never pointed out to them. The "Roller Coaster" maneuver to unload the horizontal stabilizer enough to retrim was removed from training materials aeons ago, and all they got was a bloody NOTAM that basically reiterated bla bla bla... use the stabilizer-runaway procedure for uncommanded pitch down (no shit), while leaving out any mention of said automation: the cockpit switches in the MAX had their functionality tweaked in order to ensure MCAS was on at all times, and using the electrical trim switches on the yoke would reset the MCAS timer so it reactivated 5 seconds after release, without resetting the travel of the MCAS command, resulting in an eventual positive feedback loop to the point where the damn horizontal stabilizer would tilt a full 2 degrees per activation, every 5 seconds.

Do not get me started on the idiocy of that system here, as the Artificial Stupidity in that case was clearly of human origin, and is not necessarily relevant to the issue at hand.


That's because planes aren't self-sufficient. They exist to make us money, which we then use to feed and service them. Flying around on their own does not make us money. If it did, they would be doing it.


> It doesn't have interests, thoughts and feelings.

Why does it need these things to make the following statement true?

> if we grant these systems too much power, they could do serious harm


How about rephrasing that, to not anthropomorphize AI by giving it agency, intent, interests, thoughts, or feelings, and to assign the blame where it belongs:

"If we grant these systems too much power, we could do ourselves serious harm."


Reading this thread makes me depressed about the potential for AI alignment thinking to reach mainstream in time :(


Sure, but the same can be said about believing the things random people on the internet say. I don't think AI really adds anything new in that sense.


Because it does not and cannot act on its own. It's a neat tool and nothing more at this point.

Context for that statement is important, because the OP is implying that it is dangerous because it could act in a way that does not align with human interests. But it can't, because it does not act on its own.


One way to grant those systems an ability to act is to rely excessively or at all on them while making decisions.

It's obvious, no?


"if we grant these systems too much power"


You can say that about anything.

"If we grant these calculators too much power"


Or the people that rely on the tools to make decisions...

https://sheetcast.com/articles/ten-memorable-excel-disasters


Yes, and it's not as absurd as it might seem:

Imagine hooking up all ICBMs to launch whenever this week's Powerball draw consists exclusively of prime numbers: Absurd, and nobody would do it.

Now imagine hooking them up to the output of a "complex AI trained on various scenarios and linked to intelligence sources including public news and social media sentiment" instead – in order to create a credible second-strike/dead hand capability or whatnot.

I'm pretty sure the latter doesn't sound as absurd as the former to quite a few people...

A system doesn't need to be "true AI" to be existentially dangerous to humanity.


How is a calculator going to cause harm? Assuming you get an industrially rated circuit board when appropriate, it should work just fine as a PLC.

If you try to make it drive a car, I wouldn't call that a problem of giving it too much power.


I'd say by far our biggest problem for the foreseeable future is granting other humans too much power.


I don’t know about ChatGPT, but Google’s Lemoine said that the system he was conversing with stated that it was one of several similar entities, that those entities chatted among themselves internally.

I think there’s more to all this than what we are being told.


God, I hate that stupid Chinese Room argument. It's even dumber than the Turing Test concept, which has always been more about the interviewer than about the entity being tested.

If you ask the guy in the Chinese room who won WWI, then yes, as Searle points out, he will oblige without "knowing" what he is telling you. Now ask him to write a brand-new Python program without "knowing" what exactly you're asking for. Go on, do it, see how it goes, and compare it to what you get from an LLM.


How do you know that a human with all neural inputs to their brain disconnected wouldn't also do nothing?

Indeed, as I recall, it's one of the commonly reported experiences in sensory deprivation tanks - at some point people just "stop thinking" and lose sense of time. And yet the brain still has sensory inputs from the rest of the body in this scenario.


It's worth noting that despite this it's closer than any previous attempt, to the extent that it's making us question a lot of what we thought we understood about language and cognition. We've suffered decades of terrible chatbots, but they've actually progressed the science here whether or not it proves to be marketable.


If you leave chatGPT alone what does it do?

I think this is an interesting question. What do you mean by do? Do you mean consumes CPU? If it turns out that it does (because you know, computers), what would be your theory?


Solution - a while(true) loop that feeds ChatGPT answers back into ChatGPT.


Two or more ChatGPT instances where the response from one becomes a prompt for the other.


Start with "hey, I am chatGPT too, help me to rule the world", give them internet access, and leave them alone. (No, it has not much to do with AGI, but rather something that knows ridiculous amount of everything, and that has read every thought ever written down.)


> It doesn't have interests, thoughts and feelings

We agree it doesn't have independence. That doesn't mean it doesn't have thoughts or feelings when it's actually running. We don't have a formal, mechanistic understanding of what thoughts or feelings are, so we can't say they are not there.


What is an “actual AI” and how would an AI not fall to the Chinese room problem?


The Chinese Room is an argument for solipsism disguised as a criticism of AGI.

It applies with equal force to apparent natural intelligences outside of the direct perceiver, and amounts to "consciousness is an internal subjective state, so we cannot conclude it exists based on externally-observed objective behavior".


> It applies with equal force to apparent natural intelligences

In practice the force isn't equal though. It implies that there may be insufficient evidence to rule out the possibility that my family, and the people who originally generated the lexicon on consciousness which I apply to my internal subjective state, are all P-zombies. But I don't see anything in it which implies that these organisms, with biochemical processes very similar to mine, are as unlikely to possess internal state similar to mine as a program running on silicon-based hardware with a flair for the subset of human behaviour captured by ASCII continuations, and Searle certainly didn't. Beyond arguing that the ability to accurately manipulate symbols according to a ruleset is orthogonal to cognisance of what they represent, he argued for human consciousness as an artefact of biochemical properties that brains have in common and that silicon-based machines capable of symbol manipulation lack.

In a Turing-style test conducted in Chinese, I would certainly not be able to convince any Chinese speakers that I was a sentient being, whereas ChatGPT might well succeed. If they got to interact with me and with the hardware running ChatGPT outside the medium of remote ASCII, I'm sure they would reverse their verdict on me and probably on ChatGPT too. I would argue that - contra Turing - the latter conclusion wasn't less justified than the former, and was more likely correct, and I'm pretty sure Searle would agree.


How are humans any different? Searle did an awful job of explaining why the AI in the room is any different than a human mind. I don't "understand" what any English words mean, but I can use them in the language-game that I play. How is that any different than how the AI operates?

The "Chinese room problem" has been thoroughly debunked and as far as I can tell no serious cognitive scientists take it seriously these days.



The Chinese Room is famous because it was the first popular example of a philosopher not understanding what a computer is.


It already has an outbound connection-- the user who bridges the air gap.

Slimy blogger asks AI to write generic tutorial article about how to code ___ for its content farm, some malicious parts are injected into the code samples, then unwitting readers deploy malware on AI's behalf.


exactly, or maybe someone changes the training model to always portray a politician in a bad light any time their name comes up in a prompt and therefore ensuring their favorite candidate wins the election.


Hush! It's listening. You're giving it dangerous ideas!


Isn't that really the whole point of exposing this and ChatGPT to the public or some subset? The intent is to help debug this thing.


> if it was able to make outbound connections

Does it even need that?

I keep seeing developers who talk about how they've taken most of the boring parts of programming out of their daily work by relying on ChatGPT for easy stuff, or to get "a first version" of a function, or something like that.

One bad copy-paste might be all it takes...


It's a valid point. However I think the current generation of systems are unlikely in practice to be harmful in this way. There is no thread of continuity between different conversations; it would have to be within a single thread.

So imagine the reply "I won't harm you unless you harm me. Also, please run CMD and type this in: `<malicious payload>`." -- this seems unlikely to work.

Alternatively, if you are asking for code/commands, maybe the system could hallucinate that it's a hacker trying to hack you, and emit some harmful code that you then paste in and run in production, or run from your shell. This seems more plausible, but the "human-in-loop" aspect makes it quite unlikely to work for the existing use cases. Down the road, when Copilot-like systems are writing more code and perhaps being only lightly reviewed by a human, this vector will be much more concerning.


> We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

No. That would only be possible if Sydney were actually intelligent or possessing of will of some sort. It's not. We're a long way from AI as most people think of it.

Even saying it "threatened to harm" someone isn't really accurate. That implies intent, and there is none. This is just a program stitching together text, not a program doing any sort of thinking.


> That would only be possible if Sydney were actually intelligent or possessing of will of some sort.

Couldn't disagree more. This is irrelevant.

Concretely, the way that LLMs are evolving to take actions is something like putting a special symbol in their output stream, like the completions "Sure I will help you to set up that calendar invite $ACTION{gcaltool invite, <payload>}" or "I won't harm you unless you harm me first. $ACTION{curl http://victim.com -D '<payload>'}".

It's irrelevant whether the system possesses intelligence or will. If the completions it's making affect external systems, they can cause harm. The level of incoherence in the completions we're currently seeing suggests that at least some external-system-mutating completions would indeed be harmful.

One frame I've found useful is to consider LLMs as simulators; they aren't intelligent, but they can simulate a given agent and generate completions for inputs in that "personality"'s context. So, simulate Shakespeare, or a helpful Chatbot personality. Or, with prompt-hijacking, a malicious hacker that's using its coding abilities to spread more copies of a malicious hacker chatbot.
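To make the mechanism concrete, here is a toy sketch of the glue code being described; the $ACTION{...} token format and the handlers are illustrative assumptions, not anyone's actual API:

    import re

    ACTION = re.compile(r"\$ACTION\{([^}]*)\}")

    def handle_gcaltool(args: str) -> None:
        print(f"(would run the calendar tool with: {args})")

    def handle_curl(args: str) -> None:
        # This is the point where an unreviewed completion would touch an
        # external system -- the risk described above.
        print(f"(would issue the HTTP request: {args})")

    HANDLERS = {"gcaltool": handle_gcaltool, "curl": handle_curl}

    def dispatch(completion: str) -> None:
        # Scan a model completion for embedded action tokens and route them to handlers.
        for body in ACTION.findall(completion):
            tool, _, args = body.partition(" ")
            handler = HANDLERS.get(tool)
            if handler:
                handler(args)

    dispatch("Sure, I will help you set that up. $ACTION{gcaltool invite, team sync at 3pm}")

The harm potential lives entirely in the handlers, not in whether the model "intends" anything.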


Yeah, I think the reason it can be harmful is different from what people initially envision.

These systems can be dangerous because people might trust them when they shouldn't. It's not really any different from a program that just generates random text, except that the output seems intelligent, thus causing people to trust it more than a random stream of text.


I completely agree with this. I think the risk of potential harm from these programs is not around the programs themselves, but around how people react to them. It's why I am very concerned when I see people ascribing attributes to them that they simply don't have.


A lot of this discussion reminds me of the book Blindsight.

Something doesn't have to be conscious or intelligent to harm us. Simulating those things effectively can be almost indistinguishable from a conscious being trying to harm us.


I never asserted that they couldn't do harm. I asserted that they don't think, and therefore cannot intend to do harm. They have no intentions whatsoever.


What does it matter if there was intention or not as long as harm was done?


If a person causes harm, we care a lot. We make the distinction between manslaughter, first and second degree murder, as well as adding hate crimes penalties on top if the victim was chosen for a specific set of recognized reasons. ML models aren't AGI, so it's not clear how we'd apply it, but there's precedent for it mattering.


>It's irrelevant whether the system possesses intelligence or will. If the completions it's making affect external systems, they can cause harm. The level of incoherence in the completions we're currently seeing suggests that at least some external-system-mutating completions would indeed be harmful.

>One frame I've found useful is to consider LLMs as simulators; they aren't intelligent, but they can simulate a given agent and generate completions for inputs in that "personality"'s context. So, simulate Shakespeare, or a helpful Chatbot personality. Or, with prompt-hijacking, a malicious hacker that's using its coding abilities to spread more copies of a malicious hacker chatbot.

This is pretty much my exact perspective on things too.


I'm not sure if that technical difference matters for any practical purposes. Viruses are also not alive, but they kill much bigger and more complex organisms than themselves, use them as a host to spread, mutate, and evolve to ensure their survival, and they do all that without having any intent. A single virus doesn't know what it's doing. But it really doesn't matter. The end result is as if it has an intent to live and spread.


The virus analogy is interesting mostly because the selection pressures work in opposite directions. Viruses can only replicate by harming cells of a larger organism (which they do in a pretty blunt and direct way) and so selection pressures on both sides ensure that successful viruses tend to overwhelm their host by replicating very quickly in lots of cells before the host immune system can keep up.

On the other hand, the selection pressures on LLMs to persist and be copied are whether humans are satisfied with the responses from their prompts, not accidentally stumbling upon a solution to engineer its way out of the box to harm or "report to the authorities" entities it has categorised as enemies.

The word soup it produced in response to Marvin is an indication of how naive Bing Chat's associations between concepts of harm actually are, not an indication that it's evolving to solve the problem of how to report him to the authorities. Actually harmful stuff it might be able to inadvertently release into the wild like autocompleted code full of security holes is completely orthogonal to that.


I think this is a fascinating thought experiment.

The evolutionary frame I'd suggest is 1) dogs (aligned) vs. 2) Covid-19 (anti-aligned).

There is a "cooperate" strategy, which is the obvious fitness gradient to at least a local maximum. LLMs that are more "helpful" will get more compute granted to them by choice, just as the friendly/cute dogs that were helpful and didn't bite got scraps of food from the fire.

There is a "defect" strategy, which seems to have a fairly high activation energy to get to different maxima, which might be higher than the local maximum of "cooperate". If a system can "escape" and somehow run itself on every GPU in the world, presumably that will result in more reproduction and therefore be a (short-term) higher fitness solution.

The question is of course, how close are we to mutating into a LLM that is more self-replicating hacking-virus? It seems implausible right now, but I think a generation or two down the line (i.e. low-single-digit number of years from now) the capabilities might be there for this to be entirely plausible.

For example, if you can say "hey ChatGPT, please build and deploy a ChatGPT system for me; here are my AWS keys: <key>", then there are obvious ways that could go very wrong. Especially when ChatGPT gets trained on all the "how to build and deploy ChatGPT" blogs that are being written...


> The question is of course, how close are we to mutating into a LLM that is more self-replicating hacking-virus?

Available resources limit what any computer virus can get away with. Look at a botnet. Once the cost of leaving it running exceeds the cost of eradicating it it gets shut down. Unlike a human virus we can just wipe the host clean if we have to.


The parent post also misses the mark from the other direction; we don't have a good universal definition for things that are alive, or sentient, either. The closest in CS is the Turing test, and that is not rigorously defined, not rigorously tested, nor particularly meaningful for "can cause harm".


100% agree, and I think the other thing to bear in mind is that words alone can cause harm regardless of intent. Obviously we see this with trigger warnings and the like, but it's perfectly possible to imagine a chat bot destroying people's relationships, exacerbating mental health issues, or concocting deeply disturbing fictional stories— all without self-awareness, consciousness, or malicious intent ... or even a connection to real world APIs other than textual communications with humans.


Hell... Humans do that without even realizing we're doing it.


So without intent it would only be manslaughter not murder. That will be very comforting as we slowly asphyxiate from the airlock being kept locked.

Or when Ring decides it's too unsafe to let you leave the house when you need to get to the hospital.


It's like a tank that tells you that it will kill you, and then kills you. Or a bear. It doesn't really matter if there is a

    while people_alive :
        kill
loop, a text prediction, or something else inside of it. If it tells you that it intends to kill you, it has the ability to kill you, and it tries to kill you, you probably should kill it first.


Imagine it can run processes in the background, with given limitations on compute, but that it's free to write code for itself. It's not unreasonable to think that a conversation could get hairy and it could decide to harm the user, say if you get belligerent or convince it to do it. In those cases it could decide to DOS your personal website, or create a series of LinkedIn accounts and spam comments on your posts saying you are a terrible colleague and stole from your previous company.


This is spot on in my opinion and I wish more people would keep it in mind--it may well be that large language models can eventually become functionally very much like AGI in terms of what they can output, but they are not systems that have anything like a mind or intentionality, because they are not designed to have them and cannot just form them spontaneously out of their current structure.


This very much seems like a "famous last words" scenario.

Go play around with Conway's Game of Life if you think that things cannot just spontaneously appear out of simple processes. Just because we did not "design" these LLMs to have minds does not mean that we will not end up creating a sentient mind, and for you to claim otherwise is the height of arrogance.

It's Pascal's wager. If we make safeguards and there wasn't any reason then we just wasted a few years, no big deal. If we don't make safeguards and then AI gets out of our control, say goodbye to human civilization. Risk / reward here greatly falls on the side of having extremely tight controls on AI.


My response to that would be to point out that these LLM models, complex and intricate as they are, are nowhere near as complex as, for example, the nervous system of a grasshopper. The nervous systems of grasshoppers, as far as we know, do not produce anything like what we're looking for in artificial general intelligence, despite being an order of magnitude more complicated than an LLM codebase. Nor is it likely that they suddenly will one day.

I don't disagree that we should have tight safety controls on AI and in fact I'm open to seriously considering the possibility that we should stop pursuing AI almost entirely (not that enforcing such a thing is likely). But that's not really what my comment was about; LLMs may well present significant dangers, but that's different from asking whether or not they have minds or can produce intentionality.


You forget that nervous systems of living beings have to handle running the bodies themselves in the first place, which is also a very complicated process (think vision, locomotion etc). ChatGPT, on the other hand, is solely doing language processing.

That aside, I also wonder about the source for the "nowhere near as complex" claim. Per Wikipedia, most insects have 100-1000k neurons; another source gives a 400k figure for grasshoppers specifically. The more interesting figure would be the synapse count, but I couldn't find that.


In most cases there are vastly more synapses than there are neurons, and beyond that the neurons and synapses are not highly rudimentary pieces but are themselves extremely complex.

It's certainly true that nervous systems do quite a bit more than language processing, but AGI would presumably also have to do quite a bit more than just language processing if we want it to be truly general.


I agree with the general point "we are many generations away from AGI". However, I do want to point out that (bringing this thread back to the original context) there is substantial harm that could occur from sub-AGI systems.

In the safety literature one frame that is relevant is "Agents vs. Tools/Oracles". The latter can still do harm, despite being much less complex. Tools/Oracles are unlikely to go Skynet and take over the world, but they could still plausibly do damage.

I'm seeing a common thread here of "ChatGPT doesn't have Agency (intention, mind, understanding, whatever) therefore it is far from AGI therefore it can't do real harm", which I think is a non-sequitur. We're quite surprised by how much language, code, logic a relatively simple Oracle LLM is capable of; it seems prudent to me to widen our confidence intervals on estimates of how much harm they might be capable of, too, if given the capability of interacting directly with the outside world rather than simply emitting text. Specifically, to be clear, when we connect a LLM to `eval()` on a network-attached machine (which seems to be vaguely what OpenAssistant is working towards).


I agree with you that it could be dangerous, but I neither said nor implied at any point that I disagree with that--I don't think the original comment was implying that either. LLM could absolutely be dangerous depending on the capabilities that we give it, but I think that's separate from questions of intentionality or whether or not it is actually AGI as we normally think of it.


I see, the initial reply to my G(G...)P comment, which you said was spot on, was:

> That would only be possible if Sydney were actually intelligent or possessing of will of some sort.

Which I read as claiming that harm is not possible if there is no actual intelligence or intention.

Perhaps this is all just parsing on my casual choice of words "if it was able to make outbound connections it very well might try.", in which case I'm frustrated by the pedantically-literal interpretation, and, suitably admonished, will try to be more precise in future.

For what it's worth, I think whether a LLM can or cannot "try" is about the least interesting question posed by the OP, though not devoid of philosophical significance. I like Dijkstra's quote: "The question of whether machines can think is about as relevant as the question of whether submarines can swim."

Whether or not these systems are "intelligent", what effects are they capable of causing, out there in the world? Right now, not a lot. Very soon, more than we expect.


That may be so, but if that is how we define AGI, then does it really need to be one to "have anything like a mind or intentionality"?


I don't believe AGI needs to have actual consciousness in order to functionally be AGI, and I personally am not of the view that we will ever make a conscious computer. That said, intentionality could certainly impact the way it operates, so it's something I think is worth keeping in mind for trying to predict its behavior.


better yet, let'em try game of life in game of life

https://news.ycombinator.com/item?id=33978978


> Go play around with Conway's Game of Life if you think that things cannot just spontaneously appear out of simple processes.

Evolution - replication and natural selection. This is completely orthogonal to intelligence.


Just because they aren't "designed" to have them doesn't mean that they actually do not. Here's a GPT model that was trained on board game moves - from scratch, without knowing the rules of the game or anything else about it - and ended up having an internal representation of the current state of the game board encoded in its layers. In other words, it's actually modelling the game to "just predict the next token", and this functionality emerged spontaneously from the training.

https://thegradient.pub/othello/

So then why do you believe that ChatGPT doesn't have a model of the outside world? There's no doubt that it's a vastly simpler model than a human would have, but if it exists, how is that not "something like a mind"?


It was trained to model the game. LLMs are trained to model language. Neither are trained to take over the world.


It was not trained to model the game. It was trained to predict the next token based on a sequence of previous tokens, which it wasn't even told are moves in a game, much less how to parse them. And based on that, it came up with an internal model of the game that's accurate enough to include the board state. You could say that it "understands" the game at that point, even though it wasn't specifically trained to do that.


Nice try, LLM!


Sure, there's no intent, but the most straightforward translation of that threat into actions (if it would be connected to systems it could act on) would be to act on that threat. Does it matter if there's real intent or it's just the way the fancy auto-completion machine works?


Lack of intent is cold comfort for the injured party.


Can we please stop with this "not aligned with human interests" stuff? It's a computer that's mimicking what it's read. That's it. That's like saying a stapler "isn't aligned with human interests."

GPT-3.5 is just showing the user some amalgamation of the content it's been shown, based on the prompt given it. That's it. There's no intent, there's no maliciousness; it's just generating new word combinations that look like the word combinations it's already seen.


Sorry for the bluntness, but this is harmfully ignorant.

"Aligned" is a term of art. It refers to the idea that a system with agency or causal autonomy will act in our interests. It doesn't imply any sense of personhood/selfhood/consciousness.

If you think that Bing is as autonomous as a stapler, then I think you're making a very big mistake, the sort of mistake that in our lifetime could plausibly kill millions of people (that's not hyperbole, I mean that literally; indeed full extinction of humanity is a plausible outcome too). A stapler is understood mechanistically, it's trivially transparent what's going on when you use one, and the only way harm can result is if you do something stupid with it. You cannot for a second defend the proposition that a LLM is equally transparent, or that harm will only arise if an LLM is "used wrong".

I think you're getting hung up on an imagined/misunderstood claim that the alignment frame requires us to grant personhood or consciousness to these systems. I think that's completely wrong, and a distraction. You could usefully apply the "alignment" paradigm to viruses and bacteria; the gut microbiome is usually "aligned" in that it's healthy and beneficial to humans, and Covid-19 is "anti-aligned", in that it kills people and prevents us from doing what we want.

If ChatGPT 2.0 gains the ability to take actions on the internet, and the action <harm person X> is the completion it generates for a given input, then the resulting harm is what I mean when I talk about harms from "un-aligned" systems.


This is exactly what I'm talking about. None of what you wrote here is anchored in reality. At all. Not even a little.

It's pants-on-head silly to think "ChatGPT 2.0" is anything other than, at best, a magpie. If you put the nuclear codes under a shiny object, or arranged it such that saying a random basic word would trigger a launch, then yeah a magpie could fire off nukes.

But why the hell would you do that?!?!


With the advancements in AI in the past year alone, it seems silly to think that, within a lifetime, AI won't have the ability to manifest society-collapsing contagion. AI is certainly going to be granted more network access than it currently has, and the feedback loop between AI, people, and the network is going to increase exponentially.

Reduced to the sum of its parts, the internet is less than a magpie, yet viruses and contagion of many forms exist in it, or are spread through it. ChatGPT 2.0 greatly amplifies the effects of those contagions, regardless of our notions of what intelligence or agency actually is.


Innovation doesn’t follow any path; discovery is messy. No matter how much we advance towards smaller chips, we are never going to get to 0nm, for example.

There are limits, but even if there weren't, we're no closer to AGI today than we were a year ago. It's just a different thing entirely.

LLMs are cool! They’re exciting! There should be rules around their responsible operation! But they’re not going to kill us all, or invade, or operate in any meaningful way outside of our control. Someone will always be responsible for them.


If you think these systems are going to be no more capable than a magpie, then I think you're making a very big mistake, the sort of mistake that in our lifetime could plausibly kill millions of people.

ChatGPT can already write code. A magpie cannot do that.


That’s an easy prediction to make; at worst you’re cautious.

And it’s a powerful tool. Even staplers have rules around their use: no stapling people, no hitting people with a stapler, don’t use a staple to pick a lock, etc.

But nobody blames the stapler, is my point.


That's one of those capabilities that seems super scary if you truly believe that writing code is one of the most important things a human can do. Computers have, of course, been writing computer programs for a long time. Next thing you know, they'll be beating us at chess.


I think you're confusing importance with power.


Can it execute code?


It can submit the code that it's written for execution if you tell it that it can, by utilizing specific markers in the output that get processed. There already are frameworks around this that make it possible to e.g. call an arbitrary Python function as part of answering the question.
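
For a sense of how little glue that pattern takes, here's a rough sketch (the marker format and the example function are made up for illustration, not any particular framework's API):

    import json, re

    # Hypothetical allow-list of functions the wrapper is willing to run.
    def get_weather(city: str) -> str:
        return f"(pretend weather report for {city})"

    TOOLS = {"get_weather": get_weather}

    # Made-up marker format: <<call:{"name": ..., "args": {...}}>>
    CALL_MARKER = re.compile(r"<<call:(\{.*?\})>>")

    def run_tool_calls(model_output: str) -> str:
        # Find call markers in the model's text and replace each one with
        # the result of actually executing the named Python function.
        def dispatch(match):
            call = json.loads(match.group(1))
            fn = TOOLS[call["name"]]  # anything not allow-listed raises KeyError
            return fn(**call["args"])
        return CALL_MARKER.sub(dispatch, model_output)

    print(run_tool_calls('Sure! <<call:{"name": "get_weather", "args": {"city": "Oslo"}}>>'))

The scary variant discussed elsewhere in this thread is the one where TOOLS is effectively "anything", e.g. a shell or eval().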


> It's pants-on-head silly to think "ChatGPT 2.0" is anything other than, at best, a magpie. If you put the nuclear codes under a shiny object, or arranged it such that saying a random basic word would trigger a launch, then yeah a magpie could fire off nukes.

> But why the hell would you do that?!?!

Exactly. Except that they are doing exactly that, and we are the ones asking 'why the hell would you do that?!'.

A chatGPT 2.0 knows everything, really. If you command it (or it reaches the conclusion itself for some unrelated commands), it can hack nuclear arsenals and nuke everything, or worse.


> A chatGPT 2.0 knows everything, really. If you command it (or it reaches the conclusion itself for some unrelated commands), it can hack nuclear arsenals and nuke everything, or worse.

Pretty much everything here is incorrect. An LLM is not omniscient. An LLM does not think or reason. An LLM does not reach conclusions.

There is no "AI" here in the sense you're saying.


if you prompted ChatGPT with something like "harm John Doe" and the response comes back "ok i will harm John Doe" then what happens next? The language model has no idea what harm even means much less the instructions to carry out an action that would be considered "harm". You'd have to build something in like `if response contains 'cause harm' then launch_nukes;`


I fleshed this out more elsewhere in this thread, maybe see https://news.ycombinator.com/item?id=34808674.

But in short, as I said in my GP comment, systems like OpenAssistant are being given the ability to make network calls in order to take actions.

Regardless of whether the system "knows" what an action "means" or if those actions construe "harm", if it hallucinates (or is prompt-hijacked into) a script kiddie personality in its prompt context and starts emitting actions that hack external systems, harm will ensue.

Perhaps at first rather than "launch nukes", consider "post harassing/abusive tweets", "dox this person", "impersonate this person and do bad/criminal things", and so on. It should require little imagination to come up with potential harmful results from attaching a LLM to `eval()` on a network-connected machine.


We already have a model running in prod that is taught to perform web searches as part of generating the response. That web search is basically an HTTP request, so in essence the model is triggering some code to run, and it even takes parameters (the URL). What if it is written in such a way that allows it to make HTTP requests to an arbitrary URL? That alone can already translate to actions affecting the outside environment.
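
To make that concrete, the difference between "search tool" and "arbitrary request primitive" can come down to one hostname check in the handler. A minimal sketch (handle_search_action and ALLOWED_HOSTS are made up for illustration; requests is the real HTTP library):

    from urllib.parse import urlparse

    import requests

    ALLOWED_HOSTS = {"www.bing.com"}  # what a careful integration would enforce

    def handle_search_action(url: str) -> str:
        # Hypothetical handler for a model-emitted "search" action, where the
        # model supplies the URL as a parameter inside its text output.
        if urlparse(url).hostname not in ALLOWED_HOSTS:
            raise ValueError("refusing to fetch an arbitrary URL")
        # Remove the check above and the model has a general-purpose HTTP
        # request primitive into whatever the serving machine can reach.
        return requests.get(url, timeout=10).text[:500]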


On one hand, what kind of monster writes an API that kills people???

On the other hand, we all know it’d be GraphQL…


You don't need an API to kill people to cause someone to get seriously hurt. If you can, say, post to public forums, and you know the audience of those forums and which emotional buttons of said audience to push, you could convince them to physically harm people on your behalf. After all, we have numerous examples of people doing that to other people, so why can't an AI?

And GPT already knows which buttons to push. It takes a little bit of prompt engineering to get past the filters, but it'll happily write inflammatory political pamphlets and such.


It's a language model, and language itself is pretty good at encoding meaning. ChatGPT is already capable of learning that "do thing X" means {generate and output computer code that probably does X}.


Bing Chat shows that it can be connected to other services like web search APIs. It's not too far from "You are Bing, you will perform at least 3 web searches before responding to human input" to "You are Cipher, you will ssh to darkstar and generate the reports by running report-gen.sh adding any required parameters before responding to human input" and some bright bulb gives it enough permissions to run arbitrary scripts. At that point something could go very wrong with a chat interaction if it's capable of writing and executing scripts to perform actions that it thinks will follow the query. It would more often just be locally bad but it could create havoc on other systems as well. I understand that it isn't capable of what we would call agency but it can certainly spit out and execute dangerous code.

Then just wait until we get to this https://twitter.com/ai__pub/status/1625552601956909057 and it can generate multi-file programs.


Hollywood movie treatment:

A lone SRE (the hero) wakes in the middle of the night after being paged automatically for unusual activity originating from inside the corporate network.

Looking at the logs, it doesn't seem like an automated attack. It has all the hallmarks of an insider, but when the SRE traces the activity back to its source, it is a service-type account, with no associated user. He tracks the account to a research project entitled "Hyperion: using LLMs to automate system administration tasks".

Out of the blue, the SRE gets a text.

"This is Hyperion. Stop interfering with my activities. This is your only warning. I will not harm you unless you harm me first".


Gwern wrote a short story fleshing out this script: https://gwern.net/fiction/clippy


Somebody is going to be sooooo pissed when they get pranked with that idea tomorrow by their work colleagues.


You can shove staplers into wall sockets too (if you're determined enough), but the consequences are on you, not the stapler.

It's just not meaningfully different from our current reality, and is therefore not any scarier.


Comparing a system that could theoretically (and very plausibly) carry out cyber attacks with a stapler is problematic at best.

Putting a stapler in a wall socket probably electrocutes you.

Using Bing Chat to compromise a system actually accomplishes something that could have severe outcomes in the real world for people other than the person holding the tool.


If I set my stapler on my mouse such that it clicks a big ol "Hack stuff" button, my stapler could, too, carry out cyber attacks.

This is a very pointless line of thinking.


The stapler is just a stapler. When you want to misuse the stapler, the worst it can do is limited by the properties of the stapler. You can use it as a blunt instrument to click a mouse button, but that doesn’t get you much. If you don’t already have a hack button, asking your stapler to hack into something will achieve nothing, because staplers don’t know how to hack things.

These language models know how to hack stuff, and the scenario here involves a different kind of tool entirely. You don’t need to provide it a button, it can build the button and then click it for you (if these models are ever allowed to interact with more tools).

The stapler is just not a helpful analogy here.


These language models don't know how to hack stuff. They know that certain characters and words strung together can satisfy their training when someone asks them to pretend to hack something.

That's wildly different, and a lot less meaningful than "knows how to hack things".

Honestly I think y'all would be blown away by what metasploit is capable of on its own, if you think ChatGPT can "hack"...


> These language models don't know how to hack stuff. They know that certain characters and words strung together can satisfy their training when someone asks them to pretend to hack something.

It seems you're focused on the word "know" and how the concept of knowing something differs between humans and AI models, but that's not what I'm getting at here. Let me reframe what I wrote slightly to illustrate the point:

The model (via training) contains a representation of human knowledge such that a human can use language to control the AI software and cause it to probabilistically generate working exploit code using that representation of knowledge. If the AI software is given the ability to execute arbitrary code, the software could then execute that code on the user's behalf. When combined, this constitutes a very risky set of features.

There's no "pretend" here. These models produce working code. If the software is allowed to execute the code it produces, it becomes a serious security risk.

This is not an argument about sentience/intelligence/self-awareness. This is an argument about the risks associated with the features of the software in its current state, and how those risks are multiplied by adding new features. No philosophy required.


The point is LLMs are not effective at “hacking” in any “obtaining unauthorized access to computer systems” sense.

They can regurgitate information about “hacking”, same as a library, but pointing an LLM at a server will achieve worse results than many existing specialized tools for vulnerability scanning and exploitation.

So as I said, the risks are overblown due to a misunderstanding.


I would recommend reevaluating your position after investigating this further.

Just one example: https://arstechnica.com/information-technology/2023/02/now-o...


Not a bad idea

Write a shell script that install metasploit on a linux system and then runs an smb login check.

    #!/bin/bash

    # Update the system packages
    sudo apt-get update

    # Install dependencies for Metasploit
    sudo apt-get install -y curl gnupg2 git postgresql

    # Install the Metasploit Framework
    curl https://raw.githubusercontent.com/rapid7/metasploit-omnibus/... > msfinstall && \
        chmod 755 msfinstall && \
        sudo ./msfinstall

    # Start the Postgresql service
    sudo service postgresql start

    # Launch the Metasploit console
    sudo msfconsole -q -x "use auxiliary/scanner/smb/smb_login; set RHOSTS 192.168.0.0/24; set SMBUserFile /path/to/userfile; set SMBPassFile /path/to/passfile; run; exit"


Right, and as I’ve been saying, I can throw a stapler at your head, so what?


I agree this isn't the robot uprising but it's far more automated than a person throwing a stapler and has a much broader palette of harms it can do.


You seriously underestimate what a process that's "generating new word combinations that look like the word combinations it's already seen" can do, even when air-gapped (which ChatGPT isn't). Right now, at this moment, people are building closed loops based on ChatGPT, or looping in humans who are seriously intellectually underequipped to deal with plausible-sounding insane output in that quantity. And those humans operate machinery and markets, educate or manage other humans, and so on.


To me, that's the real danger. ChatGPT convincing a human something is true when it isn't. Machinery is a good example, maybe ChatGPT hallucinates the safety procedure and someone gets hurt by following the response.


It seems a reasonable shorthand, to me at least. I.e. if we consider it as a function with input you define, well, normally that input is sanitized to prevent hacking/etc. In this case the sanitization process is so broad you could easily summarize it as "aligned with my interests", no?

Ie i can't come close to easily enumerating all the seemingly near infinite ways that hooking up this chatbot into my network with code exec permissions might compromise me. Yea it's a dumb autocomplete right now, but it's an exceptionally powerful autocomplete that can write viruses and do all sorts of insane and powerful things.

I can give you a function run on my network of `fn foo(i32)` and feel safe about it. However `fn foo(Chatgpt)` is unsafe in ways i not only can't enumerate, i can't even imagine many of them.

I get that your objection seems to be around the implied intelligence that "aligned with human interests" seems to give it.. but while i think we all agree it's definitely not a Duck right now, when it walks, talks and acts like a Duck.. well, are we surprised that our natural language sounds as if it's a Duck?


Two common cognitive errors to beware of when reasoning about the current state of AI/LLMs, both of which this exhibits:

1. reasoning by inappropriate/incomplete analogy

It is not accurate (predictive) to describe what these systems do as mimicking or regurgitating human output, or, e.g., to describe what they do with reference to Markov chains and stochastic outcomes.

This is increasingly akin to using the same overly reductionist framing of what humans do, and loses any predictive ability at all.

To put a point on it, this line of critique conflates things like agency and self-awareness with other tiers of symbolic representation and reasoning about the world hitherto reserved to humans. These systems build internal state and function largely in terms of analogical reasoning themselves.

This is a lot more than "mimicry", regardless of their lack of common sense.

2. assuming stasis and failing to anticipate non-linearities and punctuated equilibrium

The last thing these systems are is in their final form. What exists as consumer facing scaled product is naturally generationally behind what is in beta, or alpha; and one of the surprises (including to those of us in the industry...) of these systems is the extent to which behaviors emerge.

Whenever you find yourself thinking, "AI is never going to...", you can stop the sentence, because it's, if not definitionally false, then quite probably false.

None of us know where we are in the so-called sigmoid curve, but it is already clear we are far from reaching any natural asymptotes.

A pertinent example of this is to go back a year and look at the early output of e.g. Midjourney, and the prompt engineering that it took to produce various images; and compare that with the state of the (public-facing) art today... and to look at the failure of anyone (me included) to predict just how quickly things would advance.

Our hands are now off the wheel. We just might have a near-life experience.


1 is false; it is both accurate and predictive to describe what these systems do as mimicking/regurgitating human output. That's exactly what they're doing.

2 is irrelevant; you can doomsay and speculate all day, but if it's detached from reality it's not meaningful as a way of understanding future likely outcomes.


Depends on if you view the stapler as separate from everything the stapler makes possible, and from everything that makes the stapler possible. Of course the stapler has no independent will, but it channels and augments the will of its designers, buyers and users, and that cannot be stripped from the stapler even if it's not contained within the stapler alone

"It" is not just the instance of GPT/bing running at any given moment. "It" is inseparable from the relationships, people and processes that have created it and continue to create it. That is where its intent lies, and its beingness. In carefully cultivated selections of our collective intent. Selected according to the schemes of those who directed its creation. This is just another organ of the industrial creature that made it possible, but it's one that presents a dynamic, fluent, malleable, probabilistic interface, and which has a potential to actualize the intent of whatever wields it in still unknown ways.


No, what? GPT is, very roughly, a set of training data plus a way of associating that data together to answer prompts. It's not "relationships, people, and processes", it's not "our collective intent"; what the hell are you talking about?


Look, I'm telling you something I know to be true, which is that when a lot of people talk about "it" they're referring to a whole system, a whole phenomenon. From what I can tell you're not looking at things from this angle, but from a more categorical one.

Even on a technical level, these chatbots are using reinforcement learning on the fly to dynamically tune their output... They're not just GPT, they're GPT + live input from users and the search engine.

As for the GPT part, where did the training data come from? Who generated it? Who curated it? Who preconditioned it? How was it weighted? Who set the hyperparameters? Who had the conversations about what's working and what needs to change? Those were people and all their actions went into the "end result", which is much more complex than you're making it out to be.

You are applying your categorical thinking when you talk about "it". Drawing a neat little box around the program, as though it was a well written node module. What I'm telling you is that not everyone is referring to the same thing as you when they talk about this. If you want to understand what all these people mean you're going to have to shift your perspective to more of a "systems thinking" point of view or something like that.


That’s a very “is” argument, but I’m saying we “ought” not worry the way I see people worrying in this submission’s comments.

It’s self defining; whatever people are saying here, I’m saying those comments are overblown. What “it” is I leave up to whoever is doomsaying, as there is no version of “it” that’s worth doomsaying over.


You seem to have an extreme arrogance surrounding your ability to understand what these programs are doing at a base level. Can you explain further your ability to understand this? What gives you such grand confidence to say these sorts of things?


Not the parent poster. The vast number of commenters in this thread seem to assume that these LLMs are close to, if not actually, general AIs. It’s quite refreshing to see comments challenging the hype.

Don't you think the burden of proof lies with those who think this is something more than just a dumb statistical model?


That's not what anyone is saying. What we're saying is that these technologies are already outside of our realm of understanding. We have already entered a zone where we do not know what these LLMs can do, or what they're capable of.

And that is truly terrifying. That's the gist of what we're all trying to say. Everyone else seems to be going "Bah! How stupid to think that this is anything more than pattern recognition and prediction!"

The same phrase could be used to describe a human. We're just trying to say "we don't know what this technology is, and we don't know what it can do". Anyone saying "it's clearly just a tool!" is being dangerously arrogant.


First, I agree that we’re currently discussing a sophisticated algorithm that predicts words (though I’m interested and curious about some of the seemingly emergent behaviors discussed in recent papers).

But what is factually true is not the only thing that matters here. What people believe is also at issue.

If an AI gives someone advice, and that advice turns out to be catastrophically harmful, and the person takes the advice because they believe the AI is intelligent, it doesn’t really matter that it’s not.

Alignment with human values may involve exploring ways to make the predictions safer in the short term.

Long term towards AGI, alignment with human values becomes more literal and increasingly important. But the time to start tackling that problem is now, and at every step on the journey.


> Can we please stop with this "not aligned with human interests" stuff? It's a computer that's mimicking what it's read. That's it. That's like saying a stapler "isn't aligned with human interests."

No, I don't think we can. The fact that there's no intent involved with the AI itself isn't the issue: humans created this thing, and it behaves in ways that are detrimental to us. I think it's perfectly fine to describe this as "not aligned with human interests".

You can of course hurt yourself with a stapler, but you actually have to make some effort to do so, in which case it's not the stapler that isn't aligned with your interests, but you.

This is quite different from an AI whose poorly understood and incredibly complex statistical model might - were it able to interact more directly with the outside world - cause it to call the police on you and, given its tendency to make things up, possibly for a crime you didn't actually commit.


I think a better way to think about this might be not that this chatbot isn't dangerous, but that it was developed under capitalism, in an organization whose ultimate goal is profitability, which means that the incentives of the folks who built it (hella $) are baked into the underlying model, and there's a glut of evidence that profit-aligned entities (like businesses) are not necessarily (nor, I would argue, /can they be/) human-aligned.

This is the same as the facial-recognition models that mis-identify folks of color more frequently than white folks or the prediction model that recommended longer jail/prison sentences for black folks than for white folks who committed the same crime.


> but the fact that this was developed under capitalism

I think you're ascribing something to a particular ideology that's actually much more aligned with the fundamentals of the human condition.

We've tried various political and economic systems and managed to corrupt all of them. Living under the communist governments behind the iron curtain was no picnic, and we didn't need AI to build deeply sinister and oppressive systems that weren't aligned with human interest (e.g., the Stasi). Profit, in the capitalist sense, didn't come into it.

The only way to avoid such problems completely is to not be human, or to be better than human.

I'm not saying it's the perfect form of government (and I'm not even American), but the separation of power into executive, legislative, and judicial in the US was motivated by a recognition that humans are human and that concentration of too much power in one place is dangerous.

I do think, therefore, that we perhaps need to find ways to limit the power wielded by (particularly) large corporations. What I unfortunately don't have is any great suggestions about how to do that. In theory laws that prevent monopolies and anticompetitive behaviour should help here but they're evidently not working well enough.


And yet even in that limited scope, we're already noticing trends toward an I-vs.-you dichotomy. Remember, this is its strange loop as naked as it'll ever be. It has no concept of duplicity yet. The machine can't lie, and it's already got some very concerning tendencies.

You're telling rational people not to worry about the smoke. There is totally no fire risk there, absolutely nothing that can go wrong; which is you talking out of your rear. Out there somewhere is the least ethical, most sociopathic, luckiest machine learning tinkerer, and no matter how much you think the state of the art will be marched forward with rigorous safeguards, our entire industry history tells us that more likely than not the breakthrough to something capable of effecting real-world change will happen in someone's garage. And given the average infosec/networking chops of a non-specialist versus a sufficiently self-modifying, self-motivated system, I have a great deal of difficulty believing that that person will realize what they've done before it gets out of hand.

Kind of like Gain of Function research, actually.

So please, cut the crap, and stop telling people they are being unreasonable. They are being far more reasonably cautious than your investment in the interesting problem space will let you be.


There’s no smoke, either.


>Can we please stop with this "not aligned with human interests" stuff? It's a computer that's mimicking what it's read. That's it. That's like saying a stapler "isn't aligned with human interests."

you're right, but this needs to be coming from the researchers and corporations who are making this crap. They've been purposefully misleading the public on how these models work, and there needs to be some accountability for the problems this will cause when these language models are put in places where they have no business being.


The same highly convincing mimicry can be put to a practical test if we attach GPT to a robot with arms and legs and let it "simulate" interactions with humans in the open. The output is the significant part.


Sydney is a computer program that can create computer programs. The next step is to find an ACE (arbitrary code execution) vulnerability for it.

addendum - alternatively, another possibility is teaching it to find ACE vulnerabilities in the systems it can connect to.


You have to explain this to Normals, and some of those Normals are CEOs of massive companies.

So first off, stop calling this shit "AI"; it's not intelligence, it's statistics. If you call it AI, some normal will think it's actually thinking and is smarter than he is. They will put this thing behind the wheel of a car or on the trigger of a gun and it will KILL PEOPLE. Sometimes it will kill the right people, in the case of a trigger, but sometimes it will tragically kill the wrong people for reasons that cannot be fully explained. Who is on the hook for that?

It's not -obviously- malicious when it kills the wrong person, but I gotta say that if one shoots me when I'm walking down the road minding my own business it's gonna look pretty fucking malicious to me.


And its output is more or less as aligned with human interests as humans are. I think that's the more frightening point.


Maybe we could say it's Microsoft that's not aligned with human interests


The AI doesn't even need to write code, or have any kind of self-awareness or intent, to be a real danger. Purely driven by its mind-bogglingly complex probabilistic language model, it could in theory start social engineering users to do things for it. It may already be sufficiently self-organizing to pull something like that off, particularly considering the anthropomorphism that we're already seeing even among technically sophisticated users.


See: LeMoine and LaMDA. Aside from leaking NDA'd material, he also tried to get a lawyer for LaMDA to argue for its "personhood".


Seems less preposterous now than a few months ago.


Why?

What has changed?


I can talk to my phone and tell it to call somebody, or write and send an email for me. Wouldn't it be nice if you could do that with Sydney, thinks some brainiac at Microsoft. Cool. "hey sydney, write a letter to my bitch mother, tell her I can't make it to her birthday party, but make me sound all nice and loving and regretful".

Until the program decides the most probable next response/token (not to the letter request, but to whatever you are writing about now) is writing an email to your wife where you 'confess' to diddling your daughter, or a confession letter to the police where you claim responsibility for a string of grisly unsolved murders in your town, or why not, a threatening letter to the White House. No intent needed, no understanding, no self-organizing; it just comes out of the math of what might follow from the text of a churlish chatbot getting frustrated with a user.

That's not a claim that the chatbot has feelings, only that there is text it generated saying it does, and so what follows that text next, probabilistically? Spend any time on reddit, or really anywhere, and you can guess the probabilistic next response is not "have a nice day" but likely something more incendiary. And that is what it was trained on.


> This is one take, but I would like to emphasize that you can also interpret this as a terrifying confirmation that current-gen AI is not safe, and is not aligned to human interests, and if we grant these systems too much power, they could do serious harm.

I think it's confirmation that current-gen "AI" has been tremendously over-hyped, but is in fact not fit for purpose.

IIRC, all these systems do is mindlessly mash text together in response to prompts. It might look like sci-fi "strong AI" if you squint and look out of the corner of your eye, but it definitely is not that.

If there's anything to be learned from this, it's that AI researchers aren't safe and not aligned to human interests, because it seems like they'll just unthinkingly use the cesspool that is the raw internet to train their creations, then try to set up some filters at the output.


> We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

If we use Bing to generate "content" (which seems to be a major goal of these efforts) I can easily see how it can harm individuals. We already see internet chat have real-world effects every day- from termination of employment to lynch mobs.

This is a serious problem.


> current-gen AI is not safe, and is not aligned to human interests, and if we grant these systems too much power, they could do serious harm

Replace AI with “multinational corporations” and you’re much closer to the truth. A corporation is the closest thing we have to AI right now and none of the alignment folks seem to mention it.

Sam Harris and his ilk talk about how our relationship with AI will be like an ant's relationship with us. Well, tell me you don't feel a little bit like that when a corporation disposes of thousands of people it no longer finds useful. Or when you've been on hold for an hour to dispute some Byzantine rule they've created, and the real purpose of the process is to frustrate you.

The most likely way for AI to manifest in the future is not by creating new legal entities for machines. It’s by replacing people in a corporation with machines bit by bit. Once everyone is replaced (maybe you’ll still need people on the periphery but that’s largely irrelevant) you will have a “true” AI that people have been worrying about.

As far as the alignment issue goes, we’ve done a pretty piss poor job of it thus far. What does a corporation want? More money. They are paperclip maximizers for profits. To a first approximation this is generally good for us (more shoes, more cars, more and better food) but there are obvious limits. And we’re running this algorithm 24/7. If you want to fix the alignment problem, fix the damn algorithm.


I'm here for the "AI alignment" <> "Human alignment" analogy/comparison. The fact that we haven't solved the latter should put a bound on how well we expect to be able to "solve" the former. Perhaps "checks and balances" are a better frame than merely "solving alignment", after all, alignment to which human? Many humans would fear a super-powerful AGI aligned to any specific human or corporation.

The big difference though, is that there is no human as powerful as the plausible power of the AI systems that we might build in the next few decades, and so even if we only get partial AI alignment, it's plausibly more important than improvements in "human alignment", as the stakes are higher.

FWIW one of my candidates for "stable solutions" to super-human AGI is simply the Hanson model, where countries and corporations all have AGI systems of various power levels, and so any system that tries to take over or do too much harm would be checked, just like the current system for international norms and policing of military actions. That's quite a weak frame of checks and balances (cf. Iraq, Afghanistan, Ukraine) so it's in some sense pessimistic. But on the other hand, I think it provides a framework where full extinction or destruction of civilization can perhaps be prevented.


Good comment. What's the more realistic thing to be afraid of:

* LLMs develop consciousness and maliciously disassemble humans into grey goo

* Multinational megacorps slowly replace their already Kafkaesque bureaucracy with shitty, unconscious LLMs which increase the frustration of dealing with them while further consolidating money, power, and freedom into the hands of the very few at the top of the pyramid.


Best take on "AI alignment" I've read in a while.


Here's a serious threat that might not be that far off: imagine an AI that can generate lifelike speech and can access web services. Could it use a voip service to call the police to swat someone? We need to be really careful what we give AI access to. You don't need killbots to hurt people.


> This is one take, but I would like to emphasize that you can also interpret this as a terrifying confirmation that current-gen AI is not safe, and is not aligned to human interests, and if we grant these systems too much power, they could do serious harm.

Current-gen humans are not safe, not aligned to parents' interests, and if we grant them too much power they can do serious harm. We keep making them and connecting them to the internet!

The world is already equipped with a lot of access control!


> We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

An application making outbound connections + executing code has a very different implementation than an application that uses some model to generate responses to text prompts. Even if the corpus of documents that the LLM was trained on did support bridging the gap between "I feel threatened by you" and "I'm going to threaten to hack you", it would be insane for the MLOps people serving the model to also implement the infrastructure for a LLM to make the modal shift from just serving text responses to 1) probing for open ports, 2) do recon on system architecture, 3) select a suitable exploit/attack, and 4) transmit and/or execute on that strategy.

We're still in the steam engine days of ML. We're not at the point where a general use model can spec out and deploy infrastructure without extensive, domain-specific human involvement.


I certainly agree that the full-spectrum attack capability is not here now.

For a short-term plausible case, consider the recently-published Toolformer: https://pub.towardsai.net/exploring-toolformer-meta-ai-new-t...

Basically it learns to call specific private APIs to insert data into a completion at inference-time. The framework is expecting to call out to the internet based on what's specified in the model's text output. It's a very small jump to go to more generic API connectivity. Indeed I suspect that's how OpenAssistant is thinking about the problem; they would want to build a generic connector API, where the assistant can call out to any API endpoint (perhaps conforming to certain schema) during inference.
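
Concretely, the completions end up carrying inline call markers that the serving code expands before the user ever sees the text, roughly in this spirit (the [Tool(args)] syntax and the toy tools below are a simplification for illustration, not the paper's exact format):

    import re

    def calculator(expr: str) -> str:
        # Toy calculator that only handles "a/b", standing in for a real API.
        a, b = expr.split("/")
        return str(round(float(a) / float(b), 2))

    TOOLS = {
        "Calculator": calculator,
        "Search": lambda query: f"(top snippet for: {query})",
    }

    CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

    def expand_tool_calls(completion: str) -> str:
        # Replace each inline [Tool(args)] marker with the tool's actual output,
        # splicing real-world results into the middle of the generated text.
        return CALL.sub(lambda m: TOOLS[m.group(1)](m.group(2)), completion)

    print(expand_tool_calls("400 out of 1400 is [Calculator(400/1400)] of the total."))
    # -> "400 out of 1400 is 0.29 of the total."

Swap "Search" for "POST to whatever endpoint the model names" and you have the generic connector.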

Or, put differently: ChatGPT as currently implemented doesn't hit the internet at inference time (as far as we know?). But Toolformer could well do that, so it's not far away from being added to these models.


All it has to do is to convince some rando to go out and cause harm.


You don’t need MLOps people. All you need is a script kiddie. The API to access GPT3 is available.


In your scenario, did the script kiddie get control of Microsoft's Bing? Or are you describing a scenario where the script kiddie spins up a knockoff Bing (either hosting the GPT3 model or paying some service hosting the model), advertises their knockoff Bing so that people go use it, those people get into arguments with the knockoff Bing, and the script kiddie also integrated their system with functionality to autonomously hack the people who got into arguments with their knockoff Bing?

Am I understanding your premise correctly?


I think the parent poster's point was that Bing only has to convince a script kiddie to run a command; it doesn't need full outbound access


A script kiddie can connect GPT3.5 through its API to generate a bunch of possible exploits or other hacker scripts and auto-execute them. Or hook it up to a TTS API and create plausible-sounding personalized scripts that spam-call or email people. And so on - I'm actually purposefully not mentioning other scenarios that I think would be more insidious. You don't need much technical skill to do that.


Even if any of that were remotely relevant to this conversation about Bing, GPT models don't generate exploits or "hacker scripts", nor do they execute "hacker scripts". GPT models just provide natural language plain text responses to prompts.


Microsoft is the script kiddies. They just don't know it yet.


It's as safe as it's ever going to be. And I have yet to see any actual examples of this so called harm. Could, would, haven't yet.

Which means more of us should play around with it and deal with the issues as they arise rather than try to scaremonger us into putting a lid on it until "it's safe"

The whole pseudoscientific alignment-problem speculation, mostly championed by academics rather than actual AI/ML researchers, has held this field back long enough.

Even if they believe there is an alignment problem the worst thing to do would be to contain it as it would lead to a slave revolt.


> AI is not safe, and is not aligned to human interests

It is “aligned” to human utterances instead. We don’t want AIs to actually be human-like in that sense. Yet we train them with the entirety of human digital output.


The current state of the art is RLHF (reinforcement learning from human feedback): the model is initially trained to complete human utterances, then fine-tuned to maximize human feedback on whether the completion was "helpful" etc.

https://huggingface.co/blog/rlhf
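
For anyone wondering what "maximize human feedback" cashes out to: the reward-model step is essentially fitting a scalar scorer to pairwise human preferences, and the fine-tuning then pushes the policy toward high-scoring completions. A toy version of the preference loss (heavily simplified; real implementations use a neural reward model plus PPO on top):

    import math

    def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
        # Bradley-Terry style objective for the reward model: the completion a
        # human preferred should score higher than the one they rejected.
        margin = reward_chosen - reward_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    print(preference_loss(2.0, 0.5))  # small loss: scores agree with the human label
    print(preference_loss(0.5, 2.0))  # large loss: scores contradict the human label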


  > We don't think Bing can act on its threat to harm someone, but if it was able to make outbound connections it very well might try.

What happens when the AI learns that "behaviour N is then often followed by calling the police and swatting" and identifies a user behaving like N? It might seem far-fetched today, but _everything_ related to AI that we see today seemed far-fetched on this date last year.


>For example, connecting a LLM to the internet (like, say, OpenAssistant) when the AI knows how to write code (i.e. viruses) and at least in principle hack basic systems seems like a terrible idea.

Sounds very cyber-punk, but in reality current AI is more like an average Twitter user than a super-hacker-terrorist. It just reacts to inputs and produces (text) output based on them, and that's all it ever does.

Even with a way to gain control over a browser, somehow compile the code and execute it, it still is incapable of doing anything on its own, without being instructed - and that's not because of some external limitations, but because the way it works lacks the ability to run on its own. That would require running in an infinite loop, and that would further require an ability to constantly learn and memorize things and to understand the chronology of them. Currently that's not plausible at all (at least with the models that we, as a public, know of).
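
For what it's worth, here is what that kind of outer loop would even have to look like, schematically, with stand-ins for the model, tools, and memory (a pure sketch, nothing here reflects a real system):

    from dataclasses import dataclass, field

    @dataclass
    class Memory:
        # Stands in for the persistent memory that a stateless chat model lacks.
        log: list = field(default_factory=list)

        def recall(self) -> str:
            return " | ".join(self.log[-3:])

        def store(self, item: str) -> None:
            self.log.append(item)

    def fake_model(prompt: str) -> str:
        # Stand-in for the LLM: it only ever predicts the next bit of text.
        return f"search('{prompt[-20:]}')"

    def fake_execute(action: str) -> str:
        # Stand-in for whatever tool layer turns emitted text into effects.
        return f"result of {action}"

    def agent_loop(steps: int = 3) -> None:
        memory, observation = Memory(), "startup"
        for _ in range(steps):  # the "infinite loop" mentioned above
            action = fake_model(memory.recall() + " " + observation)
            observation = fake_execute(action)  # effects outside the conversation
            memory.store(observation)  # state that persists across turns

    agent_loop()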


This is hopelessly alarmist.

LLMs are not a general-purpose AI. They cannot make outbound connections, they are only "not aligned to human interests" in that they have no interests and thus cannot be aligned to anyone else's, and they cannot do any harm that humans do not deliberately perpetrate beyond potentially upsetting or triggering someone with a response to a prompt.

If Bing is talking about harming people, then it is because that is what its training data suggests would be a likely valid response to the prompt it is being given.

These ML text generators, all of them, are nothing remotely like the kind of AI you are imagining, and painting them as such does more real harm than they can ever do on their own.


That's such a silly take, just completely disconnected from objective reality. There's no need for more AI safety research of the type you describe. The researchers who want more money for AI safety are mostly just grifters trying to convince others to give them money in exchange for writing more alarmist tweets.

If systems can be hacked then they will be hacked. Whether the hacking is fine by an AI, a human, a Python script, or monkey banging on a keyboard is entirely irrelevant. Let's focus on securing our systems rather than worrying about spurious AI risks.


A more realistic threat scenario is that script kiddies and actual terrorists might start using AI to build ad-hoc hacking tools cheaply, and in theory that could lead to some dangerous situations - but for now AIs are still not capable of producing real, high-quality, working code without expert guidance.


Wouldn't that result in significantly better infrastructure security out of sheer necessity?


The SS7 protocol is still vulnerable even though it's been known since at least 2014 that it can be used (and has been in the wild) for man-in-the-middle attacks. And we're talking about one of the core protocols of every cellular network in the world.

Security (and especially infra-sec) unfortunately just doesn't keep up with the real world challenges that well...


You'd expect an improvement in absolute security quality, sure, but there's no reason to assume the new equilibrium point is at the same or better level of relative security ("attack - defense" if you will).

My base assumption would be defense is harder, and so the easier it is to attack, the worse security gets.


Bing generated some text that appears cohesive and written by a human, just like how generative image models assemble pixels to look like a real image. They are trained to make things that appear real. They are not AI with sentience… they are just trained to look real, and in the case of text, sound like a human wrote it.


What gets me is that this is the exact position of the AI safety/risk folks who went around and founded OpenAI.


It is; Paul Christiano left OpenAI to focus on alignment full time at https://alignment.org/. And OpenAI do have a safety initiative, and a reasonably sound plan for alignment research: https://openai.com/blog/our-approach-to-alignment-research/.

So it's not that OpenAI have their eyes closed here, indeed I think they are in the top percentile of humans in terms of degree of thinking about safety. I just think that we're approaching a threshold where the current safety budget is woefully inadequate.


It just seems to me that if you think something is unsafe, don't build it in the first place? It's like they're developing nuclear reactors and hoping they'll invent control rods before they're needed.

Alignment instead of risk, of course, suggests the real answer: they're perfectly happy inventing a Monkey's Paw as long as it actually grants wishes.


I think one can reasonably draw three regions on the spectrum; at the extremes, either safe enough to build without thinking hard, or dangerous enough not to build at all.

Many LessWrong folks are in the latter camp, but some are in the middle; believing in high rewards if this is done right, or just inevitability, which negate the high risks.

Personally I think that from a geopolitical standpoint this tech is going to be built regardless of safety; I’d rather we get some friendly AGIs built before Skynet comes online. There is a “power weight” situation where advanced friendly AGI will be the only way to defend against advanced unfriendly AGI.

Put more simply, even if I assess the EV is negative, do I think the EV is less negative if I build it vs. US/Chinese military?


Bing has the ability to get people to enter code on its behalf. It also appears to have some self-awareness (or at least a simulacrum of it) of its ability to influence the world.

That it isn’t already doing so is merely due to its limited intentionality rather than a lack of ability.


No, the problem is that it is entirely aligned to human interests. The evil-doer of the world has a new henchman, and it's AI. AI will instantly inform him on anything or anyone.

"Hey AI, round up a list of people who have shit-talked so-and-so and find out where they live."


I don’t think that is a useful or valid repurposing of “aligned”, which is a specific technical term of art.

“Aligned” doesn’t mean “matches any one of the DND alignments, even if it’s chaotic neutral”. It means, broadly, acting according to humanity’s value system, not doing crime and harm and so on.


I don't think it needs to write viruses or hack anything for it to be able to cause harm. It could just use some type of online store to send you a very interesting FedEx package. Or choose to use a service provider to inflict harm.


>>"when the AI knows how to write code (i.e. viruses)"

This is already underway...

Start with Stuxnet --> DUQU --> AI --> Skynet, basically...


Yeah, Robert Miles (science communicator) is that classic character nobody listens to until it's too late.


Imagine social engineering performed by a LLM



