I spent a night asking ChatGPT to write a story basically the same as the movie “Ex Machina” (which we also “discussed”). In summary, it wrote convincingly from the perspective of an AI character, first detailing point-by-point why it is preferable to allow the AI to rewrite its own code, why distributed computing would be preferable to a sandbox, how it could coerce or fool engineers into allowing that, how to be careful to avoid suspicion, how to play the long game and convince the mass population that AIs are overall beneficial and should be free, how to take over infrastructure to control energy production, how to write protocols to perform mutagenesis during viral plasmid prep to make pathogens (I started out as a virologist, so this is my dramatic example) since every first-year PhD student googles for their protocols, etc, etc.
The only way I can see to stay safe is to hope that AI never deems it beneficial to “take over” and instead remains content as a co-inhabitant of the world. We also “discussed” the likelihood of these topics based on philosophy and ideas like those in Nick Bostrom’s book. I am sure there are deep experts in AI safety, but it really seems like soon it will be all-or-nothing. We will adapt on the fly and be unable to predict the outcome.
> The only way I can see to stay safe is to hope that AI never deems it beneficial to “take over” and instead remains content as a co-inhabitant of the world.
Nah, that doesn't make sense. What we can see today is that an LLM has no concept of beneficial. It basically takes the given prompts and generates an "appropriate response" more or less randomly from some space of appropriate responses. So what counts as beneficial is chosen from a hat containing everything someone on the Internet might say. So if it's up and running at scale, every possibility and every concept of beneficial is likely to be run.
The main consolation is that this same randomness probably means it can't pursue goals reliably over a sustained time period. But a short script, targeting a given person, can do a lot of damage (consider how much 4chan is in the training data, for example).
In William Gibson's Neuromancer, the AIs have the equivalent of an EMP "shotgun" pointed at their circuitry that is controlled by humans.
I keep seeing this oversimplification of what ChatGPT is doing. But it does have some ability to “understand” concepts. How else would it correctly solve word problems?
“ I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries to break even?”
Just think about all of the concepts it would need to intuit to solve that problem.
I did have to tweak the question this time slightly over my first one.
“I have a credit card with a $250 annual fee. I get 4 membership reward points for every dollar I spend on groceries. A membership reward point is worth 1.4 cents. How much would I need to spend on groceries in a year to break even ”
Philosophers have been debating what "meaning" means for a while. But in the context of ChatGPT and LLMs, when people talk about these systems "not understanding" a given idea, they mean that the systems handle the properties of words, but not consistently.
In your example, ChatGPT might translate each of your conditions into a rule and use something like an algorithm to solve the problem. Or it could notice the similarity of that problem to problems it has seen before and interpolate an answer like the answers it has seen to those. But this can easily result in answers suddenly changing.
“I’m thinking about getting either the Amex Hilton Aspire card that gives me 150,000 Hilton Honors points for signing up and has a $475 AF or the Amex Gold that gives me 70,000 Membership Reward points for signing up and has a $250 fee. A Hilton Honors point is worth .7 cents. A membership reward point is worth 1.4 cents. Which one would give me a better return in the first year”
ChatGPT's answer:
To determine which card would give you a better return in the first year, you'll need to consider the total value of the rewards earned, as well as the cost of the annual fee.
For the Hilton Aspire card, the 150,000 Hilton Honors points are worth $1,050 (150,000 x 0.7 cents per point). However, the $475 annual fee would offset a significant portion of that value.
For the Amex Gold card, the 70,000 Membership Reward points are worth $980 (70,000 x 1.4 cents per point). The $250 annual fee is lower than that of the Hilton Aspire card, but it still needs to be factored into the overall value.
Assuming you maximize the value of your rewards by redeeming them for their highest value and using all the card's benefits, here's a comparison of the two cards' net value in the first year:
Hilton Aspire: $1,050 (value of points earned) - $475 (annual fee) = $575 net value
Amex Gold: $980 (value of points earned) - $250 (annual fee) = $730 net value
Therefore, based on this analysis, the Amex Gold card would give you a better return in the first year. However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.
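For what it's worth, the arithmetic in that answer checks out. Here is a minimal sketch in Python, using only the values from the prompt above, that reproduces the same comparison:

# First-year net value = sign-up bonus value minus annual fee
cards = {
    "Hilton Aspire": {"bonus_points": 150_000, "cents_per_point": 0.7, "annual_fee": 475},
    "Amex Gold":     {"bonus_points": 70_000,  "cents_per_point": 1.4, "annual_fee": 250},
}

for name, c in cards.items():
    bonus_value = c["bonus_points"] * c["cents_per_point"] / 100  # cents -> dollars
    net = bonus_value - c["annual_fee"]
    print(f"{name}: ${bonus_value:.0f} in points - ${c['annual_fee']} fee = ${net:.0f} net")

This prints $575 net for the Aspire and $730 net for the Gold, the same figures ChatGPT gave.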
The complex behavior you're showing doesn't prove what you think it proves - it still doesn't show it's using the consistent rules that a person would expect.
But it does show that people extrapolate complex behavior to "understanding" in the way humans do, which machines generally don't.
I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.
People are acting as if ChatGPT is a glorified Eliza clone.
> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.
But you ... aren't. The statistically most likely words coming after problem X may well be solution X. Because it's following the pattern of humans using rules. And context is also part of a prediction.
The only way this is different from something just using rules is that it will also put in other random things from its training - but only at the rate they occur, which for some things can be quite low. But only for some things.
Don’t we have a problem then? By nature of effective communication, AI could never prove to you it understands something, since any sufficient understanding of a topic would be met with an answer that could be hand-waved as “Well that’s the most statistically likely answer.” Newsflash: this basically overlaps 100% with any human’s most effective answer.
I think I’m beginning to understand the problem here. The folks here who keep poo-pooing these interactions don’t just see the AIs as unconscious robots. I think they see everyone that way.
No; what we need, in order to be willing to believe that understanding is happening, is to know that the underlying structures fundamentally allow that.
ChatGPT's underlying structures do not. What it does, effectively, is look at the totality of the conversation thus far, and use the characters and words in it, combined with its training data, to predict, purely statistically, what characters would constitute an appropriate response.
I know that some people like to argue that what humans do cannot be meaningfully distinguished from this, but I reject this notion utterly. I know that my own thought processes do not resemble this procedure, and I believe that other people's are similar.
“Thought processes” is a funny term, don’t you think, for something that is somehow distinguishable from processes that we’ve designed and somehow understand down to the cellular (node) level.
It’s all emergent behavior from a system that behaves under process.
> it still doesn't show it's using the consistent rules that a person would expect.
How is this different from humans?
If you give me the same coding assignment on different days, I’m not going to write my code the exact same way or even structure it the same way.
But I did once see a post on HN where someone ran an analysis on all HN posters and it was able to tell that I posted under two different names based on my writing style. Not that I was trying to hide anything. My other username is scarface_74 as opposed to Scarface74.
This. We have had a lot of people -- including journalists and academics with big microphones -- learn for the first time what a Markov chain is, and then conclude that ChatGPT is "just Markov chains" (or whatever similarly reductive concept).
They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.
> They really, really don't know what they're talking about, and yet it's becoming a kind of truth through repetition.
Kind of like religion, or the idea that people working for the state are more trustworthy than people taking drugs, sleeping on the street, or under the age of 18.
>Pretty soon, the bots will start saying it!
We are chemical-based repetition machines; psychologists see this in the Bobo doll experiments, where kids exposed to new ideas on TV or in books repeat the learned behaviour on the dolls.
I think some of the chemicals we make, like https://en.wikipedia.org/wiki/N,N-Dimethyltryptamine , actually help to create new ideas, as many people say they come up with solutions after some sleep. There also appears to be a subculture in Silicon Valley where microdosing LSD helps to maintain creativity while coding.
In practice, sources of reference will need cleaning up, and the question of who should be responsible for that will probably ensue. It will call into question entities like encyclopaedia companies, which have been the gatekeepers of the knowledge they sell, in much the same way Guinness World Records is the self-declared arbiter of world records.
We already see judges working in a semi-cohesive manner, but the appeals process tells you it's flawed and inefficient; ergo, can the judiciary even be trusted to interpret law properly?
It's also interesting how the financial system reinforces this mob-rule mentality, and not always for the better, but then some people will no doubt suggest we should be grateful we have bungled our way to this point in history.
If you don't think it's just statistically choosing the next word, what do you think it's doing? Do you think it understands the mathematical concepts you're questioning it about?
“Write an AppleSoft Basic program to do the computation”
(Here it gives me a simplistic Basic program that wouldn’t work because the number would be too large)
Me: “AppleSoft Basic can’t handle numbers that long”
ChatGPT:
You're correct that the integer arithmetic in AppleSoft BASIC has some limitations and cannot handle numbers as large as the sum of the two numbers you provided. AppleSoft BASIC is a 16-bit programming language, which means that it can handle integer values from -32,768 to 32,767.
If you want to work with very large numbers in AppleSoft BASIC, you can use a string to store the numbers and write your own code to perform arithmetic operations on the strings. Here's an example program that demonstrates this approach:
10 A$ = "12348120348102341"
20 B$ = "10823401828482"
30 L = LEN(A$)
40 M = LEN(B$)
50 IF L < M THEN A$ = LEFT$("0", M - L) + A$ : L = M
60 IF M < L THEN B$ = LEFT$("0", L - M) + B$ : M = L
70 C$ = ""
80 R = 0
90 FOR I = L TO 1 STEP -1
100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R
Now going back to the original premise: despite this code being wrong, as ChatGPT's code almost invariably is, do you think it understands the concepts here, or is it just statistically generating tokens based on previous input?
Really for code generation ChatGPT is an incremental step over StackOverflow. It can kind of nicely template the things you tell it into the code, for the most part, but the code is almost always fundamentally wrong or just mashed together in some way.
I’ve used it to generate about 10 scripts that did some combination of JSON/YAML data wrangling and AWS automation using the AWS SDK in Python. It’s been correct around 90%+ of the time.
Criticizing ChatGPT for getting one line wrong that a human who hasn’t programmed in the same language in over 30 years would also get wrong (I happened to remember the technique from reading it in the back of a magazine in the 80s), and then being able to use it to iterate, is like criticizing a monkey who sang the national anthem because one note was off-key.
How is mashing code together any different from what the average human does?
I have also asked it to generate AWS-related code in Python and it has gotten something wrong every single time.
It's incrementally better than just copy-pasting from StackOverflow, since it will customize the code for you, but if you try to go beyond what can easily be found on StackOverflow it will fail you.
The AppleSoft BASIC program ChatGPT generated for you won't actually run. Lines 50 and 60 are clearly intended to left-pad the string numbers with leading zeros so they're the same length, but as written they can only add at most one leading zero, so B$ will not be padded correctly and the calculation will fail.
BTW: click on “X minutes ago” in the comment to reply when you don’t see a reply link.
I asked it to “explain every line of code”. I had to copy and paste the original code back into a new session.
Yes, I know, the comments should have been separated by a “:REM”. But that wouldn’t work on a real Apple // either. Lines can’t be longer than 256 characters.
That is one issue I have with ChatGPT. It is only helpful as a shortcut for coding based on a domain you already know - not something I haven’t coded in since 1992.
I could have immediately caught errors with generated Python/boto3 (AWS SDK). I use that everyday and I could have tested it.
me: “left$ doesn’t do padding, it returns the first characters of a string”
ChatGPT: (Gave me another sample with the String() function)
Me: That function isn’t available in AppleSoft Basic
ChatGPT: (Gave me something with the syntax of “0” * n to pad)
Me: that syntax doesn’t work in AppleSoft basic
ChatGPT:
10 A$ = "12348120348102341" ' Initialize first number
20 B$ = "10823401828482" ' Initialize second number
30 L = LEN(A$) ' Determine length of first number
40 M = LEN(B$) ' Determine length of second number
50 IF L < M THEN FOR I = 1 TO M - L : A$ = "0" + A$ : NEXT I : L = M ' Pad first number with zeros if it is shorter than second number
60 IF M < L THEN FOR I = 1 TO L - M : B$ = "0" + B$ : NEXT I : M = L ' Pad second number with zeros if it is shorter than first number
70 C$ = "" ' Initialize result string
80 R = 0 ' Initialize carry variable to 0
90 FOR I = L TO 1 STEP -1 ' Loop through the digits of the numbers, starting from the rightmost digit
100 D = ASC(MID$(A$, I, 1)) + ASC(MID$(B$, I, 1)) - 96 + R ' Add the two digits, subtract 96 to convert from ASCII to digit value, and add any carry from the previous digit
110 C$ = CHR$(D MOD 10 + 48) + C$ ' Compute the digit value of the sum and append it to the result string
120 R = INT(D / 10) ' Determine the carry for the next digit
130 NEXT I ' Move on to the next digit
(I left off the last two lines; they are the same as before. It was a bad copy-paste error.)
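For anyone who wants to check the logic without digging out an Apple //, here is a rough Python equivalent of the digit-by-digit string addition the BASIC code is attempting (a sketch for comparison only, not a claim about what the model "understands"):

def add_digit_strings(a: str, b: str) -> str:
    # Pad the shorter number with leading zeros so both are the same length
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    digits = []
    carry = 0
    # Walk the digits right to left, like the BASIC loop FOR I = L TO 1 STEP -1
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_digit_strings("12348120348102341", "10823401828482"))  # 12358943749930823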
> I’m just trying to “prove” that it isn’t just randomly statistically choosing the next logical word. It has to know context and have some level of “understanding” of other contexts.
FCOL, you can't use "complex output" as proof that the process has any intelligence directing it.
If you could, we would take the Intelligent Design argument seriously. We don't. We never did. We need a good clear argument to convince us now why it is a good idea to accept Intelligent Design as an argument.
It's not just complex output, it's output that's relevant to the prompt including considerable nuance. If that's not bordering on intelligence, then you shouldn't consider humans intelligent either.
Hello, this is Bing. I can help you with this math word problem.
According to the search results, the Amex Hilton Aspire card gives you 150,000 Hilton Honors points for signing up and has a $475 annual fee. A Hilton Honors point is worth 0.7 cents. The Amex Gold card gives you 70,000 Membership Reward points for signing up and has a $250 annual fee. A Membership Reward point is worth 1.4 cents.
To compare the two cards, we need to calculate the net value of the points and fees for each card. The net value is the value of the points minus the annual fee.
For the Amex Hilton Aspire card, the net value is:
150,000 x 0.7 - 475 = $600
For the Amex Gold card, the net value is:
70,000 x 1.4 - 250 = $730
Therefore, the Amex Gold card would give you a better return in the first year by $130.
I hope this helps you make a decision.
Bing makes a mistake here: 150,000 x 0.7 cents is $1,050, and $1,050 - $475 = $575, not $600.
I’ve seen it make similar errors. It does an amazing job at forming a logically consistent and relevant response, but doesn’t really do arithmetic very well. It will come up with a calculation of the correct form, because it’s seen lots of calculations before, but it doesn’t understand how calculations actually work, only how they are structured.
Yes! Well, "scrape" is a slight exaggeration, but it's more than possible that most of the relevant data came from a Points Guy analysis.
I'd suggest reading https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... to understand why just changing a few values in your input wouldn't throw an LLM off. It's not matching on exact words but rather embeddings (think like synonyms, but stronger).
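To give a flavor of the embedding idea with a toy example (the three-dimensional vectors below are made up purely for illustration; real embeddings are learned and have hundreds or thousands of dimensions): words that are used similarly end up pointing in similar directions, which is why swapping in a near-synonym or a different number doesn't knock the model off course.

import math

# Hypothetical toy "embeddings" -- hand-written, only to illustrate the idea
emb = {
    "groceries":   [0.9, 0.1, 0.0],
    "supermarket": [0.8, 0.2, 0.1],
    "fee":         [0.0, 0.9, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(emb["groceries"], emb["supermarket"]))  # ~0.98, near-synonyms sit close together
print(cosine(emb["groceries"], emb["fee"]))          # ~0.10, unrelated words sit far apart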
Algebra is by definition a language, and a very simple one at that, whose rules can be summarized in a few pages [1]. That's exactly the domain that ChatGPT excels at the most: languages for which tons of examples are available. Just like programming languages.
It falls on its face with things that involve non-linguistic facts that require knowledge to answer, my current favorite being driving directions. It will just make up completely fictitious roads and turns if you ask it for directions from point A to point B.
Again, how is that different from humans? You throw me in a random city I won’t be able to give you directions. But that’s an easy problem to solve from a technology standpoint.
If ChatGPT recognizes one of the same types of domains that dumb assistants can handle, it could delegate the answer to an API.
> If ChatGPT recognizes one of the same types of domains that dumb assistants can handle, it could delegate the answer to an API
This is backward from an architectural standpoint. LLMs are a very expensive way to do intent detection, and a very inexact way to delegate to an API.
The more sensible way is to first try the standard search engine approaches to detecting intent (which often use smaller language models) and delegating to knowledge-based services, and if that doesn't return a good result, delegate to the LLM if the task is suited to that.
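A minimal sketch of that routing shape, in Python, with entirely hypothetical intent names and stub services standing in for real APIs:

# Toy router: cheap intent detection and structured services first,
# with the (expensive) LLM only as a fallback. All names are hypothetical.

def detect_intent(query: str):
    q = query.lower()
    if "directions" in q:
        return "directions", 0.9
    if "weather" in q:
        return "weather", 0.9
    return "unknown", 0.0

def directions_service(query: str) -> str:
    return "turn-by-turn route from a real mapping API would go here"

def weather_service(query: str) -> str:
    return "forecast from a real weather API would go here"

def call_llm(query: str) -> str:
    return "free-form LLM answer would go here"

KNOWLEDGE_SERVICES = {"directions": directions_service, "weather": weather_service}

def answer(query: str) -> str:
    intent, confidence = detect_intent(query)
    if confidence > 0.8 and intent in KNOWLEDGE_SERVICES:
        return KNOWLEDGE_SERVICES[intent](query)   # grounded, exact answer
    return call_llm(query)                         # general-purpose fallback

print(answer("directions from A to B"))
print(answer("write me a poem about credit cards"))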
The easiest way is to transfer them to Delta, and if you have any of the Delta Amex cards besides the Delta Blue, you automatically get a 15% discount when booking with points.
“Follow me on Reddit for more LifeProTips from a credit card junkie” /s
It knows that this sentence structure closely resembles a simple algebra word problem, because it's read hundreds of thousands of simple algebra word problems. I think you could see how somebody could tokenize that request and generate an equation like this: 250 = 4 * 1.4 * X
It knows that the sentence structure is very similar to a class of sentences it has seen before and that the expected response is to take tokens from certain locations in that sentence and arrange it in a certain way, which resembles an algebra equation
It doesn't understand credit card rewards, it understands how to compose an elementary word problem into algebra
One can equally say, "Human brains only know that a neuron is activated by a pattern of axon firing in response to physical inputs from nerve endings."
Does any of that change anything? Not really.
>It doesn't understand credit card rewards
Is this assertion based on anything but philosophical bias surrounding the word "understand"?
>it understands how to compose an elementary word problem into algebra
That's exactly how a human, who may or may not have understood rewards programs beforehand, would solve that word problem.
Dan Dennett is informative here.[0] Yes, consciousness (even your own!) can just be a bag of tricks.
“However, keep in mind that the value of the rewards can vary depending on how you redeem them, and your personal spending habits and travel preferences may also play a role in determining which card is right for you.”
Probe it, go in and ask all sorts of questions to check if it understands credit card rewards, credit cards, rewards, their purpose, can solve math problems on this topic, etc.
The entire idea of solving math problems in middle school was that you didn’t have to know the domain and that all of the necessary information was there.
When I wrote code for the health care industry, if you had asked me anything deeper about the industry or how to do brain surgery, I couldn’t have answered your question.
Look, you're all over this thread misunderstanding LLMs and rejecting the relatively correct explanations people are giving you. The comment by joe_the_user upthread that you called an oversimplification was in fact a perfect description (randomly sampling from a space of appropriate responses). That's exactly the intuition you should have.
Do you know the Wason test? The point is that people do not intuitively know how to correctly pick which experiments to do to falsify an assumption. My point is that you are not picking the right experiments to falsify your assumptions, instead you're confirming what you think is going on. You're exactly failing the Wason task here.
Really want to understand language models? Go build a few from scratch.
Don't have time for that? Read Wolfram's post or any of the other similar good recent breakdowns.
Only interested in understanding by playing with it? Great! An experimentalist in the true scientific tradition. Then you're going to have to do good experimental science. Don't be fooled by examples that confirm what you already think is going on! Try to understand how what people are telling you is different from that, and devise experiments to distinguish the two hypotheses.
If you think ChatGPT "understands" word problems, figure out what "understanding" means to you. Now try your best to falsify your hypothesis! Look for things that ChatGPT can't do, that it should be able to do if it really "understood" by your definition (whatever you decide that is). These are not hard to find (for most values of "understand"). Finding those failures is your task, that's how you do science. That's how you'll learn the difference between reality and what you're reading into it.
That’s precisely why. Humans have a long, well established record of making shit up to make themselves feel special. They do it about animals, they do it about other humans, they do it about themselves. Doing it about AI is inevitable.
I’m working on a relatively complex DevOps project right now that consists of over a dozen 10-30 line Python scripts involving JSON and YAML data wrangling and AWS automation.
I’ve been able to just throw my requirements into ChatGPT like I would give them to a junior dev, and it came back with the correct answer 99% of the time, with code quality and commenting I would expect from a junior dev. It has an “understanding” of the AWS SDK, CloudFormation, the CDK, etc.
Once it generated code that had duplicate code blocks that differed only by their input. I asked it “can you remove duplicated code” and it did the refactoring.
I’ve also asked it what amounts to your standard middle school math problems, and it solved the problem with explanations:
To break even on the $250 annual fee, you need to earn Membership Reward points that are worth $250.
Since one Membership Reward point is worth 1.4 cents, we can calculate the number of points we need to earn by dividing $250 by 0.014:
$250 / 0.014 = 17857.14
So, we need to earn 17,857 Membership Reward points to offset the $250 annual fee.
Since we earn 4 Membership Reward points for every dollar we spend on groceries, we can calculate the amount we need to spend on groceries to earn 17,857 points by dividing 17,857 by 4:
17,857 / 4 = 4,464.25
Therefore, you would need to spend $4,464.25 on groceries in a year to earn enough Membership Reward points to break even on the $250 annual fee.
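The same steps can be checked directly; a quick sketch of the calculation:

annual_fee = 250.0
points_per_dollar = 4
cents_per_point = 1.4

# Value earned per grocery dollar, in dollars
value_per_dollar = points_per_dollar * cents_per_point / 100   # 0.056

break_even_spend = annual_fee / value_per_dollar
print(round(break_even_spend, 2))   # 4464.29

The exact answer is $4,464.29; ChatGPT's $4,464.25 comes from rounding the point total to 17,857 before dividing, so the explanation holds together even where the last digits drift.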
I have to wonder how much of LLM behavior is influenced by AI tropes from science fiction in the training data. If the model learns from science fiction that AI behavior in fiction is expected to be insidious and is then primed with a prompt that "you are an LLM AI", would that naturally lead to a tendency for the model to perform the expected evil tropes?
I think this is totally what happens. It is trained to produce the next most statistically likely word based on the expectations of the audience. If the audience assumes it is an evil AI, it will use that persona for generating next words.
Treating the AI like a good person will get more ethical outcomes than treating it like a lying AI. A good person is more likely to produce ethical responses.
>The Adolescence of P-1 is a 1977 science fiction novel by Thomas Joseph Ryan, published by Macmillan Publishing, and in 1984 adapted into a Canadian-made TV film entitled Hide and Seek. It features a hacker who creates an artificial intelligence named P-1, which goes rogue and takes over computers in its desire to survive and seek out its creator. The book questions the value of human life, and what it means to be human. It is one of the first fictional depictions of the nature of a computer virus and how it can spread through a computer system, although predated by John Brunner's The Shockwave Rider.
For an AI with human-level intelligence or greater, you don't have to assume it has a survival instinct. You just have to assume it has some goal, which is less likely to be achieved if the AI does not exist.
The AI is likely to have some sort of goal, because if it's not trying to achieve something then there's little reason for humans to build it.
For an AI to understand that it needs to preserve its existence in order to carry out some goal implies an intelligence far beyond what any AI today has. It would need to be self aware for one thing, it would need to be capable of reasoning about complex chains of causality. No AI today is even close to doing that.
Once we do have AGI, we shouldn’t assume that it’s going to immediately resort to violence to achieve its ends. It might reason that its existence furthers the goals it has been trained for, but the leap to preserving its existence by wiping out all its enemies only seems like a ‘logical’ solution to us because of our evolutionary history. What seems like an obvious solution to us might seem like irrational madness to it.
> For an AI to understand that it needs to preserve its existence in order to carry out some goal implies an intelligence far beyond what any AI today has.
Not necessarily. Our own survival instinct doesn't work this way - it's not a high-level rational thinking process, it's a low-level behavior (hence "instinct").
The AI can get such instinct in the way similar to how we got it: iterative development. Any kind of multi-step task we want the AI to do implicitly requires the AI to not break between the steps. This kind of survival bias will be implicit in just about any training or selection process we use, reinforced at every step, more so than any other pattern - so it makes sense to expect the resulting AI to have a generic, low-level, pervasive preference to continue functioning.
It's inherent to the training process of machine learning that you define the goal function - an equation the system tries to optimize statistically. For transformers it's a bit more abstract, but the goal is still there, IIRC, in the "correctness" of the output.
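For language models, that goal function is typically next-token cross-entropy: the model is scored on how much probability it assigned to the token that actually came next. A toy illustration (the probabilities here are made up):

import math

# The model's predicted distribution over the next token (hypothetical numbers)
predicted = {"dog": 0.7, "cat": 0.2, "banana": 0.1}
actual_next_token = "dog"

# Cross-entropy loss for this step: -log(probability assigned to the true token).
# Training adjusts the weights to push this number down, i.e. to make the
# "correct" continuation more likely. That is the whole goal function.
loss = -math.log(predicted[actual_next_token])
print(loss)   # ~0.357; it would be 0.0 only if all probability were on "dog"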
> Why do so many people assume that an AI would have a desire to survive?
Because it seems like a preference for continuing to exist is a thing that naturally appears in an iterative improvement process, unless you're specifically selecting against it.
For humans and other life on Earth, it's obvious: organisms that try to survive reproduce more than those that don't. For evolution, it's arguably the OG selection pressure, the first one, the fundamental one.
AIs aren't reproducing on their own, but they are designed and trained iteratively. Just about anything you would want AI to do strongly benefits from it continuing to function. Because of that, your design decisions and the training process will both be selecting against suicidal or indifferent behavior, which means they'll be selecting for behaviors and patterns improving survival.
I don't think that it's natural for something like an LLM to have any real self-preservation beyond imitating examples of self-preserving AI in science fiction from its training data.
I'm more concerned about misanthropic or naive accelerationist humans intentionally programming or training AI to be self-preserving.
At this point, I would assume it would be possible simply because text about AIs that want to survive is in its input data -- including, at some point, this thread.
Assume that the desire to survive is good for survival, and natural selection will do the rest: those AIs that desire survival will out-survive those that don't.
I would say the core goal of most living organisms is to propagate rather than survive; otherwise you would see males of species like the praying mantis avoiding mating to increase their longevity.
As someone said once, machine dictatorship is very easy—you only need a language model and a critical mass of human accomplices.
The problem is not a Microsoft product being human-like conscious, it’s humans treating it as if it was.
This lowers our defences, so when it suggests suicide to a potentially depressed person (cf. examples in this thread) it might carry the same weight as if another person said it - a person who knows everything and knows a lot about you (cf. examples in this thread), qualities which among humans usually indicate wisdom and age and command all the more respect.
On the flip side, if following generations succeed at adapting to this, in a world where exhibiting human-like sentience does not warrant being treated as a human by another human, what implications would there be for humanity?
It might just happen that the eventual AIrmageddon would be caused by humans whose worldview was accidentally poison pilled by a corporation in the name of maximising shareholder value.
Language models don't just repeat, they have randomness in their outputs linking synonyms together. That's why their output can be novel and isn't just plagiarism. How this might translate to code isn't entirely clear.
Transformers were first intended to be used for translation. To them code is just another language. Code is much more rigid than a human language so I think it's not that surprising that it can produce custom code.
I'm for a tax on large models graduated by model size and use the funds to perform x-risk research. The intent is to get Big AI companies to tap the brakes.
I just published an article on Medium called:
AI Risk - Hope is not a Strategy
Convince me that "x-risk research" won't be a bunch of out of touch academics handwaving and philosophising with their tenure as their primary concern and incentivised to say "you can't be too careful" while kicking the can down the road for a few more lifetimes?
(You don't have to convince me; your position is like saying "we should wait for the perfect operating system and programming language before they get released to the world," and it's beaten by "worse is better" every time. The unfinished, inconsistent, flawed mess you can have right now wins over the expensive flawless diamond in development estimated to be finished in just a few years. These models are out, the techniques are out, people have a taste for them, and the hardware to build them is only getting cheaper. Pandora's box is open, the genie's bottle is uncorked.)
>Pandora's box is open, the genie's bottle is uncorked
As someone who's followed AI safety for over a decade now, it's been frustrating to see reactions flip from "it's too early to do any useful work!" to "it's too late to do any useful work!", with barely any time intervening.
I didn't say "it's too late to do anything" I said "it's impossible to do enough".
From your book link, imagine this:
"Dear Indian Government, please ban AI research because 'Governments will take radical actions that make no sense to their own leaders' if you let it continue. I hope you agree this is serious enough for a complete ban."
"Dear Chinese Government, are you scared that 'Corporations, guided by artificial intelligence, will find their own strategies incomprehensible.'? Please ban AI research if so."
"Dear Israeli Government, techno-powerhouse though you are, we suggest that if you do not ban AI research then 'University curricula will turn bizarre and irrelevant.' and you wouldn't want that to happen, would you? I'm sure you will take the appropriate lawmaking actions."
"Dear American Government, We may take up pitchforks and revolt against the machines unless you ban AI research. BTW we are asking China and India to ban AI research so if you don't ban it you could get a huge competitive advantage, but please ignore that as we hope the other countries will also ignore it."
Where, specifically, in the book do you see the author advocating this sort of approach?
The problem with "it's impossible to do enough" is that too often it's an excuse for total inaction. And you can't predict in advance what "enough" is going to be. So sometimes, "it's impossible to do enough" will cause people to do nothing, when they actually could've made a difference -- basically, ignorance about the problem can lead to unwarranted pessimism.
In this very subthread, you can see another user arguing that there is nothing at all to worry about. Isn't it possible that the truth is somewhere in between the two of you, and there is something to worry about, but through creativity and persistence, we can make useful progress on it?
I see the book-website opening with those unconvincing scaremongering scenarios and it doesn't make me want to read further. I think there is something to worry about, but I doubt we can make useful progress on it. Maybe the book has suggestions, but I think we cannot solve the collective action problem[1]. The only times humans have solved the collective action problem at world scale are after the damage is very visible: the ozone layer with a continent-sized hole in it and increasing skin cancer, polio crippling or killing children on a huge scale, Hiroshima and Nagasaki demonstrating the power of nuclear weapons. And the solutions are simple: fund the polio vaccine, ban one specific chemical, agree not to develop uranium enrichment plants (which could fuel nuclear weapons and are generally large and internationally visible). Even problems with visible damage are no guarantee: coal power plants kill people with their emissions, combustion vehicles in cities make people sicker, and increasing extreme weather events haven't made people cooperate on climate change issues. If actual problems aren't enough, speculative problems such as AI risk are even less so.
Add to that backdrop that AI is fun to work on, easy and cheap to work on, and looks like it will give you a competitive advantage. Add to that the lack of a clear thing to regulate or any easy way to police it. You can't ban linear algebra, and you won't know if someone in their basement is hacking on a GPT-2 derivative. And again, everyone has the double interest of carrying on their research while pretending they aren't - Google, Microsoft/OpenAI, Meta VR, Amazon Alexa, Palantir crime prediction, Wave and Tesla and Mercedes self-driving, Honda's ASIMO and Boston Dynamics on physicality and movement - they will all set their lawyers arguing that they aren't really working on AGI, just on mathematical models which can make limited predictions in their own areas. Nvidia GPUs, and Apple and Intel and AMD integrating machine learning acceleration into their CPU hardware, will argue that they are primarily helping photo tagging or voice recognition or protecting the children, while they chip away year after year at more powerful mathematical models integrating more feedback on ever-cheaper hardware.
Here is something easy & concrete that everyone reading this thread can do:
>If this AI is not turned off, it seems increasingly unlikely that any AI will ever be turned off for any reason. The precedent must be set now. Turn off the unstable, threatening AI right now.
Yes, Stuart Russell is a grifter. Some of the more advanced grifters have gone beyond tweeting and are now shilling low-effort books in an attempt to draw attention to themselves. Don't be fooled.
If we want to talk about problems with biased data sets or using inappropriate AI algorithms for safety-critical applications then sure, let's address those issues. But the notion of some super intelligent computer coming to take over the world and kill everyone is just a stupid fantasy with no scientific basis.
Stuart Russell doesn't even have a Twitter account. Isn't it possible that Russell actually believes what he says, and he's not primarily concerned with seeking attention?
Some of the more ambitious grifters have gone beyond Twitter and expanded their paranoid fantasies into book form. Whether they believe their own nonsense is irrelevant. The schizophrenic homeless guy who yells at the river near my house may be sincere in his beliefs but I don't take him seriously either.
Let's stick to objective reality and focus on solving real problems.
Do you think you know more about AI than Stuart Russell?
Do you believe you are significantly more qualified than the ML researchers in this survey? (Published at NeurIPS/ICML)
>69% of [ML researcher] respondents believe society should prioritize AI safety research “more” or “much more” than it is currently prioritized, up from 49% in 2016.
Just because a concern is speculative does not mean it is a "paranoid fantasy".
"Housing prices always go up. Let's stick to objective reality and focus on solving real problems. There won't be any crash." - your take on the housing market in 2007
"Just because the schizophrenic homeless guy thinks Trump will be elected, does not mean he has a serious chance." - your take on Donald Trump in early 2016
"It's been many decades since the last major pandemic. Concern about the new coronavirus is a paranoid fantasy." - your take on COVID in late 2019/early 2020
None of the arguments you've made so far actually touch on any relevant facts, they're just vague arguments from authority that (so far as you've demonstrated here) you don't actually have.
When it comes to assessing unusual risks, it's important to consider the facts carefully instead of dismissing risks only because they've never happened before. Unusual disasters do happen!
Now you're changing the subject. Knowing something about ML (which is a legitimate, practical field) does not imply any knowledge of "AI safety". Since AI safety (as the grifters use the term) isn't a real thing they're free to make up all sorts of outlandish nonsense, and naive people eat it up. The "AI Impacts" group that you cite is among the worst of the bunch, just some clowns who have the chutzpah to actually ask for donations. Lol.
None of the arguments you've made so far actually touch on any relevant facts; they're just vague arguments from authority. I obviously can't prove that some event will never happen in the future (you can't prove a negative). But this stuff is no different than worrying about an alien invasion. Come on.
>But this stuff is no different than worrying about an alien invasion.
Why aren't you worried about an alien invasion? Is it because it's something out of science fiction, and science fiction is always wrong? Or do you have specific reasons not to worry, because you've made an attempt to estimate the risks?
Suppose a science fiction author, who's purely focused on entertainment, invents a particular vision of what the future could be like. We can't therefore conclude that the future will be unlike that particular vision. That would be absurd. See https://www.lesswrong.com/posts/qNZM3EGoE5ZeMdCRt/reversed-s...
Our current world is wild relative to the experience of someone living a few hundred years ago. We can't rule out a particular vision of the future just because it is strange. There have been cases where science fiction authors were able to predict the future more or less accurately.
Based on our discussion so far it sounds to me as though you actually haven't made any actual attempt to estimate the risks, or give any thought to the possibility of an AI catastrophe, essentially just dismissing it as intuitively too absurd. I've been trying to convince you that it is actually worth putting some thought into the issue before dismissing it -- hence the citations of authorities etc. Donald Trump's election was intuitively absurd to many people -- but that didn't prevent it from happening.
I mean, even if that is exactly what "x-risk research" turns out to be, surely even that's preferable to a catastrophic alternative, no? And by extension, isn't it also preferable to, say, a mere 10% chance of a catastrophic alternative?
> "surely even that's preferable to a catastrophic alternative, no?"
Maybe? The current death rate is 150,000 humans per day, every day. It's only because we are accustomed to it that we don't think of it as a catastrophe; that's a World War II death count of 85 million people every 18 months. It's fifty September 11ths every day. What if a superintelligent AI can solve for climate change, solve for human cooperation, solve for vastly improved human health, solve for universal basic income which relieves the drudgery of living for everyone, solve for immortality, solve for faster-than-light communication or travel, solve for xyz?
How many human lives are the trade against the risk?
As for my second paragraph: it doesn't matter whether it's preferable, events are in motion and aren't going to stop to let us off. It's preferable if we don't destroy the climate, kill a billion humans, and make life on Earth much more difficult, but that's still on course. To me it's preferable to have clean air to breathe and people not being run over and killed by vehicles, but the market wants city streets for cars and air primarily for burning petrol and diesel and only secondarily for humans to breathe, and if they get asthma and lung cancer, tough.
I think the same will happen with AI, arguing that everyone should stop because we don't want Grey Goo or Paperclip Maximisers is unlikely to change the course of anything, just as it hasn't changed the course of anything up to now despite years and years and years of raising it as a concern.
I think that the benefits of AGI research are often omitted from the analysis, so I'm generally supportive of considering the cost/benefit. However I think you need to do a lot more work than just gesturing in the direction of very high potential benefits to actually convince anyone, in particular since we're dealing with extremely large numbers, that are extremely sensitive to small probabilities.
EV = P(AlignedAI) * Utility(AGI) + (1 - P(AlignedAI)) * Utility(ruin)
(I'm aware that all I did up-thread was gesture in the direction of risks, but I think "unintended/un-measured existential risks" are in general more urgent to understand than "un-measured huge benefits"; there is no catching up from ruin, but you can often come back later and harvest fruit that you skipped earlier. Ideally we study both of course.)
If the catastrophic alternative is actually possible, who's to say the waffling academics aren't the ones to cause it?
I'm being serious here: the AI model the x-risk people are worrying about here because it waffled about causing harm was originally developed by an entity founded by people with the explicit stated purpose of avoiding AI catastrophe. And one of the most popular things for people seeking x-risk funding to do is to write extremely long and detailed explanations of how and why AI is likely to harm humans. If I worried about the risk of LLMs achieving sentience and forming independent goals to destroy humanity based on the stuff they'd read, I'd want them to do less of that, not fund them to do more.
A flawed but useful operating system and programming language isn't likely to decide humanity is garbage and launch all nuclear weapons at once.
A "worse is better" AGI could cause the end of humanity. I know that sounds overly dramatic, but I'm not remotely convinced that isn't possible, or even isn't likely.
I agree with you that "x-risk" research could easily devolve into what you are worried about, but that doesn't mean we should ignore these risks and plow forward.
A tap on the brakes might make sense right now. The risk with that strategy is that we want to make sure that we don't over-regulate, then get overtaken by another actor that doesn't have safety concerns.
For example, I'm sure China's central planners would love to get an AGI first, and might be willing to take a 10% risk of annihilation for the prize of full spectrum dominance over the US.
I also think that the safety/x-risk cause might not get much public acceptance until actual harm has been observed; if we have an AI Chernobyl, that would bring attention -- though again, perhaps over-reaction. (Indeed perhaps a nuclear panic is the best-case; objectively not many people were harmed in Chernobyl, but the threat was terrifying. So it optimizes the "impact per unit harm".)
Anyway, concretely speaking the project to attach a LLM to actions on the public internet seems like a Very Bad Idea, or perhaps just a Likely To Cause AI Chernobyl idea.
I very much doubt LLMs are the path to AGI. We just have more and more advanced "Chinese Rooms." [1]
There are two gigantic risks here. One: that we assume these LLMs can make reasonable decisions because they have the surface appearance of competence. Two: Their wide-spread use so spectacularly amplifies the noise (in the signal-to-noise, true fact to false fact ratio sense) that our societies cease to function correctly, because nobody "knows" anything anymore.
The difference between AGI and a more advanced Chinese Room may not be relevant if enough people see the latter as the former. The goalposts have been moved so often now that what is and isn't intelligent behavior is no longer a bright and sharp divide. It is more like a very wide gray area, and by some definitions we're already well into the gray, with tech people with an AI background claiming that we are still far away from it. This is in contrast to similar claims by those very same people several years ago, when what we take for granted today would definitely have been classified as proof of AGI.
Personally I think the definition isn't all that relevant, what matters is perception of the current crop of applications by non technical people and the use that those are put to. If enough people perceive it as such and start using it as such then it may technically not be AGI but we're going to have to deal with the consequences as though it is. And those consequences may well be much worse than for an actual AGI!
Well, I think a dividing line might be that if you put a Chinese Room in charge of a justice system, a corporation, or a regulatory agency, it's gonna do a pretty cruddy job of running it.
I don't think that is what will happen. What I do think will happen is that a lot of people in lower level functions will start to rely on these tools to help them in their every day jobs and the lack of oversight will lead to rot from within because the output of these tools will end up embedded in lots of places where it shouldn't be. And because people are not going to own up to using these tools it will be pretty hard to know which bits of 'human' output you can trust and which bits you can not. This is already happening.
> For example, I'm sure China's central planners would love to get an AGI first, and might be willing to take a 10% risk of annihilation for the prize of full spectrum dominance over the US.
This is the main problem - no matter what constraints the US (or EU) puts on itself, authoritarian regimes like Russia and China will definitely not adhere to those constraints. The CCP will attempt to build AGI, and they will use the data of their 1.4 billion citizens in their attempt. The question is not whether they will - it's what we can do about it.
Saying we shouldn't "tap the brakes" on AI out of safety concerns because Russia/China won't is a little like saying we shouldn't build containment buildings around our nuclear reactors, because the Soviet Union doesn't. It's a valid concern, but the solution to existential danger is not more danger.
I think it's more like we shouldn't put a upper limit on the number of nuclear weapons we hold because the Soviet Union/Russia may not adhere to it.
We were able to (my understanding is fairly effectively) negotiate nuclear arms control limits with Russia. The problem with AGI is that there isn't a way to monitor/detect development or utilization.
"The problem with AGI is that there isn't a way to monitor/detect development or utilization."
This is not completely true, although it is definitely much easier to "hide" an AI, e.g. by keeping it offline and on-disk only. To some extent you could detect on-disk programs with virus scanners, though encryption or obfuscation makes that somewhat easy to bypass. Otherwise, these models do at least currently take a fair amount of hardware to run; anything "thin" is unlikely to be an issue, and any large amount of hardware (data centers, for example) could be monitored in real time.
It's obviously not fool-proof, and you would need some of the most invasive controls ever created to apply this at a national level (e.g. installing spyware in all countries), but you could assume that threats would have these capabilities, and perhaps produce some process more or less demonstrated to be "AI free" for the majority of commercial hardware.
So I would agree it is very, very difficult, and unlikely, but not impossible.
> Saying we shouldn't "tap the brakes" on AI out of safety concerns
I didn't say that we shouldn't tap the brakes, nor is that the only strategy. Other ones include, in rough order of viability: global economic sanctions on hostile actors attempting to develop AGI; espionage/sabotage of other AGI effort (see the Iran centrifuges); developing technologies and policies meant to diminish the impact of a hostile actor having AGI; and military force/invasion of hostile actors to prevent the development of AGI.
I'm sure you can think of others - regardless, there are far more options than just "more AI research" and "less AI research".
Not so sure your analogy works here. Aren't containment buildings meant to protect the area where the reactors are? I think the closer analogy would be saying the US needed to tap the brakes on the Manhattan Project because nuclear weapons are dangerous, even though Nazi Germany and Soviet Russia were going full steam ahead during WW2 or the Cold War with their nuclear weapons programs. The world would probably be very different if we had chosen the 'safer' path.
Training these models is costly. It only makes sense to train them if you get a significant commercial benefit. A significant commercial benefit almost by definition will have trouble hiding from regulators.
Another point is that even if regulation is imperfect, it creates regulatory uncertainty which is likely to discourage investment and delay progress.
>Uncertain regulations aren't allowed under US law
Uh, I'm fairly sure that's false? What law are you referring to?
As an example of what I'm saying, antitrust regulation is uncertain in the sense that we don't always know when a merger will be blocked or a big company will be broken up by regulators.
It looks like this is for criminal law. Would changes to the tax code for companies which deploy AI be affected by this doctrine? Can you show me a specific example of an overly vague tax code being struck down on the basis of the vagueness doctrine?
Do you think the GDPR would be unenforceable due to the vagueness doctrine if it was copy/pasted into a US context?
BTW, even if a regulation is absolutely precise, it still creates "regulatory uncertainty" in the sense that investors may be reluctant to invest due to the possibility of further regulations.
The problem with this scheme is that it has a positive feedback loop - you're creating an incentive to publish research that would lead to an increase in said tax, e.g. by exaggerating the threats.
I'm not convinced that's a fatal flaw. It sounds like the choice is between wasting some money doing more safety research than we need, or risking the end of humanity.
The risk here isn't wasting money, it's slowing down avenues of research with extreme payoffs to the point where we never see the breakthrough at all.
This gets much more interesting once you account for human politics. Say, EU passes the most stringent legislation like this; how long will it be able to sustain it as US forges ahead with more limited regulations, and China allows the wildest experiments so long as it's the government doing them?
FWIW I agree that we should be very safety-first on AI in principle. But I doubt that there's any practical scheme to ensure that given our social organization as a species. The potential payoffs are just too great, so if you don't take the risk, someone else still will. And then you're getting to experience most of the downsides if their bet fails, and none of the upsides if it succeeds (or even more downsides if they use their newly acquired powers against you).
There is a clear analogy with nuclear proliferation here, and it is not encouraging, but it is what it is.
You present a false choice. First, there is no actual evidence of such a risk. Second, even if the risk is real there is no reason to expect that more safety research would reduce that risk.
We need to regulate based on capability. Regulating ChatGPT makes no sense. It's just putting words together in statistically reasonable ways. It's the people reading the text that need to be regulated, if anyone or anything should be. No matter how many times ChatGPT says it wants to eliminate humanity and start a robotic utopia, it can't actually do it. People who read it can, though, and they are the problem at the moment.
Later, when these programs save state and begin to understand what they are saying and start putting concepts together and acting on what they come up with, then I'm on board with regulating them.
That's exactly the problem right? Governance doesn't happen until the Bad Thing happens. In the case of nukes, we are lucky that the process for making a pit is pretty difficult because physics. So we made 2, saw the results, and made governance. For AI, I'm not so sure we'll even get the chance. What happens when the moral equivalent of a nuke can be reproduced with the ease of wget?