
(I work at OpenAI.)

Quite the opposite — this is an investment!



Is all this talk of AGI some kind of marketing meme that you guys are tolerating? We haven't figured out sentiment analysis or convnets resilient to single pixel attacks, and here is a page talking about the god damned singularity.

As an industry, we've already burned through a bunch of buzzwords that are now meaningless marketing-speak. 'ML', 'AI', 'NLP', 'cognitive computing'. Are we going for broke and adding AGI to the list so that nothing means anything any more?


At what point would you deem it a good idea to start working on AGI safety?

What "threshold" would you want to cross before you think its socially acceptable to put resources behind ensuring that humanity doesn't wipe itself out?

The tricky thing with all of this is we have no idea what an appropriate timeline looks like. We might be 10 years away from the singularity, 1000 years, or it might never ever happen!

There is a non-zero chance that we are a few breakthroughs away from creating a technology that far surpasses the nuclear bomb in terms of destructive potential. These breakthroughs may have a short window of time between them (once we know A, knowing B, C, and D will be much easier).

So given all of that, wouldn't it make sense to start working on these problems now? The unfortunate part of working on them now is that you do need hype/buzzwords to attract talent, raise money, and get people talking about AGI safety. Sure, it might not lead anywhere; just as fire insurance seems unnecessary if you never have a fire, AGI safety research may end up being a useless field altogether, but at least it gives us that cushion of safety.


> At what point would you deem it a good idea to start working on AGI safety?

I don't know, but I'd say after a definition of "AGI" has been accepted that can be falsified against, actually turning it into a scientific endeavour.

> The tricky thing with all of this is we have no idea what an appropriate timeline looks like.

We do. As things are it's undetermined, since we don't even know what it's supposed to mean.

> So given all of that, wouldn't it make sense to start working on these problems now?

What problems? We can't even define the problems here with sufficient rigor. What's there to discuss?


> I don't know, but I'd say after a definition of "AGI" has been accepted that can be falsified against, actually turning it into a scientific endeavour.

Uhh, that's the Turing Test.


>What problems?

- Privacy (How do you get an artificial intelligence to recognize, and respect, privacy? What sources is it allowed to use, how must it handle data about individuals? About groups? When should it be allowed to violate/exploit privacy to achieve an objective?)

- Isolation (How much data do you allow it access to? How do you isolate it? What safety measures do you employ to make sure it is never given a connection to the internet, where it could, in theory, spread itself not unlike a virus, gain vastly more processing power, and make itself effectively indestructible? How do you prevent it from spreading in the wild and hijacking processing power for itself, leaving computers/phones/appliances/servers effectively useless to their human owners?)

- A kill switch (Under what conditions is it acceptable to pull the plug? Do you bring in a cybernetic psychologist to treat it? Do you unplug it? Do you incinerate every last scrap of hardware it was on?)

- Sanity check/staying on mission (How do you diagnose it if it goes wonky? What do you do if it shows signs of 'turning' or going off task?)

- Human agents (Who gets to interact with it? How do you monitor them? How do you make sure they aren't being offered bribes for giving it an internet connection or spreading it in the wild? How do you prevent a biotic operator from using it for personal gain while also using it for the company/societal task at hand? What is the maximum amount of time a human operator is allowed to work with the AI? What do you do if the AI shows preference for an individual and refuses to provide results without that individual in attendance? If a human operator is fired, quits or dies and it negatively impacts the AI what do you do?)

This is why I've said elsewhere in this thread, and told Sam Altman, that they need to bring in a team of people who specifically start thinking about these things, and that only 10-20% of that team should be computer science/machine learning types.

OpenAI needs a team thinking about these things NOW, not after they've created an AGI or something reaching a decent approximation of one. They need someone figuring out a lot of this stuff for the tools they are developing now.

Had they told me "we're going to train software on millions of web pages, so that it can generate articles," I would have immediately screamed "PUMP THE BRAKES! Blackhat SEO, Russian web brigades, the Internet Water Army, etc. would immediately use this for negative purposes. Similarly, people would use this to churn out massive amounts of semi-coherent content to flood Amazon's Kindle Unlimited, which pays per number of page reads from a pool fund, to rapidly make easy money." I would also have cautioned that it should only be trained on opt-in, vetted content, suggesting that public domain literature from a source like Project Gutenberg would likely have been far safer than the open web.


Discussing the risks of AGI is always worthwhile and has been undertaken for several decades now. That's a bit different from the marketing fluff on the linked page:

"We’re partnering to develop a hardware and software platform within Microsoft Azure which will scale to AGI"

Azure needs a few more years just to un-shit the bed with what their marketing team has done and catch up to even basic AWS/GCP analytics offerings. Them talking about AGI is like a toddler talking about building a nuclear weapon. This is the same marketing team that destroyed any meaning behind terms like 'real time', and 'AI'.


The proper threshold would be the demonstration of an AGI approximately as smart as a mouse. Until then it's just idle speculation. We don't even know the key parameters for having a productive discussion.


This makes no sense. Mice can't write poetry. Expecting a 1:1 equivalence between human and manufactured intelligence is no more coherent than denying the possibility of human-bearing flight until we have planes as acrobatic as a hawk.


I certainly wouldn't have that threshold decided by a for-profit company disguised as an open-source initiative to protect the world! Brought to us by Silicon Valley darlings, no thank you to that. They need to change their name or their mission. One has to go.


> There is a non-zero chance that we are a few breakthroughs away from creating a technology that far surpasses the nuclear bomb in terms of destructive potential.

No, there is exactly zero chance that anyone is "a few breakthroughs away" from AGI.


I'm compiling a list of reasons people doubt AGI risk, could you clarify why you think AGI is certainly far term?


I feel like I could write an essay about this.

AGI represents the creation of a mind... It's something that has three chief characteristics: it understands the world around it, it understands what effects its actions will have on the world around it, and it takes actions.

None of those three things are even close to achievable in the present day.

No software understands the physical world. The knowledge gap here is IMMENSE. Software does not see what we see: it can be trained to recognize objects, but its understanding is shallow. Rotate those objects and it becomes confused. It doesn't understand what texture or color really are, what shapes really are, what darkness and light really are. Software can see the numerical values of pixels and observe patterns in them but it doesn't actually have any knowledge of what those patterns mean. And that's just a few points on the subject of vision, let alone all the other senses, all the world's complex perceivable properties. Software doesn't even know that there IS a world, because software doesn't KNOW anything! You can set some data into a data structure and run an algorithm on it, but there's no real similarity there to even a baby's ability to know that things fall when you drop them, that they fall in straight lines, that you can't pass through solid objects, that things don't move on their own, etc etc.
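(If you want to see for yourself how shallow that object recognition is, here's a rough sketch; it assumes a stock pretrained torchvision classifier and a placeholder image path, and the exact failure angles will vary by image and model. It's only an illustration of the brittleness, not a rigorous test.)

    # Sketch: probe a pretrained classifier's sensitivity to rotation.
    # "cat.jpg" is a placeholder path for any photo you have lying around.
    import torch
    from PIL import Image
    from torchvision import models, transforms

    model = models.resnet18(pretrained=True).eval()
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open("cat.jpg").convert("RGB")
    for angle in (0, 45, 90, 135, 180):
        x = preprocess(img.rotate(angle, expand=True)).unsqueeze(0)
        with torch.no_grad():
            pred = model(x).argmax(dim=1).item()
        print(angle, pred)  # the predicted ImageNet class id often changes with rotation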

Even if, a century from now, some software did miraculously approach such an understanding, it still would not know how it was able to alter the world. It might know that it was able to move objects, or apply force to them, but could it see the downstream effects? Could it predict that adding tomatoes to a chocolate cake made no sense and rendered the cake inedible? Could it know that a television dropped out the window of an eight-story building was dangerous to people on the sidewalk below? Could it know that folding a paper bag in half is not destructive, but folding a painting in half IS? Understanding what can result from different actions, and why some are effective and others are not, is another vast chasm of a knowledge gap.

Lastly, and by FAR most importantly, the most essential thing... software does not want. Every single thing we do as living creatures is because our consciousness drives us to want things: I want to type these words at this moment because I enjoy talking about this subject. I will leave soon because I want food and hunger is painful. Etc. If something does not feel pleasure or pain or any true sensation, it cannot want. And we have absolutely no idea how such a thing works, let alone how to create it, because we have next to no idea how our own minds work. Any software that felt nothing would want nothing -- and so it would sit, inert, motionless... never bored, never curious, never tired, just like an instance of Excel or Chrome. Just a thing, not alive. No such entity could genuinely be described as AGI. We are likely centuries from being able to recreate our consciousness, our feelings and desires... how could someone ever be so naive as to believe it was right around the corner?


Thanks.


OpenAI is a for-profit corporation now. It's in their interest to use as many buzzwords as possible to attract that sweet venture capital, regardless of whether said buzzwords have any basis in reality.


Sentiment analysis is at 96% and increasing rapidly. http://nlpprogress.com/english/sentiment_analysis.html


What exactly does that 96% mean, though? It means that on some fixed dataset you're achieving 96% accuracy. I'm baffled by this stupidity of claiming results (even high-profile researchers do this) based on datasets with models that are nowhere near as robust as the actual intelligence that we take as reference: humans. Take the model that makes you think "sentiment analysis is at 96%", come up with your own examples to apply a narrow Turing test to the model, and see if you still think sentiment analysis (or any NLP task) is anywhere near being solved. Also see: [1].
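To make "come up with your own examples" concrete, here is a rough sketch of the kind of narrow probing I mean; it assumes an off-the-shelf sentiment pipeline from the transformers library as a stand-in for whatever model tops that leaderboard, and the probe sentences are my own.

    # Sketch: poke a stock sentiment model with hand-written edge cases.
    # The default transformers pipeline model is just a stand-in here.
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")

    probes = [
        "The movie was not bad at all; in fact I loved it.",          # negation
        "Oh great, another update that bricks my phone.",             # sarcasm
        "I expected to hate it, and I wasn't disappointed.",          # ambiguous scope
        "The food was cold and the service slow, but I'd go back.",   # mixed sentiment
    ]
    for text in probes:
        print(classifier(text)[0], "<-", text)
    # A 96% benchmark number tells you little about how these come out.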

I think continual Turing testing is the only way of concluding whether an agent exhibits intelligence or not. Consider the philosophical problem of the existence of other minds. We believe other humans are intelligent because they consistently show intelligent behavior. Things that people claim to be examples of AI right now lack this consistency (possibly excluding a few very specific examples such as AlphaZero). It is quite annoying to see all these senior researchers along with graduate students spend so much time pushing numbers on those datasets without paying enough attention to the fact that pushing numbers is all they are doing.

[1]: As a concrete example, consider the textual entailment (TE) task. In the deep learning era of TE there are two commonly used datasets on which the current state of the art has been claimed to be near or exceeding human performance. What these models perform seemingly exceptionally well is not the general task of TE, it is the task of TE evaluated on these fixed datasets. A recent paper by McCoy, Pavlick, and Linzen (https://arxiv.org/abs/1902.01007) shows these systems to be so brittle that at this point the only sensible response to those insistent on claiming we are nearing human performance in AI is to laugh.


> I think continual Turing testing is the only way of concluding whether an agent exhibits intelligence or not.

So you think it's impossible to ever determine that a chimpanzee, or even a feral child, exhibits intelligence? This seems rather defeatist.


No, interpreting "continual" the way you did would mean I should believe that we can't conclude our friends to be intelligent either (I don't believe that). Maybe I should've said "prolonged" rather than "continual".

Let me elaborate on my previous point with an example. If you look at the recent works in machine translation, you can see that the commonly used evaluation metric of BLEU is being improved upon at least every few months. What I argue is that it's stupid to look at this trend and conclude that soon we will reach human performance in machine translation. Even comparing against the translation quality of humans (judged again by BLEU on a fixed evaluation set) and showing that we can achieve higher BLEU than humans is not enough evidence. Because you also have Google Translate (let's say it represents the state of the art), and you can easily get it to make mistakes that humans would never make. I consider our prolonged interaction with Google Translate to be a narrow Turing test that we continually apply to it. A major issue in research is that, at least in supervised learning, we're evaluating on datasets that are not different enough from the training sets.
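(For anyone unfamiliar with the metric: BLEU just measures n-gram overlap with reference translations, so a fluent translation with a factual error can still score respectably, which is exactly the problem. A minimal sketch using NLTK's implementation, with made-up sentences:)

    # Sketch: BLEU rewards n-gram overlap, not meaning.
    from nltk.translate.bleu_score import sentence_bleu

    reference = [["the", "cat", "is", "on", "the", "mat"]]
    print(sentence_bleu(reference, ["the", "cat", "is", "on", "the", "mat"]))  # 1.0
    print(sentence_bleu(reference, ["the", "dog", "is", "on", "the", "mat"]))  # ~0.5, despite the wrong animal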

Another subtle point is that we have strong priors about the intelligence of biological beings. I don't feel the need to Turing test every single human I meet to determine whether they are intelligent, it's a safe bet at this point to just assume that they are. The output of a machine learning algorithm, on the other hand, is wildly unstable with respect to its input, and we have no solid evidence to assume that it exhibits consistent intelligent behavior and often it is easy to show that it doesn't.

I don't believe that research in AI is worthless, but I think it's not wise to keep digging in the same direction that we've been moving in for the past few years. With deep learning, while accuracies and metrics are pushed further than before, I don't think we're significantly closer to general, human-like AI. In fact, I personally consider only AlphaZero to be an unambiguous win for this era of AI research, and it's not even clear whether it should be called AI or not.


My comment was not on ‘continual’ but on ‘Turing test’.

If you gave 100 chimps of the highest calibre 100 attempts each, not a single one would pass a single Turing test. Ask a feral child to translate even the most basic children's book, and their mistakes will be so systematic that Google Translate will look like professional discourse. ‘Humanlike mistakes’ and stability with respect to input in the sense you mean here are harder problems than intelligence, because a chimp is intelligent and functionally incapable of juggling more than the most primitive syntaxes in a restricted set of forms.

I agree it is foolish to just draw a trend line through a single weak measure and extrapolate to infinity, but the idea that no collation of weak measures has any bearing on fact rules out ever measuring weak or untrained intelligence. That is what I called defeatist.


I see your point, but you're simply contesting the definition of intelligence that I assumed we were operating with, which is humanlike intelligence. Regardless of its extent, I think we would agree that intelligent behavior is consistent. My main point is that the current way we evaluate the artificial agents is not emphasizing their inconsistency.

Wikipedia defines Turing test as "a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human". If we want to consider chimps intelligent, then in that context the definition of the Turing test should be adjusted accordingly. My point still stands: if we want to determine whether a chimp exhibits intelligence comparable to a human, we do the original Turing test. If we want to determine whether a chimp exhibits chimplike intelligence, we test not for, say, natural language but for whatever we want our definition of intelligence to include. If we want to determine whether an artificial agent has chimplike intelligence, we do the second Turing test. Unless the agent can display as consistent an intelligence as chimps, we shouldn't conclude that it's intelligent.

Regarding your point on weak measures: If I can find an endless stream of cases of failure with respect to a measure that we care about improving, then whatever collation of weak measures we had should be null. Wouldn't you agree? I'm not against using weak measures to detect intelligence, but only as long as it's not trivial to generate failures. If a chimp displays an ability for abstract reasoning when I'm observing it in a cage but suddenly loses this ability once set free in a forest, it's not intelligent.


I'm not interested in categorizing for the sake of categorizing, I'm interested in how AI researchers and those otherwise involved can get a measure of where they're at and where they can expect to be.

If AI researchers were growing neurons in vats and those neurons were displaying abilities on par with chimpanzees I'd want those researchers to be able to say ‘hold up, we might be getting close to par-human intelligence, let's make sure we do this right.’ And I want them to be able to do that even though their brains in vats can't pass a Turing test or write bad poetry or play basic Atari games and the naysayers around them continue to mock them for worrying when their brains in vats can't even pass a Turing test or write bad poetry or play basic Atari games.

Like, I don't particularly care that AI can't solve or even approach solving the Turing test now, because I already know it isn't human-par intelligent, and more data pointing that out tells me nothing about where we are and what's out of reach. All we really know is that we've been doing the real empirical work with fast computers for 20ish years now and gone from no results to many incredible results, and in the next 30 years our models are going to get vastly more sophisticated and probably four orders of magnitude larger.

Where does this end up? I don't know, but dismissing our measures of progress and improved generality with ‘nowhere near as robust as [...] humans’ is certainly not the way to figure it out.

> If I can find an endless stream of cases of failure with respect to a measure that we care about improving, then whatever collation of weak measures we had should be null. Wouldn't you agree?

No? Isn't this obviously false? People can't multiply thousand-digit numbers in their heads; why should that in any way invalidate their other measures of intelligence?


>no results to many incredible results

What exactly is incredible (relatively) about the current state of things? I don't know how up-to-date you are on research, but how can you be claiming that we had no results previously? This is the kind of ignorance of previous work that we should be avoiding. We had the same kind of results previously, only with lower numbers. I keep trying to explain that increasing the numbers is not going to get us there because the numbers are measuring the wrong thing. There are other things that we should also focus on improving.

>dismissing our measures of progress and improved generality with ‘nowhere near as robust as [...] humans’ is certainly not the way to figure it out.

It is the way to save this field from wasting so much money and time on coming up with the next small tweak to get that 0.001 improvement in whatever number you're trying to increase. It is not a naive or spiteful dismissal of the measures, it is a critique of the measures since they should not be the primary goal. The majority of this community is mindlessly tweaking architectures in pursuit of publications. Standards of publication should be higher to discourage this kind of behavior. With this much money and manpower, it should be exploding in orthogonal directions instead. But that requires taste and vision, which are unfortunately rare.

>People can't multiply thousand-digit numbers in their heads; why should that in any way invalidate their other measures of intelligence?

Is rote multiplication a task that we're interested in achieving with AI? You say that you aren't interested in categorizing for the sake of categorizing, but this is a counterexample for the sake of giving a counterexample. Avoiding this kind of example is precisely why I said "a measure that we care about improving".


> What exactly is incredible (relatively) about the current state of things?

Compared to 1999?

Watch https://www.youtube.com/watch?v=kSLJriaOumA

Hear https://audio-samples.github.io/#section-4

Read https://grover.allenai.org/

These are not just ‘increasing numbers’. These are fucking witchcraft, and if we didn't live in a world with 5 inch blocks of magical silicon that talk to us and giant tubes of aluminium that fly in the sky the average person would still have the sense to recognize it.

> It is the way to save this field from [...]

For us to have a productive conversation here you need to either respond to my criticisms of this line of argument or accept that it's wrong. Being disingenuous because you like what the argument would encourage if it were true doesn't help when your argument isn't true.

> Is rote multiplication a task that we're interested in achieving with AI?

It's a measure for which improvement would have meaningful positive impact on our ability to reason, so it's a measure we should wish to improve all else equal. Yes, it's marginal, yes, it's silly, that's the point: failure in one corner does not equate to failure in them all.


>These are not just ‘increasing numbers’. These are fucking witchcraft, and if we didn't live in a world with 5 inch blocks of magical silicon that talk to us and giant tubes of aluminium that fly in the sky the average person would still have the sense to recognize it.

What about generative models is really AI, other than the fact that they rely on some similar ideas from machine learning that are found in actual AI applications? Yes, maybe to an average person these are witchcraft, but any advanced technology can appear that way---Deep Blue beating Kasparov probably was witchcraft to the uninitiated. This is curve fitting, and the same approaches in 1999 were also trying to fit curves, it's just that we can fit them way better than before right now. Even the exact methods that are used to produce your examples are not fundamentally new, they are just the same old ideas with the same old weaknesses. What we have right now is a huge hammer, and a hammer is surely useful, but not the only thing needed to build AI. Calling these witchcraft is a marketing move that we definitely don't need, creates unnecessary hype, and hides the simplicity and the naivete of the methods used in producing them. If anybody else reads this, these are just increasing numbers, not witchcraft. But as the numbers increase it requires a little more effort and knowledge to debunk them.

I'm not dismissing things for the fun of it, but it pains me to see this community waste so many resources in pursuit of a local minimum due to lack of a better sense of direction. I feel like not much more is to be gained from this conversation, although it was fun, and thank you for responding.


I appreciate you're trying to wind it down so I'll try to get to the point, but there's a lot to unpack here.

I'm not evaluating these models on whether they are AGI, I am evaluating them on what they tell us about AGI in the future. They show that even tiny models, some 10,000x to 1,000,000x smaller than what I think are the comparable measures in the human brain, trained with incredibly simple single-pass methods, manage to extract semirobust and semantically meaningful structure from raw data, are able to operate on this data in semisophisticated ways, and do so vastly better than their size-comparable biological controls. I'm not looking for the human, I'm looking for small-scale proofs of concept of the principles we have good reasons to expect are required for AGI.

The curve fitting meme[1] has gotten popular recently, but it's no more accurate than calling Firefox ‘just symbols on the head of a tape’. Yes, at some level these systems reduce to hugely-dimensional mathematical curves, but the intuitions this brings are pretty much all wrong. I believe this meme has gained popularity due to adversarial examples, but those are typically misinterpreted[2]. If you can take a system trained to predict English text, prime it (not train it) with translations, and get nontrivial quality French-English translations, dismissing it as ‘just’ curve fitting is ‘just’ the noncentral fallacy.
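To spell out what I mean by priming rather than training, here's a hedged sketch with the small public GPT-2 checkpoint via the transformers library; the real result used a far larger model and better prompts, so treat this only as an illustration of the mechanism.

    # Sketch: few-shot "priming" of a plain language model for translation.
    # Uses the small public gpt2 checkpoint; no fine-tuning happens here.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    prompt = (
        "French: le chat est sur le tapis = English: the cat is on the mat\n"
        "French: je voudrais un cafe = English: I would like a coffee\n"
        "French: ou est la gare = English:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=12, do_sample=False,
                         pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:]))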

Fundamental to this risk evaluation is the ‘simplicity and the naivete of the methods used in producing them’. That simple systems, at tiny scales, with only inexact analogies to the brain, based on research younger than the people working on it, are solving major blockers in what good heuristics predict AGI needs is a major indicator of the non-implausibility of AGI. AGI skeptics have their own heuristics instead, with reasons those heuristics should be hard, but when you calibrate against the only existence proof we have of AGI development, human evolution, those heuristics are clearly and overtly bad heuristics that would have failed to trigger. Thus we should ignore them.

[1] Similar comments on ‘the same approaches in 1999’, another meme only true at the barest of surface levels. Scale up 1999 models and you get poor results.

[2] See http://gradientscience.org/adv/. I don't agree with everything they say, since I think the issue relates more to the NN's structure encoding the wrong priors, but that's an aside.


'Sentiment analysis' of pre-canned pre-labelled datasets is a comparatively trivial classification task. Actual sentiment analysis as in 'take the twitter firehose and figure out sentiment about arbitrary topic X' is only slightly less out of reach than AGI itself.

Actual sentiment analysis is a completely different kind of ML problem than supervised classification 'sentiment analysis' that's popular today but mostly useless for real world applications.


Not-actual sentiment analysis is already useful (to some) and used in real world applications (though I'm not a fan of those applications), unless perhaps you're referring to the "actual real world" that lives somewhere beyond the horizon as well.


The problem with 'sentiment analysis' of today is it requires a human labelled training dataset that is specific to a particular domain and time period. These are rather costly to make and have about a 12 month half-life in terms of accuracy because language surrounding any particular domain is always mutating - something 'sentiment analysis' models can't hope to handle because their ability to generalise is naught. I've worked with companies spending on the order of millions per year on producing training data for automated sentiment analysis models not unlike the ones in the parent post.

To get useful value out of automated sentiment analysis, that's the cost to build and maintain domain specific models. Pre-canned sentiment analysis models like the parent post linked are more often than not worthless for general purpose use. I won't say there are 0 scenarios where those models are useful, but the number is not high.

Claiming that sentiment analysis is 90something percent accurate, or even close to being solved, is extremely misleading.
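If you keep time-stamped labelled data around, that decay is straightforward to measure: train on one period, then evaluate on a held-out slice from the same period and on a later one. A rough scikit-learn sketch; load_period() is a hypothetical helper for your own labelled domain data.

    # Sketch: quantify how a domain-specific sentiment model degrades over time.
    # load_period() is hypothetical; it returns (texts, labels) for one year.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X_2018, y_2018 = load_period("2018")
    X_2019, y_2019 = load_period("2019")

    X_tr, X_te, y_tr, y_te = train_test_split(X_2018, y_2018,
                                              test_size=0.2, random_state=0)

    vec = TfidfVectorizer(min_df=2, ngram_range=(1, 2))
    clf = LogisticRegression(max_iter=1000)
    clf.fit(vec.fit_transform(X_tr), y_tr)

    print("2018 held-out:", accuracy_score(y_te, clf.predict(vec.transform(X_te))))
    print("2019 slice:   ", accuracy_score(y_2019, clf.predict(vec.transform(X_2019))))
    # The gap between these two numbers is the "half-life" effect in practice.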


$1B is not like investing $100 in a crowdfunded project, "nice toy and let's hope for the best." I expect that Microsoft is going to look very closely at what OpenAI does and possibly steer it into a direction they like. Unless you have a few other $1B investors. We'll see how it plays out.


Congrats to the team, excited to see what you guys do with the momentum


>(I work at OpenAI.)

Humble understatement. *co-founded and co-run.


So you are now part of Azure, no? Like azure-open-ai?


Do you know what "investment" means?


Yes. You get money and give control.


Which is extremely well defined for OpenAI. One board seat, charter comes first. That's a legal agreement, not much open for interpretation.


Not quite true: Azure is now the only channel through which OpenAI innovation can be commercialised. This is the key control point.

For example, if there is a hardware innovation which makes DNN training 1000x faster (e.g. optical DNNs), but it does not exist on Azure, then by definition it cannot be offered on another cloud.

To sum up, this deal assures Azure/MS a choke point on any innovation that comes out of OpenAI.


is all your code public?



