The actual wide narrative is that the current language models hallucinate and lie, and there is no coherent plan to avoid this. Google? Microsoft? This is a much less important question than whether or not anyone is going to push this party-trick level technology onto a largely unsuspecting public.
I think you're focusing on a few narrow examples where LLMs are underperforming and generalising about the technology as a whole. This ignores the fact that Microsoft already has a successful LLM-based product in the market with GitHub Copilot. It's a real tool (not a party-trick technology) that people actually pay for and use every day.
Search is one application, and it might be crap right now, but for Microsoft it only needs to provide incremental value; for Google it's life or death. Microsoft is still better positioned in both the enterprise (Azure, Office 365, Teams) and developer (GitHub, VS Code) markets.
Copilot mostly spews distracting nonsense, but when it's useful (like with repetitive boilerplate where it doesn't have to "think" much) it's really nice. But if that's the bar, I don't think we're ready for something like search, which is much more difficult and important to get right for the average person to get more good than harm from it.
Few people seem to know this, but you can disable auto-suggest in Copilot, so it only suggests things when you proactively ask it to. I only prompt it when I know it will be helpful and it's a huge time saver when used that way.
Sometimes, Copilot is brilliant. I have encountered solutions that are miles better than anything I had found on the internet or expected to find in the first place.
The issue involved heavy numerical computation with numpy, and it found a library call that covered my issue exactly.
I've had similar experiences. Sometimes it just knows what you want and saves you a minute searching. Sometimes way more than a minute.
But I find it also hallucinates in code, coming up with function calls that aren't in the API but sound like a natural thing to call.
Overall it's a positive though: your other coding tools make it pretty easy to tell when a suggestion is for something made up, and the benefit of it filling in your next little thought is very real.
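For example, something in the spirit of the sketch below: the first call is deliberately made up (numpy has no such function), which is exactly the kind of suggestion I mean, and the tooling catches it immediately.

    import numpy as np

    # The kind of suggestion an assistant might hallucinate: it reads naturally,
    # but numpy has no such function, so the interpreter/IDE flags it right away.
    # smoothed = np.moving_average(signal, window=5)  # AttributeError in real numpy

    # One real way to get a simple moving average with plain numpy:
    def moving_average(signal, window=5):
        kernel = np.ones(window) / window
        return np.convolve(signal, kernel, mode="valid")

    signal = np.sin(np.linspace(0, 10, 100))
    print(moving_average(signal).shape)  # (96,)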
Google's search results are pretty terrible. I actually have a hard time telling which is a result and which is an ad anymore tbh. I really don't think the bar is that high.
That's what I find so funny. Again, UX innovation over LLMs is what makes ChatGPT so hot right now, like Hansel, but I mean, the product is tragically flawed, like all LLMs at the moment.
I believe that's because people are using it wrong. Asking for facts is its weakness. Aiding creativity and, more narrowly, productivity (by way of common-sense reasoning) is its greatest strength.
The product Microsoft is showing off is a fact-finding engine. Just look at the demo: they have built the AI model into the search experience, and the demo shows it used exclusively to provide (supposedly) factual information [0]. It's not the users' fault that companies are building the wrong product.
To be clear, I do include Microsoft here in the audience of "using it wrong."
It's also very tempting, when you get coherent text out, to believe it. Hopefully the underlying tech will get better and/or people will understand its weaknesses... except that the inability to recognize clear misinformation gives me pause.
I think there's also potential value in being able to give feedback on results. If I try to search for something on Google right now and it doesn't give me what I want, my only options are to try a different query or give up. This puts the onus on me to learn how to "ask" properly. On the other hand, using something like ChatGPT and asking it a question gives me the option to tell it "no, you got this part wrong, try again". This isn't necessarily useful for all queries, but some queries might have answers that you can verify easily.
Over the weekend, I was shopping for laptops and tried searching "laptops AMD GPU at least 2560x1440 resolution at least 16 GB RAM", and of course Google gave all sorts of results that didn't fit those criteria. I could use quotes around "16 GB RAM", but then some useful results might get excluded (e.g. a table with "RAM" or even "Memory" in one column and "16 GB" in another, or a laptop with a higher resolution like 4K), and I'd still get many incorrect results (e.g. an Amazon page for a laptop with 1920x1080 resolution and then a different laptop in "similar options" with 2560x1440 resolution but an Nvidia GPU).

I decided to try using ChatGPT to list me some laptops with those criteria; it immediately listed five correct models. I asked for five more, and it gave one correct option and four incorrect ones, but when I pointed out the mistakes and asked for 10 more results that did fit my criteria, it was able to correctly do this. Because I can easily verify externally whether a given laptop fits my criteria or not, I'm not at risk of acting on false information.

The only limitation is that ChatGPT currently won't search the internet and has data limited to 2021 and earlier. If it had access to current data, I think there would be a lot of places where it would be useful, especially given that it wouldn't necessarily replace existing search engines, but complement them.
I would argue this would be better done by Google or someone else specializing in faceted search over structured data. GPT may smooth over results that are coded as near misses (e.g. USB vs USB3), but as you said it gave you nearly half incorrect data. There are also ways, with something like Toolformer, that it could call the right APIs and maybe interpret the data, but as they stand, LLMs aren't the right tech for fetching data like this.
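Roughly what I mean by faceted search over structured data; the laptop records and field names below are made up purely for illustration.

    # Hypothetical structured laptop records; fields are illustrative only.
    laptops = [
        {"model": "A", "gpu_vendor": "AMD", "res": (2560, 1440), "ram_gb": 16},
        {"model": "B", "gpu_vendor": "NVIDIA", "res": (2560, 1440), "ram_gb": 32},
        {"model": "C", "gpu_vendor": "AMD", "res": (1920, 1080), "ram_gb": 16},
        {"model": "D", "gpu_vendor": "AMD", "res": (3840, 2160), "ram_gb": 16},
    ]

    # Facets are hard constraints, so "at least" really means at least,
    # with no fuzzily matched near misses.
    def matches(laptop):
        return (laptop["gpu_vendor"] == "AMD"
                and laptop["res"][0] >= 2560 and laptop["res"][1] >= 1440
                and laptop["ram_gb"] >= 16)

    print([l["model"] for l in laptops if matches(l)])  # ['A', 'D']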
Most of OP's dissatisfaction with shopping for laptops on Google stems from query-understanding failures. Google needs to understand that the challenge is for them to evolve their UX so that users can intuitively tune their searches in a natural manner.
A pure LLM approach is going to quickly lose its novelty, as reprompting is frustrating and it is difficult to maintain a long-lived conversation context that actually 'learns' how to interact with the user (as a person would).
Maybe, but I think the point of all this discussion is that Google _hasn't_ done something like this. It's not an unreasonable take that their lack of progress on this front is exactly why solutions like this are noticeable improvements in the first place. Sure, Bing AI isn't better than Google with ChatGPT, but the fact that it's a discussion at all is a sign of how far Google has fallen; if we're setting the bar at the same place for both Microsoft and Google for search products, then Google has already lost their lead, and that's a story on its own.
Agreed: it fits areas where correctness can be sub-90%. But when a BA is making business docs, do they want creativity in querying factual data for a report they are creating? And how tempting is it to use it for more than just "creative" tasks?
Microsoft have the advantage that people saw GPT-2, GPT-3, and ChatGPT and how those models progressed and improved. Bard is Google's first public AI product, so it looks like GPT-2, while Microsoft are teasing GPT-4. People will assume that Google are a long way off fixing the accuracy problem because there isn't any trajectory or iteration, while they believe Microsoft will crack it quite soon because they've already seen how things can change.
There's a lesson for founders in this. If you develop in secret and try to launch a perfect product, then anything less than perfect is unforgivable. If you launch early with something that has obvious problems, people will forgive them because they see the potential and trust you to fix them.
>people will forgive them because they see the potential and trust you to fix them.
That seems very optimistic to me. Having seen Siri, Google Assistant, Cortana, and Alexa, I trust that changes will be made, and some of them will even be positive, but it will generally be a net negative until they are completely irrelevant.
Notice how neither of these announcements mention their digital assistants getting an upgrade to be less garbage.
I think your "wide" narrative is actually still a very narrow narrative.
The actual wide narrative is this: yes, the models lie and hallucinate, but people are realizing now that this is essentially what every human AND website already does! Every human presents their "facts" and "viewpoints" as though they know the whole truth, but really they are just parroting whatever talking points they got from the BBC or The Guardian or Fox News, and all of those journalists are just using other sources with their own biases and inaccuracies. Basically, it's bullshit and inaccuracies all the way down!
I was chatting with my friends over dinner last night about ChatGPT, and we concluded that while it does have inaccuracies, it's still better than asking humans for information and still better than the average Google SEO-spam website. That is, what makes us think a random human-made website about, say, space travel is more or less accurate than ChatGPT, or than what our friend Bob thinks about space travel?
The truth is, most of the information we receive on a daily basis is inaccurate or hallucinated to some degree, we just have gotten "used" to taking whatever the BBC or Bloomberg or ArsTechnica says as "the truth."
I'd strongly disagree with the idea that ChatGPT is essentially as trustworthy as humans and human-generated content because humans occasionally bullshit and misrepresent reality.
You can rationalize what people are saying based on their experiences, opinions, and backgrounds. You can engage in the Socratic method with people, to unpack where their claims come from and get to the grounding "truth" of primary experience.
You can't do any of these things with ChatGPT, because ChatGPT isn't grounded – it goes in circles at a level of abstraction where truth doesn't exist.
The reality, as usual, is that Google is far, far ahead of the other companies. Just like Waymo is ahead of Tesla, and DeepMind is ahead of OpenAI by miles.
They even have a far more advanced language model, they just don't release it publicly. It's scary how far Google is ahead, with its army of PhDs. They're the ones that pioneered the papers and techniques that OpenAI used -- but they did it 5 years ago.
This is just Google being Google ... they sunsetted Reader when it was popular, went through like 20 different chat products (GMail Chat, Hangouts, Google Meet, etc. etc.) and cannibalized their own projects. But as far as technology, they've got AI they're not disclosing to the world yet.
I'm not sure if you're being serious or not, but I'm going to assume serious and write a serious response.
Having PhDs or doing research does not equal the ability to create products people want to use and/or pay for. History has shown us time and time again there are some people who are amazing at creating original and groundbreaking research and other people who are amazing at turning research into money making, people-pleasing products. See all the research that came out of Xerox PARC which ended up doing nothing for Xerox and everything for Apple (and other companies).
Google has been spending a fortune on research in AI for 15+ years and, if anything, the company's main product (Search) has only gotten worse! They have been second-best, third-best, or moribund in mobile phones, cloud computing, video games, social media, and many others I've forgotten.
Now I'm not sure what the moral of the story here is but I can say it definitely isn't that doing the most research equals success because it clearly isn't that! I'd say it's probably a culture issue and also a motivation issue (which are clearly related). You're sitting on a money printing machine, employees all earning 350k+ per year, in an office filled with bean-bags, gourmet food, and living a chill life with nice and comfortable working hours... where is the motivation and drive to try and build a really innovative and amazing new product? "Sounds like a lot of work man..." It surprises me little that OpenAI beat them to the punch.
Even if those benchmarks showing their models are X% above SoTA actually translate into qualitatively significant improvements (which they probably would), Google still has the most to lose and little to gain even if they do release widely. Their best-case outcome is to not lose any users, and maybe gain back a few % of the users they've lost in recent years. Search result quality has been declining noticeably for a while now, and users want something different.
I'm bearish on Google, but you're correct and shouldn't be in light gray text. Bard is a smaller version of LaMDA, not PaLM, so we know for sure they had a much more advanced model some time ago.
If Google can't get things out of the lab and into products that accrue value to their business, they'll go the way of Xerox PARC: a legacy of research innovations that others successfully capitalized on.
For many that may be a laudable end goal. For shareholders though it’s probably a tough pill to swallow.
Innovator's dilemma. Search has huge margins because it is cheap to run; LLMs are not. Google gets 80% of its revenue from search, and Microsoft is forcing them to put a dent in those margins while laughing their asses off. Or, as Nadella said in an interview, "we made Google dance".
There is a project called Stable Attribution which can tell you what training-set sources were used for generating your image. The same tech applied to ChatGPT results would let it operate like a traditional search engine (and make it easier to filter out hallucinated factoids or citations).
Stable Attribution is highly misleading: it does NOT tell you which images in the training set were used to generate your image. It shows you images in the training set that are most visually similar to the image that you show it.
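To make that concrete: tools like this are essentially doing nearest-neighbour search over image embeddings, which answers "what in the training set looks most like this output", not "what was used to produce it". A rough sketch with random stand-in embeddings (the real thing would use something like CLIP features):

    import numpy as np

    # Random stand-ins for precomputed image embeddings, purely for illustration.
    rng = np.random.default_rng(0)
    training_embeddings = rng.normal(size=(1000, 512))  # 1000 training images
    generated_embedding = rng.normal(size=512)          # the generated image

    # Cosine similarity of every training image against the generated one.
    norms = np.linalg.norm(training_embeddings, axis=1) * np.linalg.norm(generated_embedding)
    sims = training_embeddings @ generated_embedding / norms

    # "Attribution" here is just the top-k most visually similar training images;
    # it says nothing about which images actually influenced the generation.
    top_k = np.argsort(sims)[::-1][:5]
    print(top_k, sims[top_k])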
Stable Diffusion 2 supports negative prompt weights, and amusingly you can give it a negative prompt of "weird looking hands" and it will generate much better hands!
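If anyone wants to try that programmatically, here's a minimal sketch of a plain negative prompt (not per-token weighting) with the Hugging Face diffusers library; it assumes diffusers, torch, and a CUDA GPU are available, and the model id and prompts are just examples.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a Stable Diffusion 2.x checkpoint (assumes GPU + diffusers installed).
    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
    ).to("cuda")

    # The negative prompt steers generation away from the listed concepts.
    image = pipe(
        prompt="portrait photo of a person waving at the camera",
        negative_prompt="weird looking hands, extra fingers, blurry",
        num_inference_steps=30,
    ).images[0]
    image.save("waving.png")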
First of all, the "citing sources" needs to be out-of-band from the regular response, since we've already seen that these systems are entirely comfortable inventing fictitious sources.
Second, if I give you a wonderfully written paragraph or two about the history of the British Parliament, and 3 links that supposedly back me up, how likely are you to check the links? Because that's what is actually going to happen. The LM will not "cite" a source; it will provide one out-of-band, and your motivation to read it will need to be high, which is unlikely given the apparent quality of the LM's own answer.
You seem to be talking about ChatGPT, not Bing Chat. Bing Chat literally uses the search engine queries and links to those sources. I have seen its summaries include mistakes, but I have never seen it invent sources (I've tried maybe 500 chat queries).
It's ironic that you're very confidently presenting erroneous information here. I'd really recommend trying the actual product, or at least looking at the demos. It has some problems. It does not have the same problems that ChatGPT does, because it does not rely solely on the LLM's baked-in data.
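For anyone unfamiliar with the distinction being drawn here, the retrieval-grounded pattern looks roughly like the sketch below; web_search and llm_complete are stand-in placeholders, not any real Bing or OpenAI API.

    def web_search(query):
        """Placeholder for a real search backend; returns title/url/snippet dicts."""
        return [{"title": "Example page", "url": "https://example.com", "snippet": "..."}]

    def llm_complete(prompt):
        """Placeholder for a call to a language model."""
        return "Answer text, with citations like [1]."

    def grounded_answer(question):
        # 1. Retrieve live documents instead of relying only on baked-in training data.
        results = web_search(question)
        sources = "\n".join(
            "[{}] {} ({}): {}".format(i + 1, r["title"], r["url"], r["snippet"])
            for i, r in enumerate(results)
        )
        # 2. Ask the model to answer only from those sources and cite them by number.
        prompt = ("Using only the numbered sources below, answer the question and "
                  "cite sources by number.\n\nSources:\n" + sources +
                  "\n\nQuestion: " + question)
        return llm_complete(prompt)

    print(grounded_answer("When did Bing Chat launch?"))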
They will, because, as so often, perfection and unintended consequences matter less than "having the new tech", both for the companies and for part of the users.
Most importantly, while both AIs sometimes get very important and fundamental things wrong, they get enough right to be a lot of help with some tasks.
> Most importantly, while both AIs sometimes get very important and fundamental things wrong, they get enough right to be a lot of help with some tasks.
that's only true if you can easily and cheaply identify when they are correct. But at this time, and for the foreseeable future, that's not the case.
That's also true if the fallout from wrong usage costs less than the savings, which for huge corporations is often the case, even in situations where it shouldn't be.
Also, you don't have to use this tech just to "get the truth"; you can use it to generate things where it's sometimes rather simple to identify and fix mistakes, and where, due to human error, you have an "identify and fix mistakes" step anyway.
Do most of these kinds of usages likely have a negative impact on society? Sure. But for adoption that doesn't matter, because "negative impact on society" doesn't cost the main adopters money, or at least not more than they make from it.
I said it will happen because you can make/save money with it.
And it's not just big corporations; it's already good enough, from a financial point of view, to be used for many applications by companies of all kinds and sizes.
And "it makes money so people will do it" is a truth we have to live with as long as we don't fundamentally overhaul our economic system.
>The actual wide narrative is that the current language models hallucinate and lie
So do people. And ChatGPT is a whole lot smarter than most people I know. It's funny that we've blown the Turing test out of the water at this point, and people are still claiming it's not enough.
The comparison is not against "most people" though. When we search the web we usually want an answer from an "expert" not some random internet poster. If you compare ChatGPT even to something like WebMD, well, I'll trust the latter over ChatGPT in an instant.
It's no better for other domains either. It can give programming advice, but it's often wrong in important ways and so no, I'd rather have the answer verified by an actual developer who knows whatever technology I'm asking about.
And finally, when you talk about what is "enough", I'd ask "for what?" This is what people in this thread are saying: that ChatGPT is not enough for the majority of what people wish it to be, but it may be enough for some tasks, such as a creative-writing aid or other human-in-the-loop tasks.
Also, ChatGPT isn't smart. It is very good at stringing words together, a facility which, when over-developed in humans, is not generally termed "smart". We reserve that sort of term for the capacity to reason, something ChatGPT has absolutely no capability for.
You and I live in a rarefied world. It's far better at writing than most people I know, and I suspect it's (sadly) better at logic/math than many people, which, I fully admit, isn't saying much.
They are being integrated into the most widely used information retrieval systems (search engines). It's not enough that they are "smarter than most people"; they have to always be correct when the question asked of them has a definitive answer, otherwise they are just another dangerous avenue for misinformation.
Yes, not all questions have definitive answers, which is fine; then you can argue that they are better than going to the smartest human you know, and that might be enough. Although I personally would still disagree with this argument, since I think it's better that the answer provided is "I don't know".
We have not blown the Turing Test out of the water. I guarantee you that out of two conversations, I can tell which one is ChatGPT and which is human 95%+ of the time. (even leaving aside cheap tricks like asking about sensitive topics and getting the "I am a bot!" response)
The Turing test originated in the 1950s. The goal posts haven't moved much in 70 years. The development of these new language models is revealing that, as impressive as the models are at generating language, it is possible that the Turing test was misconceived if the goal was to identify AGI.
At the time, it was inconceivable that a program could interact the way that ChatGPT does (or the way that DALL-E does) without AGI. We now know that this is not the case, and that means it might finally be time to recognize that the Turing test, while a brilliant idea at the time, doesn't actually differentiate in the way that we want it to.
70 years without moving the goal posts is, frankly, pretty good.
Au contraire, the whole history of AI is one of moving goal posts. One professor I worked with quipped that a field is called AI only so long as it remains unsolved.
Logic arguments and geometric analogies were once considered the epitome of human thinking. They were the first to fall. Computer vision, expert systems, complex robotic systems, and automated planning and scheduling were all Turing-hard problems at some point. Even Turing thought that chess was a domain which required human intellect to master, until Deep Blue. Then it was assumed Go would be different. Even in the realm of chat bots, Eliza successfully passed the Turing test when it was first released. Most people who interacted with it could not believe that there was a simple algorithm underlying its behavior.
> One professor I worked with quipped that a field is called AI only so long as it remains unsolved.
Not just one professor you worked with, this has been a common observation across the field for decades.
But the deeper debate about this is absolutely not about moving goal posts, it is about research revealing that our intuitions were (and thus likely still are) wrong. People thought that very conscious, high-cognition tasks like playing chess likely represented the high water mark of "intelligence". They turned out to be wrong. Ditto for other similar tasks.
There have been people in the AI field, for as long as I've been reading pop-sci articles and books about it, who have cautioned against these sorts of beliefs, but they've generally been ignored in favor of "<new approach> will get us to AGI!". It didn't happen for "expert systems", it didn't happen for the first round of neural nets, it didn't happen for the game playing systems, it didn't happen for the schedulers and route creators.
The critical thing that has been absent from all the high-achieving approaches to AI (or some subset of it) thus far is that the systems do not have a generalized capacity for learning (both cognitive learning and proprioceptive learning). We've been able to build systems that are extremely good at a task; we have failed (thus far) at building systems which start out with limited abilities and grow (exponentially, if you want to compare it with humans and other animals) from there. Some left-field AI folks would also say that the lack of embodiment hampers progress towards AGI, because actual human/animal intelligence is almost always situated in a physical context, and that for humans in particular, we manipulate that context ahead of time to alter the cognitive demands we will face.
Also, most people do not accept that Eliza passed the Turing test. The program was a good model of a Rogerian psychotherapist, but could not engage in generalized conversation (without sounding like a relentlessly monofocal Rogerian psychotherapist, to a degree that was obviously non-human). The program did "fool" people into feeling that they were talking to a person, but in a highly constrained context, which violates the premise of the Turing test.
Anyway, as is clear, I don't think that we've moved the goal posts. It's just that some hyperactive boys (and they've nearly all been boys) got over-excited about computer systems capable of doing frontal-lobe tasks and forgot about the overall goal (which might be OK, if they did not make such outlandish claims).