
If I am trying to interact with a company and they direct me to their chatbot, I expect that chatbot to provide me with accurate answers 100% of the time (or to say it can't help me in the event that I ask a question that it's not meant to solve, and connect me to a representative who can).

If I have to double- and triple-check elsewhere to make sure that the chatbot is correct, or if anything the chatbot tells me is non-binding, then what's the point of using the chatbot in the first place? If you can't trust it 99% of the time, or if the company says "use this, but nothing it says should be taken as fact", then why would I waste my time?

If a company is going to provide a tool, they should take responsibility for that tool.




Yes, I think people underestimate the number of imagined LLM use cases that require accurate responses, to the point that hallucinations will cost money in fines & lawsuits.

This is a new frontier in short sighted customer service staffing (non-staffing in this case). The people who are on the frontline communicating with customers can convert unhappy customers to repeat customers, or into ex-customers. There's a few brands I won't buy from again after having to jump through too many hoops to get (bad) warranty service.


It's not like human call center staff has never given anyone wrong information, or cost companies money in fines and lawsuits.

The bar LLMs have to clear to beat the average front line support operations isn't that high, as your own experience shows. And compared to a large force of badly paid humans with high turnover, LLMs are pretty consistent and easy to train to an adequate level.

They won't beat great customer support agents, but most companies don't have many of those.


>It's not like human call center staff has never given anyone wrong information, or cost companies money in fines and lawsuits.

A human will be more likely to say "I don't know" or pass you along, rather than outright lie.


I find it common for human customer support people to give inaccurate information. I don't know about "outright lying", but I've had people tell me things that are factually incorrect.


Depends. Saying "thing X should not fail" is factually incorrect, when you called because thing X failed.

However, I would not expect an airline customer support rep to make up a completely fictional flight that never existed. Maybe they could confuse flights or read a number wrong, but making one up?


Humans won't fabricate too much, but when confronted with yes/no questions, where they have a 50-50 shot of being right and any blowback will likely land on someone else... they'll answer whatever gets you out of their hair.

Case in point, I asked my bank if they had any FX conversion fees or markup. Guy said no. I asked if there was any markup on the spread. Said no. Guess what? They absolutely mark up that spread. Their exchange rates are terrible. Just because there isn't a line-item with a fee listed doesn't mean there isn't a hidden fee in there. He's either incompetent or a liar.


Maybe... Over two decades ago (holy crap, I'm old), I used to work in the call center for a major airline. I was the guy you got when you complained to the 1st level rep that you wanted their manager, then you got sent to the 2nd level rep and wanted their manager. And that was me.

90% of my job was undoing and compensating passengers for the incorrect information either the phone agent or gate agent gave them. The other 10% was dealing with workarounds to technical issues in our booking software.


> It's not like human call center staff has never given anyone wrong information, or cost companies money in fines and lawsuits.

If a company representative told me in writing (perhaps via email support) that I could claim a refund retroactively, and that turned out to not be their policy, I would still expect the company to honor what I was told in the email.

Phone calls are more difficult because there is no record of what was said. But if I had a recording of the phone call... I'm not actually sure what I would expect to happen. It's just so socially unusual to record phone calls.


> It's just so socially unusual to record phone calls.

Is it? I cannot remember the last time I called some business where I did not get a “this call may be monitored or recorded for quality and training purposes…”. Whatever perceived social hang-ups the company had, they got over them, and in a two-party-consent jurisdiction you don’t even need to ask; consent is already taken care of, so just record the call.


Which implies most humans calling any form of customer service should probably have a local audio recording, because otherwise you're in a lopsided witness situation. Don't want there to be a recording? That call just happened to not be recorded. Notably, this is getting difficult anyway with how easy it is to manufacture vocal data.


Sure, but there is such a thing as “the human element”. Humans aren’t perfect, and that is the expectation. That is not the same case with computers.

And especially for something where it’s just pulling data from an internal system, there is absolutely no reason to fabricate information, and saying “well, humans do it all the time” is just an excuse.


Yes, and further, expectations-wise...

On the phone with a customer service rep, I might understand a slightly wishy-washy answer, a slip of the tongue, or a slightly inaccurate statement. I've never really had a rep lie to me; usually it's just "I don't know"s and then escalation as needed.

There is something about the written word from a company that makes it feel more like a "binding statement".


It's still way too easy to send LLMs into a complete tangent of rambling incoherently, opening yourself up to the LLM making written statements to customers you really don't want.

I recently asked some LLMs "How many gallons in a mile?" and got some very verbose answers, which turned into outright short stories when I refined the question to "How many gallons of milk in a mile?"


Only because the models have seemingly been trained solely on generating text that matches a prompt, i.e. prompt completion, rather than on knowledge retrieval/parsing/organisation.

If part of the training was to answer only from knowledge sourced from a vector DB, and to use its trained knowledge only for grammar, phrasing, or rewriting information, then I think it would do a lot better.

It doesn't seem like many models are trained with examples like "Question Q" -> "[no data] I'm sorry, but I don't know that" being accepted as a correct answer.

This would help immensely, not just for chatbots but for personal use too. I don't want my LLM assistant to invent a trip to Mars when I ask it "what do I have to do today" and my calendar happens to be empty.
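
To make that concrete, here's a minimal sketch of the "answer from retrieved context or refuse" idea. The retrieve() and generate() functions and the score threshold are hypothetical placeholders for whatever vector store and LLM you'd actually wire in; the point is only the refusal path when retrieval comes back empty:

    from dataclasses import dataclass

    @dataclass
    class Doc:
        text: str
        score: float  # similarity score from the vector search (0..1)

    FALLBACK = "I don't know. Let me connect you to a human agent."

    def retrieve(question: str) -> list[Doc]:
        # Placeholder: a real implementation would embed the question and
        # query a vector database for the nearest knowledge-base passages.
        return []

    def generate(prompt: str) -> str:
        # Placeholder: a real implementation would call an LLM here.
        return FALLBACK

    def answer(question: str, min_score: float = 0.75) -> str:
        docs = [d for d in retrieve(question) if d.score >= min_score]
        if not docs:
            # Nothing relevant retrieved: refuse instead of letting the model guess.
            return FALLBACK
        context = "\n\n".join(d.text for d in docs)
        prompt = (
            "Answer using ONLY the context below. If the context does not "
            f"contain the answer, reply exactly: {FALLBACK}\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        return generate(prompt)

It's a sketch, not a guarantee: as the replies below note, the model can still ignore the instruction and hallucinate even when relevant context is retrieved, so the hard refusal on empty retrieval is the only part you can actually enforce.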


I just tried the latter with gab’s AI and it was excellent.


The bar isn't even that high.

The chatbots only need to increase lawsuit/settlement costs by less than the amount the company saves through automation.


To me that’s totally fine. I don’t even particularly care whether the LLM is better or not. The only thing that really matters is that if you are gonna use that LLM, then when it inevitably messes up, you don’t get to claim it wasn’t your fault because computers are hard and AI is an emerging field. IDGAF. Pay up. The fact that you dabble in emerging technologies doesn’t give you any excuse to provide lesser service.


Right. Whether you employ a person or some software that lies, the liability should be the same.


I think you're underestimating the quality of customer support. People are going to be out there testing every flaw in the support system, staffed or unstaffed. LLMs have no hope.


>that hallucinations will cost money in fines & lawsuits.

Sure. They are now out about $600. They probably already laid off 500+ customer service jobs costing conservatively $30k a year each, not including management, training, health benefits, etc. I don't think it will make a difference to the ivory-tower C-levels. We will all just get used to once-again-lower-quality help/products. Another great "enshittification" wave of the future with "AI".

It also assumes that the customer service people don't make mistakes at a similar level anyway.

Another "new normal" How come anything that is "new normal" is never good?


> Another "new normal" How come anything that is "new normal" is never good?

If it allows them to reduce costs (and there's enough competition to force them to pass that on as reduced prices), I'm fairly happy with a new normal.

See also how air travel in general used to be a lot more glamorous, but also a lot more expensive.


> and there's enough competition to force them to pass that on as reduced prices

I found the bug.


Cynicism aside, air travel is one of the industries with pretty healthy levels of competition. (At least in Europe and South East Asia. I haven't spent much time in North America, so can't judge the market there.)

People love to hate eg RyanAir, but their effect on prices is felt throughout the industry; even if you never take a single RyanAir flight.


Yeah, they've passed those cost savings right on to record corporate profits for the last 20 years...


Huh? Airlines are notorious for being bad for investors.

(And even without looking up any data, I find your 'record profits for the last 20 years' hard to square with my memories of covid.)

EDIT: I tried to find some indices for airlines. The closest I found was https://finance.yahoo.com/quote/JETS/performance/ which didn't exactly have a stellar performance.

So I'm not sure where you get your claim from?


I wasn't referring to airlines specifically. I see how I was unclear now. We are in a decade-plus era of record corporate profits, yet incomes are stagnant and costs are rising.

Airlines are weird. I think Warren Buffett once said something about airlines being the most complicated way to guarantee losing money as a business, or something like that.


Depends how long it takes to get the $600 and whether you need to be a customer to get it. I know many people who would happily ask for that money once a week.


With RAG it's entirely possible to essentially eliminate 100% of hallucinations, given you are OK with responding with "I don't know" once in a while. These situations are likely coming from a poorly implemented chatbot, or they decided that "I don't know" was not acceptable; really, that should be a cue to send you to a real human.


This claim seems wildly inaccurate, as even with GPT-4 in a single conversation thread with previous human-written answers included, a repeat of a similar question just resulted - in my testing today - in a completely hallucinated answer.

I think your claim might be based on anecdotal testing. (I used to have that same feeling after my first implementation of RAG)... Once you get a few thousand users running RAG-based conversations, you quickly see that it's "good enough to be useful", but far from being as dreamy as promised.


There are no guarantees with RAG either, and RAG only works when the answer to the question is already printed out somewhere explicitly in the text, otherwise it’s definitely prone to hallucinate


Yeah, RAG can't provide such guarantees. Moreover, even if the correct answer is printed somewhere, LLM+RAG may still produce a wrong answer. Example from MS Copilot with GPT-4: https://sl.bing.net/ct6wwRjzkPc It claims that the OnePlus 6 has a 6.4-inch display, but all the linked pages actually say it's 6.28 inches. The display resolution and aspect ratio are also wrong in the response.


It's funny that it seems to have a lot of trouble extracting tabular data, which arguably is one of the things I hear people trying to do with it...


do the people managing the chatbot know that though?

this shit gets sold as a way to replace employees with, essentially, just the middle manager that was over them, who is now responsible for managing the chatbot instead of managing people

while managers are often actually not great at people management, it's at least a somewhat intuitive skill for many. interacting with and directing other humans is something that many people are able to gain experience with outside of work, since it's a necessary life skill unless you're a hermit. furthermore, as a hedge against managerial ineptitude, humans are adaptable creatures that can recognize their manager's shortcomings and determine when and how to work around them to actually get the job done

understanding the intricacies of training a machine learning system is a highly specialized and technical skill that nobody is going to pick up base knowledge for in the regular course of life. the skill floor for the average person tasked with it will be much lower than that of people management, and they will probably fuck up, a lot

the onus is ostensibly on AI system vendors to make their systems idiot-proof, but how many vendors actually do so past the point of "looks good enough to close the sale in a demo"? designing such a system is _incredibly_ hard, and the unfortunate reality is that if you try, you'll lose sales to snake oil salesmen who are content to push hokum trash with a fancy coat of paint.

these systems can work as a force multiplier in the hands of the capable, but work as an incompetence magnifier in the hands of the incapable, and there are plenty of dunning-krugerites lusting to magnify their incompetence


Well if incompetence is cheaper to implement, out come the lawyers to erase whatever perceived savings there were.


The fines and lawsuits may be way cheaper than human staff.


Especially once we have ai lawyers ;)


This initially sounded pretty good until I thought it through. Democratizing access to counsel and forcing troll lawyers to deal with trolling bots seems good, but it will shape up like other spam arms races while legal systems gear up to deal with the DDoS attacks. Good for spammers and most entrenched players, bad for the public at large.

Already we can’t manage to prosecute ex presidents in a timely manner before the next election cycle. If delays seem absurd now what will it be like when anything and everything remotely legal takes 10+ years and already sky-high costs triple?


Don't worry, AlphaJudge will provide swift justice at scale.


Judge, I'll refer you to my legal chatbot. I rest my case.


Your selection of generative counsel has been confirmed.

Please do not navigate away from this page while the trial runs. You will receive a notification when the verdict has been reached. This may take up to a minute.


Thank you for the best 2 paragraph cyberpunk story.


> then what's the point of using the chatbot in the first place?

The point is quite literally to make you give up trying to contact customer service and just pay them money, while getting their legal obligations as close to a heads-I-win, tails-you-lose situation as possible. That's not the mysterious part. The mysterious part is, why did they even let this drag into court for such a small sum?!


> "The mysterious part is, why did they even let this drag into court for such a small sum?!"

Because most people wouldn't bother taking it to court.

If they rolled over and paid up every time their chatbot made a mistake, that gets expensive, and teaches customers that they can easily be compensated if the chatbot screws up.

If they fight it tooth and nail and drag it all the way to court, it teaches customers that pursuing minor mistakes is personally painful and probably not worth it.

Scorched-earth defense tactics can be effective at deterring anyone from seeking redress.

It's the same fundamental reason why customer support is so hard to reach for many companies - if you make it painful enough maybe the customer will just not bother. A valuable tactic if your company imagines customers as annoying fleshy cash dispensers that talk too much. Having flown many times with Air Canada I can confirm that they do seem to perceive their passengers as annoying cash dispensers.


> Because most people wouldn't bother taking it to court.

Wait, couldn't they have tried to settle as soon as they realized it was actually going to court? I thought that was the modus operandi in the US... is it not a thing in Canada?


Well... they lost, and now it made the news. Are they going to keep the chatbot? Is the judge going to be so lenient next time, now that there's precedent of wrongdoing?


To disincentivize anyone from calling them out on it in the future. If I know I will have to drag them through court just to get even a low payout, I will be less likely to fight, as it's not worth the hassle.


Unfortunately, my impression is that human customer support often works just as well as a current-generation chatbot: They'll tell you what you want to hear, because they get rated by customer satisfaction. You get the survey, indicate that your request was resolved to your satisfaction, the agent gets their bonus... and a week later you realize everything you have been told was a lie.

This got so bad that when a customer support agent at Amazon genuinely resolved my issue well once, I was surprised that it actually worked out as promised.


Really depends on the company. Generally, for high quality on shore call centres, you do not use customer satisfaction as a metric for individual agents.

You’d use first contact resolution, average handle time, and their ability to stick to the flow they’re meant to (like transferring the customer to a survey after the call).

Like you say, satisfaction encourages lies. Much like sales commissions.


> If I am trying to interact with a company and they direct me to their chatbot, I expect that chatbot to provide me with accurate answers 100% of the time (or to say it can't help me in the event that I ask a question that it's not meant to solve, and connect me to a representative who can).

If I'm trying to interact with a company and they direct me to a chatbot, I expect to get useful help 0% of the time, because if help was available via a mechanism on their site I would already have found it. I expect a chatbot to stall me as long as possible before either conceding that I need a human's assistance or telling me some further hoop to jump through to reach a real human.


Honestly, that's pretty similar to dealing with front-line level 1 human support.


I have a slightly higher expectation that first-line tech support can solve my problem if the problem is "you really should have had a self-service way to do this on your website but I'm sure you have a tool for this".

And if that isn't the case, I've mostly found that contrary to stereotype, many first-line tech support people are not such rote script-followers that they can't deal with skipping most of the script when the problem is obviously on their end and going to need real human intervention.


This is true of all LLMs: you cannot trust a single thing they say. Everything needs to be checked - from airline fee information to code.

I expect we'll see this sort of thing a lot more in the future, and probably a bit of a subsequent reversal of all of the sackings of humans once the issues (... and legal liability!) become clearer to people.


My internet went down and I could only get a chat bot on the Web site or a hang up on the support line.

After the "estimated fix by ETA" came and went, I reported my ISP to the FCC. That resulted in a quick follow up from a real human.


Absolutely. That’s the big problem with the race to shoehorn generative AI into everything. If it has to be right the tools aren’t good enough yet.


> If you can't trust it 99% of the time

A chatbot should be either 100% or 0%. Companies should not replace humans with faulty technology.


Agree there. I put 99% as even human reps sometimes get it wrong, but in my experience whenever a human agent has made a mistake and relayed wrong info, the company would take appropriate steps to meet me at least half way.


Would this situation have been handled differently if a human support rep gave them incorrect information? I suspect they would have honored it and then put the rep (or all reps) through more training.

Another thought experiment: If a portion of the company's website was at least partially generated with an LLM, does that somehow absolve the company of responsibility for the content they have on their own site?

I think a company is free to present information to their customers that is less than 100% accurate -- whether by having chatbots or by doing something else silly like having untrained, poorly-paid support reps -- but they have to live with the risks (being liable for mistakes; alienating customers) to get the benefits (low operating cost).


I would say meet or beat human customer support agent accuracy; 100% is in many cases not achievable for machine or human.


then you can't have a chatbot

but if that is your standard, you can't have an airline either


but humans aren't 100% either... seems ridiculous to demand 100% from any implementation


If a human customer support person told me something and I made purchases based on that, and it turned out they lied, yeah I'd want recompense for that as well. You're allowed to be wrong (AI or human), you just have to face consequences for it.


I had that once with an airline: a customer rep made promises, and afterwards they refused to honor them.

Coincidentally the audio recording of the conversation was apparently deleted …


A company is partially bound by its representatives' actions, so humans can hit 100% despite making mistakes.

This is simply applying the exact same standards to a chat bot.


Maybe don't demand 100%, but instead responsibility for incorrect information.


If a human employee makes mistakes, the company will claim responsibility and in turn reprimand the human employee instead of claiming the human employee is its own "separate legal entity".





