Throwaway here, but I used to work in the Alexa org at Amazon and was amazed by how big the org was (thousands and thousands of people), considering it doesn't seem to be a big revenue generator
I remember constantly hearing about projects other teams were working on, thinking "why would anyone use that" or "how would that ever make money"
Just trying to shoehorn Alexa into as many domains as possible
It was like empire building at its finest
I would joke that the canary tests were the biggest customer for a lot of services
And the way Amazon works with SOA, even what seems like a small feature ends up being a couple of services and a pizza team of 10 devs + an SDM; the overhead is huge
Back when it was announced that the Alexa org was being hit harder by layoffs, that did not surprise me
I've interviewed many people on Alexa before. From what I gather, it's just a giant switch statement, and each individual "path" takes a bunch of effort to support, and there are thousands of paths for music, ordering, commands, etc. It's peak "AI == if statement" architecture.
Throwaway, I used to work in the NLU unit of Alexa about 5 years ago. There is some ML going on, but as with all ML projects I have worked on, people want control. This means you add rules for the "important" stuff. You also add test cases to make sure the ML works. But if you already have those test cases, why not just match on them directly? There are also more advanced techniques for generating examples (FSTs, for example).
What this culminated in is a platform where 80% of requests, and pretty much 99% of "commands", are served by rules built by a team of linguists.
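For illustration, a minimal sketch of what that kind of rule matching might look like; the patterns and intent names below are made up for this example, not anything from Alexa:

```python
import re

# Hypothetical hand-written rules: each regex maps straight to an intent.
# Linguists would maintain a far larger catalogue of these.
RULES = [
    (re.compile(r"^(please )?play (?P<artist>.+?)( now)?$"), "PlayMusicIntent"),
    (re.compile(r"^what('s| is) the weather( like)?( today)?$"), "GetWeatherIntent"),
    (re.compile(r"^turn (?P<state>on|off) the (?P<device>.+)$"), "SmartHomeIntent"),
]

def match_intent(utterance: str):
    """Return (intent, slots) for the first rule that matches, else None."""
    text = utterance.lower().strip()
    for pattern, intent in RULES:
        m = pattern.match(text)
        if m:
            return intent, m.groupdict()
    return None  # fall through to the ML model for the long tail

print(match_intent("please play taylor swift"))
# ('PlayMusicIntent', {'artist': 'taylor swift'})
```

Anything the rules catch is deterministic and testable; only what falls through goes to the model.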
Is this not actually kind of powerful? Having linguists write up a bunch of rules seems a lot more predictable than "rolling a bunch of dice and hoping that some LLM spits out a coherent set of steps".
It feels very fractal, but on the other hand, if Alexa only has a specific gamut of responses, it's not exactly a limitless state space, right?
Very curious what those rules look like, though
The problem is it's completely undiscoverable. You can tell Alexa "play some music" because you're pretty sure one of these linguists added a rule for that. But can you tell it "play me a song that lasts longer than 5 minutes"? Doubtful. The only way to know is to try it.
The problem is the space of possible commands is waaaaay bigger than the space of commands you can manually handle, which means if you just randomly try stuff 95% of the time it won't work. Users learn that very quickly and end up sticking to the few commands they know work.
The one exception is "search" queries - "how tall is Everest" and so on, but that only really works well on Google's platform because they've done all the work for that already.
Contrast that with LLMs, which basically at least understand everything you're asking of them. If you give them a simple API to carry out actions they can do really complex commands like "send a WhatsApp to my wife telling her when I'll get home if I start cycling in 10 minutes". That's impossible without LLMs but pretty trivial with them.
Obviously the downside is they are prone to bullshitting and might do completely the wrong thing.
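A hedged sketch of that "simple API" idea: a hypothetical `call_llm` stands in for whatever model you use, and the tool names and schemas are invented; the point is just that the model maps free-form phrasing onto a small set of actions:

```python
import json

# Hypothetical tools exposed to the model; names and schemas are invented.
TOOLS = [
    {"name": "estimate_cycling_eta",
     "description": "Estimate arrival time home if cycling starts in N minutes",
     "parameters": {"start_in_minutes": "integer"}},
    {"name": "send_whatsapp",
     "description": "Send a WhatsApp message to a named contact",
     "parameters": {"contact": "string", "message": "string"}},
]

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned plan for the demo."""
    return json.dumps([
        {"tool": "estimate_cycling_eta", "args": {"start_in_minutes": 10}},
        {"tool": "send_whatsapp",
         "args": {"contact": "wife", "message": "Leaving soon, home around 18:40"}},
    ])

def handle(command: str):
    prompt = (
        "Respond only with a JSON list of tool calls, chosen from:\n"
        f"{json.dumps(TOOLS, indent=2)}\n\nUser: {command}"
    )
    # A real dispatcher would validate and execute each call against actual services.
    return json.loads(call_llm(prompt))

print(handle("send a WhatsApp to my wife telling her when I'll get home "
             "if I start cycling in 10 minutes"))
```

Of course, the validation step is doing a lot of work there, which is exactly where the bullshitting risk comes back in.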
It’s worse than that. These systems can be adapted by looking at failed user commands, but people don’t really sit around trying out fun things and watch it fall on its face for longer than the first day or so. After that, the novelty wears off, so you’ve trained your users to accept the device’s limitations. Then, even when you do improve the functionality, your users won’t know! They won’t try it, and those commands will never get traction in the system or get more testing beyond the initial launch criteria. It’s a death spiral. The same thing happens with the tone of voice people use.
I am confused as to why it's more undiscoverable than, say, some LLM.
> The problem is the space of possible commands is waaaaay bigger than the space of commands you can manually handle, which means if you just randomly try stuff 95% of the time it won't work. Users learn that very quickly and end up sticking to the few commands they know work.
This is not strictly true. Context-free grammars can be written to handle (finite) sentences of arbitrary length! If you have a rule like "play me <song>", then <song> can be "a song that lasts longer than X" or "a song by <artist>" (and then <artist> can be "<some name>" or "some German singer" or whatever...). You can just keep going.
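A toy version of that nesting; the grammar below is invented for illustration, but it shows how a handful of finite rules can cover an open-ended set of phrasings:

```python
# A tiny context-free grammar: nonterminals expand into sequences of
# terminals (plain words) and other nonterminals. Nesting in SONG/ARTIST
# is what lets a few rules cover many sentences.
GRAMMAR = {
    "REQUEST": [["play", "me", "SONG"]],
    "SONG": [
        ["a", "song", "by", "ARTIST"],
        ["a", "song", "that", "lasts", "longer", "than", "NUMBER", "minutes"],
    ],
    "ARTIST": [["taylor", "swift"], ["some", "german", "singer"]],
    "NUMBER": [["5"], ["10"]],
}

def matches(symbols, tokens):
    """True if the token list can be derived from the symbol list."""
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:  # nonterminal: try each expansion
        return any(matches(list(expansion) + rest, tokens)
                   for expansion in GRAMMAR[head])
    # terminal: must match the next token exactly
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])

print(matches(["REQUEST"], "play me a song that lasts longer than 5 minutes".split()))
# True
```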
> The one exception is "search" queries - "how tall is Everest" and so on, but that only really works well on Google's platform because they've done all the work for that already.
Had a small Google Assistant thingy for years, and that search stuff works great, until it doesn't, and completely misses the mark. This immediately kills trust and reduces it to a gadget that I will only use for non-critical stuff, always expecting it to break anyway.
> But can you tell it "play me a song that lasts longer than 5 minutes"?
I don't think even pre-LLM technology allows you to do this.
I can't do something as basic as go to Spotify's search page and filter to "only genres I like"; neither a smart version of that filter nor a manual version of it is possible.
Honestly, the FSTs themselves were actually really cool; it's very much GOFAI. It automatically creates lots of permutations, e.g. `play taylor swift`, `please play taylor swift`, `play taylor swift now`, etc. And once the FST is built it always works deterministically. It's compiled to a graph and an incoming command is pushed through the state machine; if you get to an end state it "matched the FST" and some specific behaviour would be triggered.
The rules were really just strings and we had efficient matching against them. I didn't work on that; I would assume some sort of LHS.
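Roughly the shape of that permutation-then-deterministic-match setup, as a toy token trie rather than a real FST toolkit; the phrases and intent name are invented:

```python
from itertools import product

# Expand a template into every surface form: optional "please" prefix,
# optional "now" suffix, one slot value per artist.
PREFIXES = ["", "please "]
SUFFIXES = ["", " now"]
ARTISTS = ["taylor swift", "daft punk"]

def build_graph():
    """Compile all permutations into a token-level prefix graph (a toy 'FST')."""
    graph = {}
    for pre, artist, suf in product(PREFIXES, ARTISTS, SUFFIXES):
        tokens = f"{pre}play {artist}{suf}".split()
        node = graph
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<accept>"] = {"intent": "PlayMusicIntent", "artist": artist}
    return graph

GRAPH = build_graph()

def run(utterance: str):
    """Push the command through the state machine; reaching an accept state matches."""
    node = GRAPH
    for tok in utterance.lower().split():
        if tok not in node:
            return None  # dead state: no rule covers this phrasing
        node = node[tok]
    return node.get("<accept>")

print(run("please play daft punk now"))
# {'intent': 'PlayMusicIntent', 'artist': 'daft punk'}
```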
Was there a knowledge engine of some sort in the past? I could ask it some questions like “what color is a light red flower” and I would get back “a pink flower is pink.” Asking what color a purple cat was would get back purple… but asking what color a blue bird was would get back “a blue bird is blue, red, and brown.”
“Who has birthdays today?” And I would get a list of famous people with birthdays today. I could also ask if Alice and Bob (two names in the list) had the same birthday and I would get an answer (one time I think I got back some internal query language for it instead… but that’s lost in old bug reports).
Now any interesting question starts its answer with “according to an Alexa answers contributor…”
> You also add test cases to make sure the ML works. But if you already have those test cases, why not just match on them directly?
Kind of says it all.
At the end of the day if you have a complex product but don't have comprehensive test cases, it's just a matter of time until your users notice your product sucks.
IMO, given my experience with Siri being AI-style unreliable in many ways (like bit flips where saying "turn off the lights" makes the dimmer go to 100%), I think it's better to do the switch statement for the dozen or so query types that probably represent 90% of traffic, like weather, music, home control, unit conversions, etc., in exchange for way more reliability.
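Something like the dispatch table below is what I have in mind; the handlers and keyword classifier are placeholders, and anything outside the known domains gets refused rather than guessed at:

```python
# The ~dozen reliable domains get explicit handlers; everything else is
# refused instead of guessed at, trading coverage for predictability.
HANDLERS = {
    "weather": lambda q: "It's 21 degrees and sunny.",
    "music": lambda q: f"Playing {q.removeprefix('play ').strip()}.",
    "home_control": lambda q: f"Okay, {q}.",
    "unit_conversion": lambda q: "That's 2.2 pounds.",
}

def classify(query: str) -> str:
    """Stand-in for intent classification; keyword checks for the sketch."""
    q = query.lower()
    if "weather" in q:
        return "weather"
    if q.startswith("play"):
        return "music"
    if "turn on" in q or "turn off" in q:
        return "home_control"
    if "how many" in q and " in " in q:
        return "unit_conversion"
    return "unknown"

def answer(query: str) -> str:
    handler = HANDLERS.get(classify(query))
    return handler(query) if handler else "Sorry, I can't help with that."

print(answer("play some jazz"))    # Playing some jazz.
print(answer("write me a haiku"))  # Sorry, I can't help with that.
```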
I think describing the NLU Engine as a switch statement is underselling it a bit. Determining domain and intent alone requires more than that (frequently, at least).
Not like that, but once you get to the first-party Alexa Skills themselves, there's a bunch of match rules that make up a big chunk of the traffic. I don't know exactly how much. The long tail is handled through ML.
I guess that's what happens when you're generating absurd amounts of revenue and want to "reinvest" it all: anything that even vaguely smells of "innovation" gets money thrown at it like crazy, and you end up incentivizing bullshitting.
My optimistic side tells me that Blue Origin was founded as a tech lab / app lab / skunkworks, and not necessarily as a company with the goal of putting humans in space, one that has suffered from said bullshitting. But my cynical side tells me the latter.
> Just trying to shoehorn Alexa into as many domains as possible
It happened outside of Alexa too. Every team with a public facing product was directed (it seemed) to come up with some sort of Alexa integration. It was usually dreamed up and either a) never prioritized or b) half assed because nobody (devs, PMs, etc.) actually thought it made any sense.
I once heard about a feature dogfooding invite that was sent out specifically for people with babies because they wanted to use Alexa always-listening to activate when a baby was crying and automatically order diapers or something ridiculous like that.
Clearly the feedback loop is too long currently. If we could instantly dispense diapers onto the baby as soon as it started crying, that would improve learning outcomes and encourage experimentation.
Babies who cry a little at a time will be awash with diapers, leading to secondary market opportunities. Now we just need a two-sided marketplace to capture that business and charge a modest fee.
Interestingly, there is an app, Chatterbaby [1], that claims to detect why a baby is crying based on the acoustic features of the cry. I've used it with middling success.
That'd be a neat integration: "Alexa, why is my baby crying?"
"Your baby is crying because you often raise your voice and make unreasonable demands of those around you. Would you like to add insurance for a lifetime of psychiatric help to your cart?"
This sounds like a joke but Amazon made a device specifically to listen to people talk all day and tell them how they sound to others. It is called Halo.
It wasn't for diaper ordering; Alexa has a feature to detect particular sounds (like glass breaking, a dog barking, etc.) for monitoring and alerting purposes. Basically it involved adding hotword-style recognition not just for the N hotwords but also for Y sounds on particular devices.
One time I walked into a dark room with like 50 test devices to get something and somehow accidentally triggered them and all 50 started talking at the same time
I am actually surprised that there are thousands of people working on Alexa.
WTF are they all doing?! It's pretty much unchanged from the outside in any way that's relevant to me compared to where it was in 2014. And the few changes I've noticed have been things breaking.
I used to, for instance, have a script (I forget the Alexa term) that would turn off a few lights and then play a Pandora radio station when I gave it a "bedtime" command. Worked great for about a year, and then the Pandora plugin suddenly refused to take any combination of commands that I could figure out to play a particular Pandora station in my account. This is true from outside of the automation as well, by the way. It's just completely broken.
The weather app integration is annoying too. I wanted weather to use a different weather source, and instead of just giving me results from that weather source, it would always preface it with "Weather from BlueSky" or whatever. Maybe it's their fault and they wanted the ad blurb? But as a consumer, it sucked. I just wanted more localized weather, not an ad every time I asked for the weather.
And the "AI" behavior of the app...it was just awful. I could get better answers from Google Home devices across the board. The best Alexa would do if I asked it a question is to read the first paragraph of a Wikipedia entry, and it was about a 1 in 4 chance it would actually choose the correct Wikipedia entry.
OH, and don't get me started on the Android Alexa app (!!!). Again, the most major change was a UI update where the most important feature I ever use was hidden behind another layer of menus for no particularly good reason. And the "Kindle Accessibility" feature of reading Kindle books is so flaky I doubt anyone on the team ever uses it, from random pauses to sudden jumps back to read from the beginning of the section of the book you started on 10 minutes ago, looping forever on those same 10 minutes.
Sorry. I know it wasn't your fault. But I finally gave up on using Echo devices, and the only reason I still even have the Alexa app on my phone any more is so I can have it read a Kindle book while I'm driving, and it's so amazingly frustrating to use that it would likely be better if it didn't even exist. It's more "customer frustration" than useful.
My take on it was that Alexa was developed at a time when they were trying really hard to solidify their continued existence as a tech company; AWS and Kindle came around then too. The stock market values tech companies much higher than retail companies. That’s what justified the headcount expense.
Yeah, it’s just how I remember it. It was a period of time, not a moment. A bit further apart than I remembered, but that phase of Amazon’s management and growth felt like a cohesive moment to me as an outside observer. Their strategy has changed since then. And have they released much new tech that wasn’t an expansion or iteration of those?