Throwaway here, but I used to work in the Alexa org at Amazon and was amazed by how big the org was (thousands and thousands of people), considering it doesn't seem to be a big revenue generator
I remember constantly hearing about projects other teams were working on, thinking "why would anyone use that" or "how would that ever make money"
Just trying to shoehorn Alexa into as many domains as possible
It was like empire building at its finest
I would joke that the canary tests were the biggest customer for a lot of services
And the way Amazon works with SOA, even what seems like a small feature ends up being a couple of services and a pizza team of 10 devs + an SDM; the overhead is huge
Back when it was announced that the Alexa org was being hit harder by layoffs, that did not surprise me
I've interviewed many people on Alexa before. From what I gather, it's just a giant switch statement, and each individual "path" takes a bunch of effort to support, and there are thousands of paths for music, ordering, commands, etc. It's peak "AI == if statement" architecture.
Throwaway, I used to work in the NLU unit of Alexa about 5 years ago. There is some ML going on, but as with all ML projects I have worked on, people want control. This means you add rules for the "important" stuff. You also add test cases to make sure the ML works. But if you already have those test cases, why not just match on them directly? There are also more advanced techniques for generating examples (FSTs, for example).
What this culminated in is a platform where 80% of requests, and pretty much 99% of "commands", are served by rules built by a team of linguists.
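For illustration, a minimal sketch of what that kind of rule matching might look like; the patterns and intent names below are made up for this example, not anything from Alexa:

```python
import re

# Hypothetical hand-written rules: each regex maps straight to an intent.
# Linguists would maintain a far larger catalogue of these.
RULES = [
    (re.compile(r"^(please )?play (?P<artist>.+?)( now)?$"), "PlayMusicIntent"),
    (re.compile(r"^what('s| is) the weather( like)?( today)?$"), "GetWeatherIntent"),
    (re.compile(r"^turn (?P<state>on|off) the (?P<device>.+)$"), "SmartHomeIntent"),
]

def match_intent(utterance: str):
    """Return (intent, slots) for the first rule that matches, else None."""
    text = utterance.lower().strip()
    for pattern, intent in RULES:
        m = pattern.match(text)
        if m:
            return intent, m.groupdict()
    return None  # fall through to the ML model for the long tail

print(match_intent("please play taylor swift"))
# ('PlayMusicIntent', {'artist': 'taylor swift'})
```

Anything the rules catch is deterministic and testable; only what falls through goes to the model.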
Is this not actually kind of powerful? Having linguists write up a bunch of rules seems a lot more predictable than "rolling a bunch of dice and hoping that some LLM spits out a coherent set of steps".
It feels very fractal, but on the other hand, if Alexa only has a specific gamut of responses, it's not exactly a limitless state space, right?
Very curious what those rules look like, though
The problem is it's completely undiscoverable. You can tell Alexa "play some music" because you're pretty sure one of these linguists added a rule for that. But can you tell it "play me a song that lasts longer than 5 minutes"? Doubtful. The only way to know is to try it.
The problem is the space of possible commands is waaaaay bigger than the space of commands you can manually handle, which means if you just randomly try stuff 95% of the time it won't work. Users learn that very quickly and end up sticking to the few commands they know work.
The one exception is "search" queries - "how tall is Everest" and so on, but that only really works well on Google's platform because they've done all the work for that already.
Contrast that with LLMs, which basically at least understand everything you're asking of them. If you give them a simple API to carry out actions they can do really complex commands like "send a WhatsApp to my wife telling her when I'll get home if I start cycling in 10 minutes". That's impossible without LLMs but pretty trivial with them.
Obviously the downside is they are prone to bullshitting and might do completely the wrong thing.
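A hedged sketch of that "simple API" idea: a hypothetical `call_llm` stands in for whatever model you use, and the tool names and schemas are invented; the point is just that the model maps free-form phrasing onto a small set of actions:

```python
import json

# Hypothetical tools exposed to the model; names and schemas are invented.
TOOLS = [
    {"name": "estimate_cycling_eta",
     "description": "Estimate arrival time home if cycling starts in N minutes",
     "parameters": {"start_in_minutes": "integer"}},
    {"name": "send_whatsapp",
     "description": "Send a WhatsApp message to a named contact",
     "parameters": {"contact": "string", "message": "string"}},
]

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned plan for the demo."""
    return json.dumps([
        {"tool": "estimate_cycling_eta", "args": {"start_in_minutes": 10}},
        {"tool": "send_whatsapp",
         "args": {"contact": "wife", "message": "Leaving soon, home around 18:40"}},
    ])

def handle(command: str):
    prompt = (
        "Respond only with a JSON list of tool calls, chosen from:\n"
        f"{json.dumps(TOOLS, indent=2)}\n\nUser: {command}"
    )
    # A real dispatcher would validate and execute each call against actual services.
    return json.loads(call_llm(prompt))

print(handle("send a WhatsApp to my wife telling her when I'll get home "
             "if I start cycling in 10 minutes"))
```

Of course, the validation step is doing a lot of work there, which is exactly where the bullshitting risk comes back in.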
It’s worse than that. These systems can be adapted by looking at failed user commands, but people don’t really sit around trying out fun things and watch it fall on its face for longer than the first day or so. After that, the novelty wears off, so you’ve trained your users to accept the device’s limitations. Then, even when you do improve the functionality, your users won’t know! They won’t try it, and those commands will never get traction in the system or get more testing beyond the initial launch criteria. It’s a death spiral. The same thing happens with the tone of voice people use.
I am confused as to why it's more undiscoverable than, say, some LLM.
> The problem is the space of possible commands is waaaaay bigger than the space of commands you can manually handle, which means if you just randomly try stuff 95% of the time it won't work. Users learn that very quickly and end up sticking to the few commands they know work.
This is not strictly true. Context-free grammars can be written to handle (finite) sentences of arbitrary length! If you have a rule like "play me <song>", then <song> can be "a song that lasts longer than X" or "a song by <artist>" (and then <artist> can be "<some name>" or "some German singer" or whatever...). You can just keep going.
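A toy version of that nesting; the grammar below is invented for illustration, but it shows how a handful of finite rules can cover an open-ended set of phrasings:

```python
# A tiny context-free grammar: nonterminals expand into sequences of
# terminals (plain words) and other nonterminals. Nesting in SONG/ARTIST
# is what lets a few rules cover many sentences.
GRAMMAR = {
    "REQUEST": [["play", "me", "SONG"]],
    "SONG": [
        ["a", "song", "by", "ARTIST"],
        ["a", "song", "that", "lasts", "longer", "than", "NUMBER", "minutes"],
    ],
    "ARTIST": [["taylor", "swift"], ["some", "german", "singer"]],
    "NUMBER": [["5"], ["10"]],
}

def matches(symbols, tokens):
    """True if the token list can be derived from the symbol list."""
    if not symbols:
        return not tokens
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:  # nonterminal: try each expansion
        return any(matches(list(expansion) + rest, tokens)
                   for expansion in GRAMMAR[head])
    # terminal: must match the next token exactly
    return bool(tokens) and tokens[0] == head and matches(rest, tokens[1:])

print(matches(["REQUEST"], "play me a song that lasts longer than 5 minutes".split()))
# True
```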
> The one exception is "search" queries - "how tall is Everest" and so on, but that only really works well on Google's platform because they've done all the work for that already.
Had a small Google Assistant thingy for years, and that search stuff works great, until it doesn't, and completely misses the mark. This immediately kills trust and reduces it to a gadget that I will only use for non-critical stuff, always expecting it to break anyway.
> But can you tell it "play me a song that lasts longer than 5 minutes"?
I don't think even pre-LLM technology allows you to do this.
I can't do something as basic as go to Spotify's search page and filter to "only genres I like"; neither a smart version of that filter nor a manual version of it is possible.
Honestly, the FSTs themselves were actually really cool; it's very much GOFAI. It automatically creates lots of permutations, e.g. `play taylor swift`, `please play taylor swift`, `play taylor swift now`, etc. And once the FST is built it always works deterministically. It's compiled to a graph and an incoming command is pushed through the state machine; if you get to an end state it "matched the FST" and some specific behaviour would be triggered.
The rules were really just strings and we had efficient matching against them. I didn't work on that; I would assume some sort of LHS.
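Roughly the shape of that permutation-then-deterministic-match setup, as a toy token trie rather than a real FST toolkit; the phrases and intent name are invented:

```python
from itertools import product

# Expand a template into every surface form: optional "please" prefix,
# optional "now" suffix, one slot value per artist.
PREFIXES = ["", "please "]
SUFFIXES = ["", " now"]
ARTISTS = ["taylor swift", "daft punk"]

def build_graph():
    """Compile all permutations into a token-level prefix graph (a toy 'FST')."""
    graph = {}
    for pre, artist, suf in product(PREFIXES, ARTISTS, SUFFIXES):
        tokens = f"{pre}play {artist}{suf}".split()
        node = graph
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<accept>"] = {"intent": "PlayMusicIntent", "artist": artist}
    return graph

GRAPH = build_graph()

def run(utterance: str):
    """Push the command through the state machine; reaching an accept state matches."""
    node = GRAPH
    for tok in utterance.lower().split():
        if tok not in node:
            return None  # dead state: no rule covers this phrasing
        node = node[tok]
    return node.get("<accept>")

print(run("please play daft punk now"))
# {'intent': 'PlayMusicIntent', 'artist': 'daft punk'}
```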
Was there a knowledge engine of some sort in the past? I could ask it some questions like “what color is a light red flower” and I would get back “a pink flower is pink.” Asking what color a purple cat was would get back purple… but asking what color a blue bird was would get back “a blue bird is blue, red, and brown.”
“Who has birthdays today?” And I would get a list of famous people with birthdays today. I could also ask if Alice and Bob (two names in the list) had the same birthday and I would get an answer (one time I think I got back some internal query language for it instead… but that’s lost in old bug reports).
Now any interesting question starts its answer with “according to an Alexa answers contributor…”
> You also add test cases to make sure the ML works. But if you already have those test cases, why not just match on them directly?
Kind of says it all.
At the end of the day if you have a complex product but don't have comprehensive test cases, it's just a matter of time until your users notice your product sucks.
IMO, given my experience with Siri being AI-style unreliable in many ways (like bit flips where saying "turn off the lights" makes the dimmer go to 100%), I think it's better to do the switch statement for the dozen or so query types that probably represent 90% of traffic, like weather, music, home control, unit conversions, etc., in exchange for way more reliability.
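Something like the dispatch table below is what I have in mind; the handlers and keyword classifier are placeholders, and anything outside the known domains gets refused rather than guessed at:

```python
# The ~dozen reliable domains get explicit handlers; everything else is
# refused instead of guessed at, trading coverage for predictability.
HANDLERS = {
    "weather": lambda q: "It's 21 degrees and sunny.",
    "music": lambda q: f"Playing {q.removeprefix('play ').strip()}.",
    "home_control": lambda q: f"Okay, {q}.",
    "unit_conversion": lambda q: "That's 2.2 pounds.",
}

def classify(query: str) -> str:
    """Stand-in for intent classification; keyword checks for the sketch."""
    q = query.lower()
    if "weather" in q:
        return "weather"
    if q.startswith("play"):
        return "music"
    if "turn on" in q or "turn off" in q:
        return "home_control"
    if "how many" in q and " in " in q:
        return "unit_conversion"
    return "unknown"

def answer(query: str) -> str:
    handler = HANDLERS.get(classify(query))
    return handler(query) if handler else "Sorry, I can't help with that."

print(answer("play some jazz"))    # Playing some jazz.
print(answer("write me a haiku"))  # Sorry, I can't help with that.
```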
I think describing the NLU Engine as a switch statement is underselling it a bit. Determining domain and intent alone requires more than that (frequently, at least).
Not like that, but once you get to the first-party Alexa Skills themselves, there's a bunch of match rules that make up a big chunk of the traffic. I don't know exactly how much. The long tail is handled through ML.
I guess that's what happens when you're generating absurd amounts of revenue and want to "reinvest" it all: anything that even vaguely smells of "innovation" gets money thrown at it like crazy, and you end up incentivizing bullshitting.
My optimistic side tells me that Blue Origin was founded as a tech lab / app lab / skunkworks, and not necessarily as a company with the goal of putting humans in space, one that has suffered from said bullshitting. But my cynical side tells me the latter.
> Just trying to shoehorn Alexa into as many domains as possible
It happened outside of Alexa too. Every team with a public facing product was directed (it seemed) to come up with some sort of Alexa integration. It was usually dreamed up and either a) never prioritized or b) half assed because nobody (devs, PMs, etc.) actually thought it made any sense.
I once heard about a feature dogfooding invite that was sent out specifically for people with babies because they wanted to use Alexa always-listening to activate when a baby was crying and automatically order diapers or something ridiculous like that.
Clearly the feedback loop is too long currently. If we could instantly dispense diapers onto the baby as soon as it started crying, that would improve learning outcomes and encourage experimentation.
Babies who cry a little at a time will be awash with diapers, leading to secondary market opportunities. Now we just need a two-sided marketplace to capture that business and charge a modest fee.
Interestingly, there is an app, Chatterbaby [1], that claims to detect why a baby is crying based on the acoustic features of the cry. I've used it with middling success.
That'd be a neat integration: "Alexa, why is my baby crying?"
"Your baby is crying because you often raise your voice and make unreasonable demands of those around you. Would you like to add insurance for a lifetime of psychiatric help to your cart?"
This sounds like a joke but Amazon made a device specifically to listen to people talk all day and tell them how they sound to others. It is called Halo.
It wasn't for diaper ordering; Alexa has a feature to detect particular sounds (like glass breaking, a dog barking, etc.) for monitoring and alerting purposes. Basically it involved adding hotword-style recognition not just for the N hotwords but also for Y sounds on particular devices.
One time I walked into a dark room with like 50 test devices to get something and somehow accidentally triggered them and all 50 started talking at the same time
I am actually surprised that there are thousands of people working on Alexa.
WTF are they all doing?! It's pretty much unchanged from the outside in any way that's relevant to me compared to where it was in 2014. And the few changes I've noticed have been things breaking.
I used to, for instance, have a script (I forget the Alexa term) that would turn off a few lights and then play a Pandora radio station when I gave it a "bedtime" command. Worked great for about a year, and then the Pandora plugin suddenly refused to take any combination of commands that I could figure out to play a particular Pandora station in my account. This is true from outside of the automation as well, by the way. It's just completely broken.
The weather app integration is annoying too. I wanted weather to use a different weather source, and instead of just giving me results from that weather source, it would always preface it with "Weather from BlueSky" or whatever. Maybe it's their fault and they wanted the ad blurb? But as a consumer, it sucked. I just wanted more localized weather, not an ad every time I asked for the weather.
And the "AI" behavior of the app...it was just awful. I could get better answers from Google Home devices across the board. The best Alexa would do if I asked it a question is to read the first paragraph of a Wikipedia entry, and it was about a 1 in 4 chance it would actually choose the correct Wikipedia entry.
OH, and don't get me started on the Android Alexa app (!!!). Again, the most major change was a UI update where the most important feature I ever use was hidden behind another layer of menus for no particularly good reason. And the "Kindle Accessibility" feature of reading Kindle books is so flaky I doubt anyone on the team ever uses it, from random pauses to sudden jumps back to read from the beginning of the section of the book you started on 10 minutes ago, looping forever on those same 10 minutes.
Sorry. I know it wasn't your fault. But I finally gave up on using Echo devices, and the only reason I still even have the Alexa app on my phone any more is so I can have it read a Kindle book while I'm driving, and it's so amazingly frustrating to use that it would likely be better if it didn't even exist. It's more "customer frustration" than useful.
My take on it was that Alexa was developed at a time when they were trying really hard to solidify their continued existence as a tech company; AWS and Kindle came around then too. The stock market values tech companies much higher than retail companies. That’s what justified the headcount expense.
Yeah, it’s just how I remember it. It was a period of time, not a moment. A bit further apart than I remembered, but that phase of Amazon’s management and growth felt like a cohesive moment to me as an outside observer. Their strategy has changed since then. And have they released much new tech that wasn’t an expansion or iteration of those?