One of the people leading ESPHome here. Let me know if there are any questions.
Last Saturday we announced that ESPHome is now owned by the Open Home Foundation. The Open Home Foundation fights for privacy, choice, and sustainability for smart homes. And for every person who lives in one. Learn more at https://www.openhomefoundation.org/blog/announcing-the-open-...
If there are people reading this who are excited to try out ESPHome: try it out without writing a single line of configuration by installing some of our ready-made projects: https://esphome.io/projects/
It allows you to turn a cheap microcontroller into a voice assistant, Bluetooth proxy, or media player directly from your browser.
No questions, only praise. This project is simply awesome; I've been astonished time and time again by the features. I had done a complete dive into the Espressif SDK trying to implement a wireless switch with a temperature sensor and MQTT, and had nearly finished the project when I stumbled on ESPHome, which obsoleted all of my work at once. It was everything I had written so far, plus many added features.
I both love and hate when this happens. Discoverability seems like the hardest challenge. I always do quite a bit of searching for existing stuff before rolling my own, and it can be really hard to find stuff. Most of the time I stumble on it serendipitously at some point later.
I have one! ESPHome is awesome but I'm trying to steer away from Wifi IoT - a big reason is that I like the idea of self-healing meshes that can work entirely offline, without having to deal with a lot of configuration.
Espressif seems to have a few devices with Zigbee capabilities. Do you think there will be a way of building our own Zigbee devices in the future?
There is no reason that Wifi devices can't work without internet. Most ESP32 devices don't talk to the internet, but to other devices on the local network. Wifi doesn't really need mesh since it has longer range.
I hope ESPHome is working on Matter support, because a protocol that can switch between Wifi, Bluetooth, and Thread is a big advantage.
Fascinating. When I read your comment my first thought was to use something like LoRa, though perhaps broadcasting your data for miles is an antifeature.
The ESPHome project is unusually competent, user-centric, and almost uncanny in how well it works.
I'll tell you what I want, though. I'm not sure this is in-scope for ESPHome, or how it's possible to even implement cleanly:
I want to be able to make devices which have tight feedback loops and more complex on-board algorithms.
What I really want is, e.g., a light sensor controlling lightbulbs. Here, I want the lightbulbs changing almost continuously by almost imperceptible amounts, using things like Kalman filters to keep a fixed light level and a light temperature based on time-of-day.
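Roughly the kind of loop I mean, as a toy Python sketch (the read_lux / set_brightness helpers are made up and the gains are arbitrary; the point is the smoothing plus tiny, frequent steps):

```python
import random
import time

# Made-up hardware helpers -- stand-ins for whatever the device actually exposes.
def read_lux() -> float:
    return 280.0 + random.uniform(-20, 20)   # simulated noisy light sensor

def set_brightness(value: float) -> None:
    print(f"brightness -> {value:.3f}")      # would drive the lamp output

TARGET_LUX = 300.0   # desired light level in the room
LOOP_HZ = 10         # much tighter than a once-per-second update
ALPHA = 0.2          # sensor smoothing factor (poor man's Kalman filter)
GAIN = 0.0005        # proportional gain: lux error -> brightness step
MAX_STEP = 0.005     # cap each step so transitions stay imperceptible

estimate = read_lux()
brightness = 0.5
set_brightness(brightness)

while True:
    # Smooth the noisy reading, then nudge brightness toward the setpoint.
    estimate = (1 - ALPHA) * estimate + ALPHA * read_lux()
    step = max(-MAX_STEP, min(MAX_STEP, GAIN * (TARGET_LUX - estimate)))
    brightness = max(0.0, min(1.0, brightness + step))
    set_brightness(brightness)
    time.sleep(1 / LOOP_HZ)
```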
I'd like to have my air filters, ventilation, heating, humidification, dehumidification, and cooling continuously controlled such that:
1) All run at the right level continuously to keep environmentals and power optimized.
2) Ventilation reduces CO2 / TVOC levels, but increases PM2.5 levels and lets in outside temperatures.
3) Space heaters cost a lot more than baseline heating, but are sometimes necessary on very cold days.
4) This is all less important when I'm not home, and some things change. When I'm home, I want liveable humidity. When I'm not, I want to minimize humidity.
... and so on.
(A second thing I want is ESPHome to allow me to make Zigbee, rather than just wifi, devices)
My home has an ERV and I use a couple Shelly relays (one for power and the other to boost airflow) integrated into HA to modulate the amount of fresh air I bring in, currently based on indoor/outdoor temperature and humidity. I don't have an air quality sensor, but if I had one I could easily integrate that into my automations.
A red flag, relative to what I would do, is: "Frequency to adapt the lights, in seconds." I would like to be able to make tight feedback loops, which means update intervals much shorter than seconds. I use HA + ESPHome as well, and that's on my list of issues I'd like to see resolved. To understand why this matters for a lot of controllers:
Audio amplifiers often have feedback loops running in the 50MHz range in order to achieve good performance in the <20kHz range. Add to that, in this case, the desire for steady transitions so I don't have sudden light or noise changes (stepping through 255 states one per second takes minutes).
That said, holistically, this does what I want better than how I was going to do it.
The other major issue I have with HA is reliability. About 10% of the time, some automations don't work. I'd really like to be able to set state (blinds are down after 8pm) rather than actions (blinds go down at 8pm). If you have suggestions....
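For the state-vs-actions part, what I'm picturing is basically a reconciliation loop rather than triggers; a toy sketch (made-up state store, nothing HA-specific):

```python
import datetime
import time

# Made-up state store -- stands in for querying and commanding the real blinds.
state = {"blinds_down": False}

def desired_blinds_down(now: datetime.datetime) -> bool:
    # Declarative rule: the blinds ARE down between 20:00 and 07:00.
    return now.hour >= 20 or now.hour < 7

while True:
    want_down = desired_blinds_down(datetime.datetime.now())
    if state["blinds_down"] != want_down:
        # The action is derived from the mismatch, so a missed trigger
        # simply gets corrected on the next pass instead of being lost.
        state["blinds_down"] = want_down
        print("lowering blinds" if want_down else "raising blinds")
    time.sleep(60)
```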
I would love to hear more about the integration you've setup. I, too, have an ERV but it's on a dumb controller right now.
I don't use HA yet, but it's a project I plan to tackle soon. I've also been doing some research on ESP and energy monitoring, so it sounds like what you've done is right up the same alley.
Thank you! I've rarely been as impressed by how well software works. Flashing, compiling, logging and OTA updates were always a PITA and with ESPHome it's a breeze. Logging over wifi feels like it shouldn't be that simple. I've created a mini IR receiver / transmitter to control my sound system with my TV remote. It was super simple to set up, and the integration with Home Assistant is great!
Why do you want to avoid Home Assistant? What I've found setting up my automations is that once you have it doing something useful, you get a lot of ideas to make your life even better. I might be wrong, but I suspect you'll spend a lot of time doing something that is simple in HA, only to then find you want to do something else similar.
"mostly zigbee?" not sure what you mean there. But ESPHome can be controlled directly without HA. You should read the website, specifically the sections on "Networking" and "Management and Monitoring".
If you are starting at zero there is a big learning curve, but if you're into it, it is a lot of fun.
You can bind a device to another, so while you would need the ability to issue the command, a server wouldn't be required to handle the state propagation.
I’ve been using Home Assistant for about 3 years now, and talk about it to pretty much anyone who will listen. Thanks for keeping home automation open and focused on the users!
The foundation can only work in the interest of privacy, choice and sustainability for the smart home. It is important that we have a thriving ecosystem of communities and companies working towards this goal. You cannot do this with just a single player. If, at some hypothetical point in the future, that means it will work against my interests, then the foundation is doing exactly what we created it for.
First of all many thanks for helping to maintain such a great project!
The feedback I have right now is that for the ESP32-C3 chip, provisioning over USB (Improv_serial) is not supported. So then the only option is to do provisioning over BLE if you want to get the "Made for ESPHome" certification.
However, this blew up our partition size from 1.2 MB to 1.9 MB and basically prevented us from adding any further code, and we got stuck there (we now develop a native HA integration).
So my feedback would be to try and reduce the overhead for the provisioning.
You can already hook up an RS-485 transceiver to the UART ports and use it today with the UART driver.
Esphome also has a Modbus controller component. What are you referring to by “generic” RS-485 that isn’t available already?
Yes, the modules with the RP2040 and Ethernet would be so great to have supported.
I'm pretty unhappy with the ESP8266 WiFi modules I have. They regularly become unavailable in Home Assistant for a few minutes even though my WiFi is working fine.
Could it be that they're simply idle-sleeping to save power? My first ESPHome device confused me by dropping off and on, and it turned out that was the problem: idle sleep was configured by default, so the device would wake up, report its status, and go back to sleep.
If my understanding is correct, ESPHome needs to be re-compiled and uploaded every time the config YAML is changed. Is it possible to separate the binary and the config so that, for some config changes, there is no need to re-compile and upload the binary? Thanks.
I believe this is because the yaml is in fact the instructions for what to include in the binary. It wouldn't be feasible for the firmware to include all possible device and peripheral code and enable parts at run time.
I think you can see the ESPHome intermediate code generation in the file tree during compilation, and how the YAML sections map to blocks of C/C++ code being built.
In practice, this is absolutely no problem. It generally only needs to re-compile a file or two for small changes, which takes seconds, and the OTA update functionality works perfectly so you don't need to unplug it/bring it to your desk.
With a $10 18650 rechargeable battery you could probably get a couple months on a single charge. Esphome can deep sleep between readouts: https://esphome.io/components/deep_sleep.html
Deep sleep pulls the power draw down to 50uA.
Of course if you also wire a solar charger next to the battery… maybe it will never run out.
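Back-of-the-envelope, with assumed duty-cycle numbers (wake briefly a few times an hour, deep sleep the rest):

```python
# Rough battery-life estimate for a deep-sleeping sensor node (assumed numbers).
CAPACITY_MAH = 3000        # typical 18650 cell
SLEEP_CURRENT_MA = 0.05    # ~50 uA in deep sleep
ACTIVE_CURRENT_MA = 80     # WiFi connect + publish a reading
ACTIVE_SECONDS = 10        # awake time per readout
READOUTS_PER_HOUR = 4

active_fraction = ACTIVE_SECONDS * READOUTS_PER_HOUR / 3600
mah_per_hour = (ACTIVE_CURRENT_MA * active_fraction
                + SLEEP_CURRENT_MA * (1 - active_fraction))
print(f"~{CAPACITY_MAH / mah_per_hour / 24:.0f} days per charge")   # ~130 days here
```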
I already got my ESP gadgets to tinker with; what's missing is the time. It's super fun, and I'm really looking forward to it, but my to-do list won't give me a break. Great job on what you guys are doing!
Today at Home Assistant we announced the release of microWakeWord. It is a new wake word model that is fully open source and runs on the ESP32-S3 microcontroller. It is integrated into ESPHome and now powers the latest generation of Home Assistant's privacy-focused voice assistant.
With Home Assistant we plan to integrate similar functionality out of the box this year. OP touches upon some good points that we have also run into and that I would love the local LLM community to solve:
* I would love to see a standardized API for local LLMs that is not just a 1:1 copy of the ChatGPT API. For example, as Home Assistant talks to a random model, we should be able to query that model to see what it is capable of.
* I want to see local LLMs with support for a feature similar or equivalent to OpenAI functions. We cannot include all possible information in the prompt, and we need to allow LLMs to take actions to be useful. Constrained grammars do look like a possible alternative. Creating a prompt to write JSON is possible, but it needs quite an elaborate prompt and even then the LLM can make errors. We want to make sure that all JSON coming out of the model is directly actionable without having to ask the LLM what it might have meant for a specific value.
I think that LLMs are going to be really great for home automation and with Home Assistant we couldn't be better prepared as a platform for experimentation for this: all your data is local, fully accessible and Home Assistant is open source and can easily be extended with custom code or interface with custom models. All other major smart home platforms limit you in how you can access your own data.
Here are some things that I expect LLMs to be able to do for Home Assistant users:
Home automation is complicated. Every house has different technology and that means that every Home Assistant installation is made up of a different combination of integrations and things that are possible. We should be able to get LLMs to offer users help with any of the problems they are stuck with, including suggested solutions, that are tailored to their situation. And in their own language. Examples could be: create a dashboard for my train collection or suggest tweaks to my radiators to make sure each room warms up at a similar rate.
Another thing that's awesome about LLMs is that you control them using language. This means that you could write a rule book for your house and let the LLM make sure the rules are enforced. Example rules:
* Make sure the light in the entrance is on when people come home.
* Make automated lights turn on at 20% brightness at night.
* Turn on the fan when the humidity or air quality is bad.
Home Assistant could ship with a default rule book that users can edit. Such rule books could also become the way one could switch between smart home platforms.
Reading this gave me an idea to extend this even further. What if the AI could look at your logbook history and suggest automations? For example, I have an automation that turns the lights on when it's dark based on a light sensor. It would be neat if AI could see "hey, you tend to manually turn on the lights when the light level is below some value, want to create an automation for that?"
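Even without an LLM you could get partway there with plain statistics, if you can export the relevant events with the sensor value at the time; a toy sketch with made-up data:

```python
# Made-up logbook export: (illuminance in lx, did the user manually turn the light on?)
events = [(120, True), (90, True), (400, False), (350, False), (60, True), (500, False)]

on_lux = [lux for lux, turned_on in events if turned_on]
off_lux = [lux for lux, turned_on in events if not turned_on]

# Only suggest an automation if the behaviour separates cleanly on the sensor value.
if on_lux and off_lux and max(on_lux) < min(off_lux):
    threshold = (max(on_lux) + min(off_lux)) / 2
    print(f"Suggestion: turn the lights on when illuminance drops below {threshold:.0f} lx")
else:
    print("No clean threshold -- probably not a light-level habit")
```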
I've been working on something like this but it's of course harder than it sounds, mostly due to how few example use cases there are. A dumb false positive for yours might be "you tend to turn off the lights when the outside temperature is 50º"
Anyone know of a database of generic automations to train on?
Temperature and light data may create hallucinations in an LLM. A potential solution to this is to establish a knowledge graph based on the sensor signals, where the LLM is used to understand the speech commands given by humans and then interpret them as operations on the graph using similarity calculations.
We might take it one step further and ask the user if they want to add a rule that certain rooms have a certain level of light.
Although light level would tie it to a specific sensor. A smart enough system might also be able to infer this from the position of the sun + weather (ie cloudy) + direction of the windows in the room + curtains open/closed.
I can write a control system easy enough to do this. I'm kind of an expert at that, for oddball reasons, and that's a trivial amount of work for me. The "smart enough" part, I'm more than smart enough for.
What's not a trivial amount of work is figuring out how to integrate that into HA.
I can guarantee that there is an uncountably infinite number of people like me, and very few people like you. You don't need to do my work for me; you just need to enable me to do it easily. What's really needed are decent APIs. If I go into Settings->Automation, I get a frustrating trigger/condition/action system.
This should instead be:
1) Allow me to write (maximally declarative) Python / JavaScript, in-line, to script HA. To define "maximally declarative," see React / Redux, and how they trigger code in response to state changes
2) Allow my kid(s) to do the same with Blockly
3) Ideally, start to extend this to edge computing, where I can push some of the code into devices (e.g. integrating with ESPHome and standard tools like CircuitPython and MakeCode).
This would have the upside of also turning HA into an educational tool for families with kids, much like Logo, Microsoft BASIC, HyperCard, HTML 2.0, and other technologies of yesteryear.
Specifically controlling my lights to give constant light was one of the first things I wanted to do with HA, but the learning curve meant there was never enough time. I'm also a big fan of edge code, since a lot of this could happen much more gradually and discreetly. That's especially true for things with motors, like blinds, where a very slow stepper could make it silent.
1) You can basically do this today with Blueprints. There are also things like Pyscript [0]; a rough sketch of what that looks like follows after this list.
2) The Node-RED implementation in HA is phenomenal, and kids can very easily use it with a short introduction.
3) Again, already there. ESPHome is a first class citizen in HA.
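Here is roughly what in-line Python looks like with Pyscript, from memory (entity names made up; check the Pyscript docs for the exact trigger syntax):

```python
# Rough Pyscript-style automations (Pyscript is a custom integration, not core HA).
# The state_trigger/time_trigger decorators and the service objects (light, cover)
# are provided by the Pyscript runtime, not imported here.

@state_trigger("float(sensor.living_room_lux) < 50")
def lights_on_when_dark():
    # Runs whenever the expression above becomes true.
    light.turn_on(entity_id="light.living_room", brightness_pct=20)

@time_trigger("once(20:00)")
def blinds_down_in_the_evening():
    cover.close_cover(entity_id="cover.bedroom_blinds")
```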
I feel like you've not read the HA docs [1] or taken the time to understand the architecture [2]. And, for someone who has more than enough self-proclaimed skills, this should be a very understandable system.
(1) You are correct that I have not read the docs or discovered everything there is. I have had HA for a few weeks now. I am figuring stuff out. I am finding the learning curve to be steep.
(2) However, I don't think you understand the level of usability and integration I'm suggesting. For most users, "read the docs" or "there's a github repo somewhere" is no longer a sufficient answer. That worked fine for 1996-era Linux. In 2023, this needs to be integrated into the user interface, and you need discoverability and on-ramps. This means actually treating developers as customers. Take a walk through Micro:bit and MakeCode to understand what a smooth on-ramp looks like. Or the Scratch ecosystem.
This contrasts with the macho "for someone who has more than enough self-proclaimed skills, this should be a very understandable system" -- no, it is not a very understandable system for me. Say what you will about my skills, that means it will also not be an understandable system for most e.g. kids and families.
That said, if you're correct, a lot of this may just be a question of relatively surface user-interface stuff, configuration and providing good in-line documentation.
(3) Skills are not universal. A martial artist might be a great athlete, but unless you're Kareem Abdul-Jabbar, that doesn't make you a great basketball player. My skills do include (1) designing educational experiences for kids; and (2) many semesters of graduate-level coursework on control theory.
That's very different from being fluid at e.g. managing docker containers, which I know next to nothing about. My experience trying to add things to HA has not been positive. I spent a lot of time trying to add extensions which would show me a Zigbee connectivity map to debug some connectivity issues. None worked. I eventually found a page which told me this was already in the system *shrug*. I still don't know why the ones I installed didn't work, or where to get started debugging.
For me, that was harder than doing a root-locus plot, implementing a system identification, designing a lag or lead compensator, or running the Bode obstacle course.
Seriously. If I went into HA, and there was a Python console with clear documentation and examples, this would be built. That's my particular skills, but a userbase brings very diverse other skills.
I think people might be a bit offended by what sounds like arrogance. But I completely agree with your general concern that nuts and bolts of making somebody else's software work is often frustrating, complicated and inaccessible while math, logic and domain knowledge is "easy" for many people and far more generally known. Even to the point that it's often easier to write your own thing than bother to learn about an existing one.
A way I sometimes evaluate whether to implement some feature in my work is the ratio of the work it does for the user to the work the user has to do for it. Adding a page header in MS Word used to have a very low ratio. A web based LLM is at the other extreme. Installing a bunch of interdependent finicky software just to do simple child-level programming for HA seems like a poor ratio too.
Thank you so much for that comment. I really appreciate the feedback.
I do sometimes come off as arrogant. That's unfortunate, and in part due to my cultural background. It's helpful feedback. It's difficult to be downvoted or attacked, and not know why.
I will mention: They're just different skill sets. I know people who can dive into a complex piece of code or dev-ops infrastructure and understand it in hours or days. I'm just not one of them.
Learning to design control systems is a very deep (and rather obscure) pile of mathematics which takes many years of study and is highly specialized. I picked it up for oddball reasons a few decades ago. Doing proper control systems requires a lot of advanced linear algebra, rational functions, frequency-domain analysis, parts of differential equations, etc. That's not the same thing as general math skills. Most people who specialize in this field work in Matlab, wouldn't know what Docker is, and, in terms of general mathematics, have never taken a course on abstract algebra or topology. Even something like differential equations requires only a surface understanding (it disappears when one shifts to the Laplace domain or matrix state-space representations).
There's a weird dynamic where things we can't do often seem easier or harder than ones we can. Here, I just have a specialized skillset relevant to the conversation. That doesn't imply I'm a genius, or even a brilliant mathematician.
That just implies I can design an optimal control system. Especially for a system with dynamics as simple as room lighting. And would have a fun time doing that for everything in HA in my house and sharing.
I'd really like to have other things work the same way too, for that matter, where e.g. my HVAC runs heating 24/7 at the right level, rather than toggling on and off. With my background, the intimidating part isn't the math or the electronics, but the dev-ops.
At least higher-end LLMs are perfectly capable of making quite substantive logical inferences from data. I'd argue that an LLM is likely to be better than many other methods if the dataset is small, while other methods will be better once you're dealing with data that pushes the context window.
E.g. I just tested with ChatGPT: I gave it a selection of instructions about playing music, the time and location, and a series of hypothetical responses, and then asked it to deduce what went right and wrong about the response. It correctly deduced the user intent I implied: a user who, given the time (10pm), the place (the bedroom), and the rejection of loud music, possibly just preferred calmer music, but who at least wanted something calmer for bedtime.
I also asked it to propose a set of constrained rules, and it proposed rules that'd certainly make me a lot happier by e.g. starting with calmer music if asked an unconstrained "play music" in the evening, and transition artists or genres more aggressively the more the user skips to try to find something the user will stick with.
In other words, you absolutely can get an LLM to look at even very constrained history and get it to apply logic to try to deduce a better set of rules, and you can get it to produce rules in a constrained grammar to inject into the decision making process without having to run everything past the LLM.
While given enough data you can train a model to try to produce the same result, one possible advantage of the above is that it's far easier to introspect. E.g. my ChatGPT session had it suggest a "IF <user requests to play music> AND <it is late evening> THEN <start with a calming genre>" rule. If it got it wrong (maybe I just disliked the specific artists I used in my example, or loved what I asked for instead), then correcting its mistake is far easier if it produces a set of readable rules, and if it's told to e.g. produce something that stays consistent with user-provided rules.
(the scenario I gave it, btw, is based on my very real annoyance with current music recommendation, which all too often fails to take into account things like avoiding abrupt transitions, paying attention to the time of day and volume settings, and changing tack or e.g. asking questions if the user skips multiple tracks in quick succession)
[Anonymous] founder of a similarly high-profile initiative here.
> Creating a prompt to write JSON is possible, but it needs quite an elaborate prompt and even then the LLM can make errors. We want to make sure that all JSON coming out of the model is directly actionable without having to ask the LLM what it might have meant for a specific value
The LLM cannot make errors. The LLM spits out probabilities for the next tokens. What you do with it is up to you. You can make errors in how you handle this.
Standard usages pick the most likely token, or a random token from the top many choices. You don't need to do that. You can pick ONLY words which are valid JSON, or even ONLY words which are JSON matching your favorite JSON format. This is a library which does this:
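To make that concrete, a toy illustration of the principle (not that library, just greedy decoding with a validity mask over a fake vocabulary):

```python
import random

# Toy "model": returns an unnormalized score for every token in a tiny vocabulary.
VOCAB = ['{', '}', '"state":', ' ', '"on"', '"off"', 'turn the', 'sorry,']
TARGETS = ['{"state": "on"}', '{"state": "off"}']   # the only outputs we accept

def fake_logits(prefix: str) -> dict:
    return {tok: random.uniform(0, 1) for tok in VOCAB}

def allowed(prefix: str, tok: str) -> bool:
    # Stand-in for a real grammar check: keep only tokens that can still lead
    # to one of the accepted outputs. A real implementation walks a grammar.
    return any(t.startswith(prefix + tok) for t in TARGETS)

prefix = ""
while prefix not in TARGETS:
    scores = fake_logits(prefix)
    valid = {t: s for t, s in scores.items() if allowed(prefix, t)}
    prefix += max(valid, key=valid.get)   # pick the best *valid* token only

print(prefix)   # always parseable JSON, no matter what the model "wanted" to say
```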
The one piece of advice I will give: Do NOT neuter the AI like OpenAI did. There is a near-obsession to define "AI safety" as "not hurting my feelings" (as opposed to "not hacking my computer," "not launching nuclear missiles," or "not exterminating humanity."). For technical reasons, that makes them work much worse. For practical reasons, I like AIs with humanity and personality (much as the OP has). If it says something offensive, I won't break.
AI safety, in this context, means validating that it's not:
* setting my thermostat to 300 degrees centigrade
* power-cycling my devices 100 times per second to break them
* waking me in the middle of the night
... and similar. (A minimal sketch of such guard rails is below.)
Also:
* Big win if it fits on a single 16GB card, and especially if it's not NVidia-only. The cheapest way to run an LLM is an Intel Arc A770 16GB; the second-cheapest is an NVidia 4060 Ti 16GB.
* Azure gives a safer (not safe) way of running cloud-based models for people without that. I'm pretty sure there's a business model running these models safely too.
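Here's that guard-rail sketch: hard limits that sit between the model output and the devices (limits and action names are made up):

```python
# Hard limits applied to whatever the model proposes, before anything executes.
LIMITS = {
    "set_temperature": {"min_c": 10, "max_c": 30},
    "power_cycle": {"max_per_hour": 2},
}
QUIET_HOURS = set(range(23, 24)) | set(range(0, 7))

def is_safe(action: dict, now_hour: int, recent_power_cycles: int) -> bool:
    name = action.get("action")
    if name == "set_temperature":
        lim = LIMITS["set_temperature"]
        return lim["min_c"] <= action.get("value", 0) <= lim["max_c"]
    if name == "power_cycle":
        return recent_power_cycles < LIMITS["power_cycle"]["max_per_hour"]
    if name == "play_announcement":
        return now_hour not in QUIET_HOURS
    return False   # default deny: unknown actions never run

# The 300-degree thermostat example gets rejected:
print(is_safe({"action": "set_temperature", "value": 300}, now_hour=14, recent_power_cycles=0))
```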
I suspect cloning OpenAI's API is done for compatibility reasons. Most AI-based software already supports the GPT-4 API, and OpenAI's official client allows you to override the base URL very easily. A local LLM API is unlikely to be anywhere near as popular, greatly limiting the use cases of such a setup.
A great example is what I did, which would be much more difficult without the ability to run a replica of OpenAI's API.
I will have to admit, I don't know much about LLM internals (and certainly do not understand the math behind transformers) and probably couldn't say much about your second point.
I really wish Home Assistant allowed streaming the response to Piper instead of having to have the whole response ready at once. I think this would make LLM integration much more performant, especially on consumer-grade hardware like mine. Right now, after I finish talking to Whisper, it takes about 8 seconds before I start hearing GLaDOS, and the majority of the time is spent waiting for the language model to respond.
I tried to implement it myself and simply create a pull request, but I realized I am not very familiar with the Home Assistant codebase and didn't know where to start such an implementation. I'll probably take a better look when I have more time on my hands.
So how much of the 8s is spent in the LLM vs Piper?
Some of the example responses are very long for the typical home automation use case, which would compound the problem. Ample room for GLaDOS to be sassy, but at 8s it's just too tardy to be usable.
A different approach might be to use the LLM to produce a set of GLaDOS-like responses upfront and pick from them instead of always letting the LLM respond with something new. On top of that, add a cache that stores .wav files after Piper has synthesized them the first time.
A cache is how e.g. Mycroft AI does it. Not sure how easy it will be to add on your setup though.
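The cache half is simple enough to sketch; keying on the exact response text means repeated lines only hit Piper once (the synthesizer is stubbed out here):

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("tts_cache")
CACHE_DIR.mkdir(exist_ok=True)

def speak(text: str, synthesize) -> Path:
    # Key the cache on the exact text so repeated responses reuse the same file.
    key = hashlib.sha256(text.encode()).hexdigest()[:16]
    wav = CACHE_DIR / f"{key}.wav"
    if not wav.exists():
        wav.write_bytes(synthesize(text))   # only call the TTS engine on a cache miss
    return wav

# Stand-in synthesizer for the example; a real one would return WAV bytes from Piper.
print(speak("The door is locked.", synthesize=lambda t: b"fake wav bytes"))
```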
It is almost entirely the LLM. I can see this in action by typing a response on my computer instead of using my phone/watch, which bypasses Whisper and Piper entirely.
Your approach would work, but I really like the creativity of having the LLM generate the whole thing; it feels much less robotic. 8 seconds is bad, but not quite unusable.
Streaming responses is definitely something that we should look into. The challenge is that we cannot just stream single words, but would need to find a way to cut up sentences. Probably starting with paragraphs is a good first step.
Alternatively, could we not simply split by common characters such as newlines and periods to split it into sentences? It would be fragile, with special handling required for numbers with decimal points and probably various other edge cases, though.
There are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on Stack Overflow[1] that simply split text into sentences.
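A naive version of that split is only a few lines; something like nltk's sent_tokenize would handle abbreviations and decimals better, but the streaming shape is the same (fake token stream below):

```python
import re

# Flush a chunk to TTS as soon as a sentence ends, instead of waiting for the
# whole LLM response. The regex is naive; a real tokenizer handles "3.5", "e.g.", etc.
BOUNDARY = re.compile(r'(?<=[.!?])\s+')

def stream_to_tts(token_stream, speak):
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = BOUNDARY.split(buffer)
        for sentence in parts[:-1]:   # everything but the last part is complete
            speak(sentence)
        buffer = parts[-1]
    if buffer.strip():
        speak(buffer)

# Fake LLM output arriving a few characters at a time:
stream_to_tts(["Turning on ", "the lights. ", "Anything ", "else? "], speak=print)
```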
I don't suppose you guys have something in the works for a polished voice I/O device to replace Alexa and Google Home? They work fine, but need internet connections to function. If the desire is to move to fully offline capabilities, then we need the interface hardware to support that. You've already proven you can move in the hardware market (I'm using one of your Yellow devices now). I know I'd gladly pay for a fully offline interface for every room of my house.
That's something we've been building towards all of last year. The last iteration can be seen at [1]. Still some checkboxes to check before we're ready to ship it on ready-made hardware.
It looks like the "ESP32-S3-BOX-3" is the latest hardware iteration? I looked last year online for the older S3 hardware and everywhere was out of stock. Do you have a recommendation for where to purchase or perhaps alternatively some timeline for a new version with increased planned production?
Regarding accessible local LLMs, have you heard of the llamafile project? It allows for packaging an LLM as one executable that works on Mac, Windows, and Linux.
Currently pushing for an application note https://github.com/Mozilla-Ocho/llamafile/pull/178 to encourage integration. Would be good to hear your thoughts on making it easier for Home Assistant to integrate with llamafiles.
Also as an idea, maybe you could certify recommendations for LLM models for home assistant. Maybe for those specifically trained to operate home assistant you could call it "House Trained"? :)
As a user of Home Assistant, I would want to easily be able to try out different AI models with a single click from the user interface.
Home Assistant allows users to install add-ons which are Docker containers + metadata. This is how today users install Whisper or Piper for STT and TTS. Both these engines have a wrapper that speaks Wyoming, our voice assistant standard to integrate such engines, among other things. (https://github.com/rhasspy/rhasspy3/blob/master/docs/wyoming...)
If we rely on just the ChatGPT API to allow interacting with a model, we wouldn't know what capabilities the model has and so can't know what features to use to get valid JSON actions out. Can we pass our function definitions or should we extend the prompt with instructions on how to generate JSON?
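For context, this is roughly what passing a function definition looks like against the OpenAI-style chat API today (sketch; whether a local server behind the same base URL honors "tools" at all is exactly the discoverability problem):

```python
from openai import OpenAI

# Point the OpenAI client at a local server that mimics the API (assumed URL).
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

tools = [{
    "type": "function",
    "function": {
        "name": "turn_on",
        "description": "Turn on a light or switch",
        "parameters": {   # JSON Schema constrains what the model may emit
            "type": "object",
            "properties": {
                "entity_id": {"type": "string", "description": "e.g. light.kitchen"},
                "brightness_pct": {"type": "integer", "minimum": 1, "maximum": 100},
            },
            "required": ["entity_id"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Dim the kitchen lights to 20%"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```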
I cannot pass this opportunity to thank you very, very much for HA. It is a wonderful product that evolved from "cross your nerd fingers and hope for the best" to "my family uses it".
The community around the forum is very good too (with some actors being fantastic) and the documentation is not too bad either :) (I contributed to some changes and am planning to write a "so you want to start with HA" kind of page to summarize what new users will be faced with).
Again THANK YOU - this literally changes some people's lives.
I can't help but think of someone downloading "Best Assistant Ever LLM" which pretends to be good but unlocks the doors for thieves or whatever.
Is that a dumb fear? With an app I need to trust the app maker. With an app that takes random LLMs I also need to trust the LLM maker.
For text gen, or image gen I don't care but for home automation, suddenly it matters if the LLM unlocks my doors, turns on/off my cameras, turns on/off my heat/aircon, sprinklers, lights, etc...
That could be solved by using something like Anthropic's Constitutional AI[1]. This works by adding a 2nd LLM that makes sure the first LLM acts according to a set of rules (the constitution). This could include a rule to block unlocking the door unless a valid code has been presented.
You should not offload actions to the LLM; have it parse the code, pass it to the local door API, and read the API result. LLMs are great interfaces, let's use them as such.
Note that if going the constrained grammar route, at least ChatGPT (haven't tested on smaller models) understands BNF variants very well, and you can very much give it a compact BNF-like grammar and ask it to "translate X into grammar Y" and it works quite well even zero-shot. It will not be perfect on its own, but perhaps worth testing whether it's worth actually giving it the grammar you will be constraining its response to.
Depending on how much code/json a given model has been trained on, it may or may not also be worth testing if json is the easiest output format to get decent results for or whether something that reads more like a sentence but is still constrained enough to easily parse into JSON works better.
I just took a break from messing with my HA install to read ... and lo and behold!!!
First thanks for a great product, I'll be setting up a dev env in the coming weeks to fix some of the bugs (cause they are impacting me) so see you soon on that front.
Have you guys thought about the hardware barriers? Most of my open source LLM work has been on high-end desktops with lots of GPU, GPU RAM, and system RAM. Is there any thought to a Jetson as an all-in-one upgrade from the Pi?
How does OpenAI handle the function generation? Is it unique to their model? Or does their model call a model fine-tuned for functions? Has there been any research by the Home Assistant team into GorillaLLM? It appears it’s fine-tuned to API calling and it is based on LLaMa. Maybe a Mixtral tune on their dataset could provide this? Or even just their model as it is.
I find the whole area fascinating. I’ve spent an unhealthy amount of time improving “Siri” by using some of the work from the COPILOT iOS Shortcut and giving it “functions” which are really just more iOS Shortcuts to do things on the phone like interact with my calendar. I’m using GPT-4 but it would be amazing to break free of OpenAI since they’re not so open and all.
>Constrained grammars do look like a possible alternative.
I'd suggest combining this with something like NexusRaven, i.e. both constrain it and have an underlying model fine-tuned to output the required format. That'll improve results and let you use a much smaller model.
Another option is to use two LLMs: one to suss out the user's natural-language intent, and one to paraphrase that intent into something API-friendly. The first model would be more suited to a big generic one, while the second would be constrained and fine-tuned for HA.
Also have a look at project functionary on github - haven't tested it but looks similar.
I only found out about https://www.rabbit.tech/research today and, to be honest, I still don't fully understand its scope. But reading your lines, I think rabbit's approach could be how a local AI based home automation system could work.
I've gone into a frenzy of home automation this week-end, right after seeing the demo video of this "LAM" from Rabbit, thinking about the potential for software there.
Connected a few home cameras and two lights to an LLM, and made a few purchases.
The most expensive offender being a tiny camera-controlled RC crawler[1]. The idea would be for it to "patrol" my home in my name, with a sassy LLM.
I would like to see this integrated into Gnome and other desktop environments so I can have an assistant there. This would be a very complex integration, so as you develop ways to integrate more stuff keep this kind of thing in mind.
Everything we make is accessible via APIs and integrating our Assist via APIs is already possible. Here is an example of an app someone made that runs on Windows, Mac and Linux: https://github.com/timmo001/home-assistant-assist-desktop
Give the LLM a TypeScript API and ask it to generate a script to run in response to the query. Then execute it in a sandboxed JS VM. This works very well with ChatGPT; I haven't tried it with less capable LLMs.
HA is extremely modular and add-ons like these tend to be API based.
For example, the whisper speech to text integration calls an API for whisper, which doesn't have to be on the same server as HA. I run HA on a Pi 4 and have whisper running in docker on my NUC-based Plex server. This does require manual configuration but isn't that hard once you understand it.
I've been using HA for years now, and I don't think there's a single feature that's not toggleable. I expect this one to be too, and also hope that LLM offloading to their cloud is part of their paid plan.
I hope you meant Zigbee dongle instead of Z-Wave. Hue devices use Zigbee and can be directly paired to Home Assistant using a Zigbee dongle, but not using a Z-Wave dongle.
We’re working on making a voice assistant with Home Assistant[1]. We have all parts done except for wake word, which hopefully we'll be able to ship soon.
We’re super stoked to launch the Home Assistant Green[1]. It will be $99! The first 1000 are available today at Seeed, and 14000 more will be available end of October. More to come afterwards.
With Home Assistant Green we hope to make it very easy for people to join the Home Assistant community and experience a smart home focused on privacy and local control.
As part of our recent chapter 2 milestone, we introduced new Assist Pipelines. This allows users to configure multiple voice assistants. Your project is using the old "conversation" API. Instead it should use our new assist pipelines API. Docs: https://developers.home-assistant.io/docs/voice/pipelines/
You can even off-load the STT and TTS fully to Home Assistant and only focus on wake words.
You will see a much higher adoption rate if users can just buy the ESP BOX and install the software on it without installing/compiling stuff. That's exactly why we created ESP Web Tools. It allows projects to offer browser-based installation directly from their website. https://esphome.github.io/esp-web-tools/
If you're going the ESP Web Tools route (and you should!), we've also created Improv Wi-Fi, a small protocol to configure Wi-Fi on the ESP device. This will allow ESP Web Tools to offer an onboarding wizard in the browser once the software has been installed. More info at https://www.improv-wifi.com/
HA would be a lot more convincing if basic layout itself alongside config wasn't YAML hell. Every time I want to create some new layout or add something new to my home screen, I dread it.
I hate using it. Yet, I have no viable OSS alternatives.
Can you share more details about what's breaking? Is it a specific integration? Is it in general? What breaks? This is not consistent with most users' experience, but it's hard to know without more specifics.
First of all, everyone involved in this project has been big fans and users of HA for many years (in my case at least a decade). THANK YOU! For now Willow wouldn't do anything other than light up a display and sit there without Home Assistant.
We will support the pipelines API and make it a configuration option (eventually default). HA has very rapid release cycles and as you note this is very new. At least for the time being we like the option of people being able to point Willow at older installs and have it "do something" today without requiring an HA upgrade that may or may not include breaking changes - hence the conversation API.
One of our devs is a contributor for esphome and we're heading somewhere in that direction, and he's a big fan of improv :).
We have plans for a Willow HA component and we'd love to run some ideas past the team. Conceptually, in my mind, we'll get to:
- Flashing and initial configuration from HA like esphome (possibly using esphome, but the Espressif ADF/SR/LCD/etc frameworks appear to be quite a ways out for esphome).
- Configuration for all Willow parameters from wifi to local speech commands in the HA dashboard, with dynamic and automatic updates for everything including local speech commands.
- OTA update support.
- TTS and STT components for our inference server implementation. These will (essentially) be very thin proxies for Willow but also enable use of TTS and STT functionality throughout HA.
- Various latency improvements. As the somewhat hasty and lame demo video illustrates[0] we're already "faster" than Alexa while maintaining Alexa competitive wake word, voice activity detection, noise suppression, far-field speech quality, accuracy, etc. With local command recognition on the Willow device and my HA install using Wemo switches (completely local) it's almost "you can't really believe it" fast and accurate.
I should be absolutely clear on something for all - our goal is to be the best hardware voice interface in the world (open source or otherwise) that happens to work very well with Home Assistant. Our goal is not to be a Home Assistant Voice Assistant. I hope that distinction makes at least a little sense.
You and the team are doing incredible work on that goal and while there is certainly some overlap we intend to maintain broad usability and compatibility with just about any platform (home automation, open source, closed source, commercial, whatever) someone may want to use Willow with.
In fact, our "monetization strategy" (to the extent we have one) is based on the various commercial opportunities I've been approached with over the years. Turns out no one wants to see an Amazon Echo in a doctor's office but healthcare is excited about voice (as one example) :).
Essentially, Home Assistant support in Willow will be one of the many integration modules we support, with Willow using as many bog-standard, common-denominator protocols and transports as possible that don't compromise our goals, while maintaining broad compatibility with just about any integration someone wants to use with Willow.
This is the very early initial release of Willow. We're happy for "end-users" to use it but we don't see the one-time configuration and build step being a huge blocker for our current target user - more technical early adopters who can stand a little pain ;).