Great work! Thanks for taking the time to build this and open sourcing it. For me, it scratches a very real itch. I’ve toyed with the idea of a personal AI assistant to act as a second brain, and help me prioritize and remember things with human intuition (“you can’t afford to delay X, do X today and maybe reach out to Z to set the expectations for delaying Y?”).
To me, the greatest strength of LLMs is not their knowledge (which is prone to hallucination), but their ability to analyze ambiguous requests with ease and develop a sane action plan - much like a competent human.
On a side note: wouldn’t it be significantly cheaper and just as effective to use GPT-3.5 by default, and reserve GPT-4 for special tasks with explicit instruction (“Use GPT-4 to…”)?
For most chats, GPT 4 would be incredibly wasteful (read: expensive).
Also - it would be very cool to experiment with using GPT-3.5 and GPT-4 in the same conversation! GPT-3.5 could leverage the analysis of GPT-4 and act as the primary communication “chatbot” interface for addressing incremental requests.
GPT4 is so much better than 3.5 at virtually everything that I don't think it's worth trying to figure out which ones 3.5 is almost adequate enough for.
Also, the pricing is per token, so even with 4 it is close to negligible unless you are loading in a lot of context or your conversation gets very long.
Seriously their release cycle has been so rapid that I am building stuff with the idea that AI will be better and cheaper by the time I am ready to release it.
Right now it is a bit of a blocker because you can easily get a single prompt to cost you .005c - .01c, which would crush you if you ever had any kind of scale.
I don't necessarily mind the cap, but I am a little annoyed that it was 50 messages every 4 hours when I first began my subscription and they lowered it without so much as an email telling me.
I guess you have experienced something different, as I got an email saying this:
> GPT4 will have a dynamically adjusted usage cap. We expect to be severely capacity constrained, so the usage cap will depend on demand and system performance. API access will still be through the waitlist.
And then the actual 'sign-up' or marketing page before subscription didn't even say anything about getting GPT-4 (just about getting priority access to the standard chatgpt product).
Then, at the bottom of the page above the input box when you open GPT-4, as far as I remember it has always said something like "Current limit is this; capacity limits for GPT-4 will be changed as we adjust for demand."
GPT4 is definitively the future, but GPT3.5 is the present :-).
In addition to being more expensive, GPT4 is a lot slower. For most casual things I use gpt3 and upgrade to GPT4 as needed. I've actually had a couple of days where I spent > $1 on GPT4. It's hard to do with every day chat, but easy to do when you get it to look/improve large amounts of code.
This is all from the API/CLI not the web interface.
Maybe it is just my niche use cases, but after spending a good few hours with both, 3.5 has actually produced more coherent outputs for me, which is a little confusing. Maybe I need to rethink my prompting or something.
That's just the case for this bot, but is obviously not the case for ChatGPT.
I have a long running conversation with ChatGPT that I use to keep track of a verbal to-do list. I tell it my items with categories (e.g. work, personal, etc.) and estimated times, and then it outputs my complete task list, grouped by category. I then just tell it when I add tasks or complete tasks, and it continually keeps track of and outputs my current outstanding task list.
I've been using this for weeks now, and since it's all in a single conversation ChatGPT can keep track of the entire state over time.
I don't have access to plugins yet but it would be trivial to implement a personal AI assistant with ChatGPT if it could, for example, look up flight times and prices.
And what happens when your window runs out? It prunes the oldest messages, which you will then need to keep track of yourself. It could likely be fine, but a task from a long time ago could get wiped out.
Every time I add or complete a task, ChatGPT outputs the latest complete copy of my task list (I only had to ask it to do this the first time, then it just did that automatically). So, in other words, it always updates the latest "live state" in its most recent message. If I hit the limit on context window it would just take me ~30 seconds to open up a new window with my original prompt and the latest iteration of my task list if I so desired.
This looks way more like "improved chat interface to a Reminders app compared to Siri" or such than "prioritize and remember things with human intuition".
You could store whatever personal data you want in embeddings and use the api to refer to those. Can't do it on vanilla ChatGPT with the website though, I don't think.
Their API docs for embeddings also don't talk about using them to get around the context size limit; instead, the way I've used them and seen others use them is "create embeddings from documents to enable fast search for relevant documents to populate in context", which still requires a separate data store.
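As a rough sketch of that pattern: embed each document once (with the embeddings API, stored wherever you like), then at query time rank by cosine similarity and paste the top hits into the prompt. The toy 3-dimensional vectors and helper names below are mine, standing in for real embeddings; this is not any library's actual API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    # store: list of (text, vector) pairs, built offline by
    # embedding each document; returns the k most similar texts.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy vectors standing in for real embedding vectors.
store = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("privacy notice", [0.0, 0.1, 0.9]),
]
# The retrieved texts would then be pasted into the prompt as context.
context_docs = top_k([0.8, 0.2, 0.1], store, k=1)
```

The separate data store the parent mentions is exactly `store` here: the model never holds your documents, you just search them yourself and feed the winners in.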
ChatGPT can keep some state for you, but there is a limit to the amount of tokens you can keep going in an instance.
It’s enough to keep a todo list going, it’s not enough to make it your friend / coworker
If you built what you were describing right now, either the flight questions would push out your todo list, or you would need to build something to keep state yourself.
Oh dang, that sounds awesome. So a ChatGPT conversation doesn't have a historical limit? I guess I assumed it would start having to forget things at a certain point.
As far as I’m aware it will definitely forget things as the history grows. You can “remind” it about things to keep them relevant, and I think a cool product built like OP’s project would take that into consideration for long running tasks.
Yes, there’s a limit. Right now it’s 4,000 tokens (GPT-4 has a 32k model, but I don’t think it’s available yet, at least with ChatGPT). Once you near the limit, ChatGPT will start dropping previous messages to stay under it. I don’t know what their algorithm is for deciding what to drop. It could be as simple as dropping the oldest stuff, or maybe replacing a long message with a summary. But at a certain point, the conversation becomes “lossy”.
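A minimal sketch of the simplest strategy (drop-the-oldest), assuming a crude word count in place of a real tokenizer; an actual bot would count with something like tiktoken:

```python
def prune(messages, budget, count_tokens=lambda m: len(m.split())):
    # Keep the system prompt, then drop the oldest chat messages
    # until the estimated token total fits within the budget.
    # count_tokens here is just a word count, a stand-in for a
    # proper tokenizer.
    system, rest = messages[0], list(messages[1:])
    while rest and count_tokens(system) + sum(count_tokens(m) for m in rest) > budget:
        rest.pop(0)  # evict the oldest non-system message
    return [system] + rest

history = ["You are a helpful assistant.",
           "old question about flights and hotels",
           "recent question about my todo list"]
trimmed = prune(history, budget=12)
```

This is exactly where the "lossy" behaviour comes from: the old flights question silently disappears while the recent message survives.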
FWIW that's exactly what it does (updates the current state of the whole todo list each time). After two weeks of use I've never had it make a mistake in keeping track of my total list of tasks. By default (I mean I didn't ask it to do this) it outputs the total estimated time for each category of tasks, and that summation has been wrong (which wasn't that surprising to me, as "LLMs aren't great at math" is a known issue), but even then I just tell it "Can you double check the category totals?" and it fixes them.
They might make it summarize intermittently behind the scenes when the context grows too large. That would explain how it keeps the gist of things mentioned much earlier even though it doesn't know the specifics.
Good point. I'm doing this with ChatGPT: I use a long conversation with 3.5 to help me write prompts for 4, and it's fun. When I hit the rate cap on 4, I'll go back to 3.5 with what 4 came up with, then converse on the topic until my 4 cap lifts. Combine that with being able to ask Bing for things like links and current information, and DALL·E for image-based visualisations, and it makes for an intriguing combination. Bard gets a look in too, but so far it seems a little shy compared to 4 or Bing.
What behaviour would users prefer when uploading a voice message, a) the voice message is transcribed, so speech to text? Or b) the voice message is treated as a query, so you receive a text answer to your voice query?
I've done a) for now as mobile devices already let you type with your voice.
I'd quite like a twilio script I could host that enables voice-to-voice with ChatGPT over a phone call, but for messaging apps (I'm going to try yours, though I would prefer Signal) I'd
personally prefer to stick with typing and use Apple's transcription (the default microphone on iOS keyboard) for any voice stuff - still wanting text back.
This is (in addition to the fact that Apple's works pretty well for me) mostly because that way I get to see the words appear as I'm speaking, and can fix any problems in real-time rather than waiting until I've finished leaving a voice note to find out it messed up. Bing AI chat, for example, trying to use their microphone button just leads to frustration as it regularly fails to understand me. But maybe Whisper is so good that I'd hardly ever need to care about errors?
I do suspect I'm an outlier in terms of how I use dictation, checking as I go - at least based on family members, they seem to either speak a sentence then look at it, or speak and then send without looking - so for them, off-device transcription would probably be welcome as long as it even slightly improves accuracy rates.
I see my server has restarted a few times! I imagine it's folks here, since I haven't shared Chat Bling anywhere else yet. Sorry to anyone who started generating images but hasn't received a response. The 'jobs' for image generations are stored entirely in memory, so a server restart loses all of that.
Going forward, I'll explore storing image jobs in redis or something, which will be more resilient to server crashes.
As for conversation history, I'll continue to keep that in memory for now (messages are evicted after a short time period, or if messages consume too many OpenAI tokens) - even that's lost during a server restart/crash. Feels like quite a big decision to store sensitive chat history in a persistent database, from a privacy standpoint.
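For what it's worth, a sketch of what a restart-resilient job store might look like. Here a plain dict stands in for Redis so the example is self-contained; a real version would swap the dict accesses for redis-py calls (e.g. `r.set(f"job:{job_id}", json.dumps(job), ex=3600)`) so jobs survive a process restart:

```python
import json
import uuid

class JobStore:
    # Minimal job store sketch. The dict below stands in for the
    # Redis keyspace; swapping it for a redis-py client is what
    # makes jobs survive restarts.
    def __init__(self):
        self._kv = {}

    def create(self, prompt):
        # Register a pending image-generation job and return its id.
        job_id = str(uuid.uuid4())
        self._kv[f"job:{job_id}"] = json.dumps(
            {"prompt": prompt, "status": "pending"})
        return job_id

    def complete(self, job_id, url):
        # Mark a job done and attach the generated image URL.
        job = json.loads(self._kv[f"job:{job_id}"])
        job.update(status="done", url=url)
        self._kv[f"job:{job_id}"] = json.dumps(job)

    def get(self, job_id):
        return json.loads(self._kv[f"job:{job_id}"])
```

Serializing to JSON per key keeps the schema flexible, and a Redis TTL (the `ex=` argument) gives expiry of stale jobs for free.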
You could have a default "will be wiped after <x time>" policy / notification up front, plus an option to change it in either direction: one way to "only store this in RAM, not the DB, and wipe it as soon as I close this window, or maybe after an hour of inactivity", the other way to "please never delete (we reserve the right to delete anyway, but will keep for at least Y days/months/whatever)". Also a "delete now" button to override, and then a cron job checking what's due to be deleted and wiping it from the DB/memory?
Of course, it maybe also adds more pressure to keep the server more secure without private conversations being accessible after a reboot...
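A sketch of what that cron-driven sweep could look like (the record shape and names are hypothetical; `expires_at=None` models the "never delete automatically" choice):

```python
import time

def sweep(records, now=None):
    # records: {conv_id: {"expires_at": unix_ts or None, ...}}
    # expires_at=None means "keep until the user deletes manually".
    now = time.time() if now is None else now
    expired = [cid for cid, r in records.items()
               if r["expires_at"] is not None and r["expires_at"] <= now]
    for cid in expired:
        del records[cid]  # a "delete now" button would hit this path directly
    return expired
```

Run it from cron every few minutes and the per-conversation retention choice is just a matter of setting (or clearing) `expires_at`.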
Agreed, giving the user a choice would be best here. Something tells me most users would not change it from whatever the default is, but yeah still good to expose this as a setting which should be doable. Thanks for the input!
Np - and you're probably right that I'm in the minority of people who'd care about having as much granular control as possible... maybe most people would rather have something closer to a browser's privacy mode: just a toggle between very private and not private at all?
This is very cool. I tested it with a quick reminder request and it seemed to work. I'm a bit terrified by the privacy issue though. Combining OpenAI with WhatsApp seems like a marriage made in hell.
I guess the only solution will be to move to local bots and models on the phone which will interface out only when needed.
It is expensive, so I made a version for myself as a Discord bot https://github.com/alexQueue/GPTBotHost/ (note: code is sketchy. Not really cleaned up for the public)
Nice work.
I have a question though. The example chat window you show has an interaction where the AI explains that it cannot remember the previous question. Isn’t LangChain there for exactly that purpose, or am I missing something?
It seems that you can self host the thing. Apart from that, it seems that you would be sharing info obviously with OpenAI (both GPT and Whisper), Telegram and Google.
Being able to use voice messages as an interface makes a huge difference. I can just ramble on, sharing my thoughts, and then have GPT turn it into something sensible.
Great for brainstorming, getting your thoughts out on "paper", etc.
I’ve been heavily using chatgpt (gpt 4) on my honeymoon/baby moon/vacation in Spain. Everything from itineraries to asking art history questions in museums. I’ve mainly been using the voice input on my iPhone for chatgpt on a mobile browser and I can’t help but think how useful better voice support will be.
I've got an iPhone app in testflight beta that has speech to text and text to speech. Basically a nicer iPhone app for GPT-4, I tried most of the existing ones and none were particularly nice UX.
Pricing model for now is you just pay exactly what we pay (we just pass on the API costs plus Apple's 30%, no markup). We could add a use your own API key thing too to avoid Apple's 30%.
Not as cool, but there for the lazy: install the Bing app on your phone (I guess you need to be accepted into the beta first?). I use it as a slow-thinking alternative to Google Assistant that usually gives much better answers.
The Bing app isn't as responsive as ChatGPT. I asked it a question about my taxes and it "binged" something weird and gave me a generic non-answer.
I’ve noticed that bing chat is better if you instruct it not to search anything, that way it will use the model knowledge. I’ve learned to use the model knowledge or the web search results summary depending on what I want. But ChatGPT is still way better for model knowledge because it has fewer restrictions.
I wish they would make this distinction clearer in the UI. Most of the time it can answer without resorting to search, I think it would be better if the user explicitly specifies that they need web results.
It definitely has a web search bias, no surprise, but that's kind of its superpower too. It lacks the snappy responses that Assistant can give, especially with routine questions like the weather.
OP integrated LangChain and the ability to Google results (and a neat way to integrate more agents). That’s the main draw for me in their implementation.
I will look into it when I'm granted access to GPT-4.
But yeah, I plan to make the switch between GPT-3.5 and GPT-4 accessible right in Telegram.
Using a cloud-hosted AI with a Terms-of-Service as an assistant is a recipe for disaster in the future. I can't wait for the future where everyone is reliant on a corporate spy for everything they do.
Does anyone have a suggestion for doing something similar with SMS? I've been tinkering with it but it seems that there are some regulations that will require me to have a commercial organization registered to allow SMS to 10 digit North America numbers.
I added gpt to a bot of mine for Telegram and Discord a few weeks ago. I'm constantly amazed at how the littlest of things can spawn so many new opportunities for inside jokes and meta humor.
This is being downvoted but it's an important thing to consider. As we move to doing more with these systems we're going to start seeing restrictions on which AI tools we can use at work/school/home.
I was able to do something similar with Siri using the Shortcuts app. You can have Siri transcribe your speech, post it to an endpoint, and then read the response back to you.
It amazes me to no end that some people would feed private conversations and other sensitive data into an experimental chat bot. Don’t these people know that ChatGPT is not a mature technology, that it does not reliably isolate sessions, and that it may even permanently ingest user data for training purposes?
GPT and other LLMs are currently integrated into countless products and hobbyist projects. Expect an avalanche of lawsuits on the grounds of LLMs being structurally incompatible with notorious privacy laws like the GDPR. For instance, how would they implement the GDPR’s “right to be forgotten”? Untrain the model?
(I may have misunderstood your comment to some extent, but I'm going to send this reply anyway even if just to clarify for anyone else who might misunderstand.)
---
I agree with "be careful what you send to the chat bot", but let's clarify some things in case you or someone else reading your comment is misunderstanding.
LLMs aren't immature AI brains that "may even permanently ingest user data for training purposes". They're just models, which are represented by an architecture described in readable source code, and weights derived from training.
There is a very clear delineation between inference and training. Models are static when being used for inference. You don't need to "untrain" the model after you ask it something; you never trained it in the first place. Running inference does not change the trained weights.
If you're talking about OpenAI specifically saving ChatGPT data for later training purposes, they absolutely are doing that; they aren't hiding it. But that's a purposeful "let's take this data and use it for training", not "oh no, our immature tech accidentally ingested prompt data, how do we untrain it"?
Sure. But my point was that it is not an inherent feature in LLMs that they are frozen in time.
Fine-tuning the entire model is very expensive. But fine-tuning a tiny parallel piece using LoRA is cheap in both CPU cycles and storage.
OpenAI could already have implemented an auto-update feature without telling us.
In the future, I can see them selling a premium feature where you have your own LoRA-addon that gets constantly trained on your interactions with it, so you get your own personalized GPT-4.
"Note that this data policy does not apply to OpenAI's Non-API consumer services like ChatGPT or DALL·E Labs. You can learn more about these policies in our data usage for consumer services FAQ."
This particular project is API based, so the above doesn't apply, but I have seen several projects that scrape via ChatGPT, where your data is used:
"Does OpenAI train on my content to improve model performance?
For non-API consumer products like ChatGPT and DALL-E, we may use content such as prompts, responses, uploaded images, and generated images to improve our services."[1]
OpenAI says they aren’t training on ChatGPT conversations. Up to you whether you believe them or not. They also say they have a 30-day data retention policy, which is compatible with GDPR.
On one hand, yes, there is a real threat of these companies misusing personal data, especially if you use the public side of the API (i.e. not the one from within Azure, which has a separate set of privacy guarantees, as far as I can understand).
On the other hand, this is a guardrail like the many others that GPT already has; if I search my name, I get a 'not notable enough' answer already.