Great work! Thanks for taking the time to build this and open sourcing it. For me, it scratches a very real itch. I’ve toyed with the idea of a personal AI assistant to act as a second brain, and help me prioritize and remember things with human intuition (“you can’t afford to delay X, do X today and maybe reach out to Z to set the expectations for delaying Y?”).
To me, the greatest strength of LLMs is not their knowledge (which is prone to hallucination), but their ability to analyze ambiguous requests with ease and develop a sane action plan - much like a competent human.
On a side note: wouldn’t it be significantly cheaper and just as effective to use GPT-3.5 by default, and reserve GPT-4 for special tasks with explicit instruction (“Use GPT-4 to…”)?
For most chats, GPT 4 would be incredibly wasteful (read: expensive).
Also - it would be very cool to experiment with using GPT-3.5 and GPT-4 in the same conversation! GPT-3.5 could leverage the analysis of GPT-4 and act as the primary communication “chatbot” interface for addressing incremental requests.
GPT4 is so much better than 3.5 at virtually everything that I don't think it's worth trying to figure out which ones 3.5 is almost adequate enough for.
Also, the pricing is per token, so even with 4 it is close to negligible unless you are loading in a lot of context or your conversation gets very long.
Seriously their release cycle has been so rapid that I am building stuff with the idea that AI will be better and cheaper by the time I am ready to release it.
Right now it is a bit of a blocker because you can easily get a single prompt to cost you .005c - .01c, which would crush you if you ever had any kind of scale.
I don't necessarily mind the cap, but I am a little annoyed that it was 50 messages every 4 hours when I first began my subscription and they lowered it without so much as an email telling me.
I guess you have experienced something different, as I got an email saying this:
> GPT4 will have a dynamically adjusted usage cap. We expect to be severely capacity constrained, so the usage cap will depend on demand and system performance. API access will still be through the waitlist.
And then the actual 'sign-up' or marketing page before subscription didn't even say anything about getting GPT-4 (just about getting priority access to the standard chatgpt product).
Then, at the bottom of the page above the input box when you open GPT-4, as far as I remember it has always said something like "Current limit is this; capacity limits for GPT-4 will be changed as we adjust for demand."
GPT4 is definitively the future, but GPT3.5 is the present :-).
In addition to being more expensive, GPT4 is a lot slower. For most casual things I use gpt3 and upgrade to GPT4 as needed. I've actually had a couple of days where I spent > $1 on GPT4. It's hard to do with every day chat, but easy to do when you get it to look/improve large amounts of code.
This is all from the API/CLI not the web interface.
Maybe it is just my niche use cases, but after spending a good few hours with both, 3.5 has actually produced more coherent outputs for me, which is a little confusing. Maybe I need to rethink my prompting or something.
That's just the case for this bot, but is obviously not the case for ChatGPT.
I have a long running conversation with ChatGPT that I use to keep track of a verbal to-do list. I tell it my items with categories (e.g. work, personal, etc.) and estimated times, and then it outputs my complete task list, grouped by category. I then just tell it when I add tasks or complete tasks, and it continually keeps track of and outputs my current outstanding task list.
I've been using this for weeks now, and since it's all in a single conversation ChatGPT can keep track of the entire state over time.
I don't have access to plugins yet but it would be trivial to implement a personal AI assistant with ChatGPT if it could, for example, look up flight times and prices.
And what happens when your window runs out? It prunes the oldest messages, which you will then need to keep track of yourself. It could likely be fine, but a task from a long time ago could get wiped out.
Every time I add or complete a task, ChatGPT outputs the latest complete copy of my task list (I only had to ask it to do this the first time, then it just did that automatically). So, in other words, it always updates the latest "live state" in its most recent message. If I hit the limit on context window it would just take me ~30 seconds to open up a new window with my original prompt and the latest iteration of my task list if I so desired.
This looks way more like "improved chat interface to a Reminders app compared to Siri" or such than "prioritize and remember things with human intuition".
You could store whatever personal data you want in embeddings and use the api to refer to those. Can't do it on vanilla ChatGPT with the website though, I don't think.
Their API docs for embeddings also don't talk about using them to get around the context size limit; instead, the way I've used them and seen others use them is "create embeddings from documents to enable fast search for relevant documents to populate in context", which still requires a separate data store.
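As a rough sketch of that pattern: embed each document once (with the embeddings API, stored wherever you like), then at query time rank by cosine similarity and paste the top hits into the prompt. The toy 3-dimensional vectors and helper names below are mine, standing in for real embeddings; this is not any library's actual API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, store, k=2):
    # store: list of (text, vector) pairs, built offline by
    # embedding each document; returns the k most similar texts.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy vectors standing in for real embedding vectors.
store = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("privacy notice", [0.0, 0.1, 0.9]),
]
# The retrieved texts would then be pasted into the prompt as context.
context_docs = top_k([0.8, 0.2, 0.1], store, k=1)
```

The separate data store the parent mentions is exactly `store` here: the model never holds your documents, you just search them yourself and feed the winners in.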
ChatGPT can keep some state for you, but there is a limit to the amount of tokens you can keep going in an instance.
It’s enough to keep a todo list going, it’s not enough to make it your friend / coworker
If you built what you were describing right now, either the flight questions would push out your todo list, or you would need to build something to keep state yourself.
Oh dang, that sounds awesome. So a ChatGPT conversation doesn't have a historical limit? I guess I assumed it would start having to forget things at a certain point.
As far as I’m aware it will definitely forget things as the history grows. You can “remind” it about things to keep them relevant, and I think a cool product built like OP’s project would take that into consideration for long running tasks.
Yes, there’s a limit. Right now it’s 4,000 tokens (GPT-4 has a 32k model, but I don’t think it’s available yet, at least with ChatGPT). Once you near the limit, ChatGPT will start dropping previous messages to stay under it. I don’t know what their algorithm is for deciding what to drop. It could be as simple as dropping the oldest stuff, or maybe replacing a long message with a summary. But at a certain point, the conversation becomes “lossy”.
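A minimal sketch of the simplest strategy (drop-the-oldest), assuming a crude word count in place of a real tokenizer; an actual bot would count with something like tiktoken:

```python
def prune(messages, budget, count_tokens=lambda m: len(m.split())):
    # Keep the system prompt, then drop the oldest chat messages
    # until the estimated token total fits within the budget.
    # count_tokens here is just a word count, a stand-in for a
    # proper tokenizer.
    system, rest = messages[0], list(messages[1:])
    while rest and count_tokens(system) + sum(count_tokens(m) for m in rest) > budget:
        rest.pop(0)  # evict the oldest non-system message
    return [system] + rest

history = ["You are a helpful assistant.",
           "old question about flights and hotels",
           "recent question about my todo list"]
trimmed = prune(history, budget=12)
```

This is exactly where the "lossy" behaviour comes from: the old flights question silently disappears while the recent message survives.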
FWIW that's exactly what it does (updates the current state of the whole todo list each time). After two weeks of use I've never had it make a mistake in keeping track of my total list of tasks. By default (I mean I didn't ask it to do this) it outputs the total estimated time for each category of tasks, and that summation has been wrong (which wasn't that surprising to me, as "LLMs aren't great at math" is a known issue), but even then I just tell it "Can you double check the category totals?" and it fixes them.
They might make it summarize intermittently behind the scenes when the context grows too large. That would explain how it keeps the gist of things mentioned much earlier even though it doesn't know the specifics.
Good point. I'm doing this with ChatGPT: I use a long conversation with 3.5 to help me write prompts for 4, and it's fun. When I hit the rate cap on 4, I'll go back to 3.5 with what 4 came up with, then converse on the topic until my 4 cap lifts. Combine that with being able to ask Bing for things like links and current information, and DALL·E for image-based visualisations, and it makes for an intriguing combination. Bard gets a look in too, but so far it seems a little shy compared to 4 or Bing.
What behaviour would users prefer when uploading a voice message, a) the voice message is transcribed, so speech to text? Or b) the voice message is treated as a query, so you receive a text answer to your voice query?
I've done a) for now as mobile devices already let you type with your voice.
I'd quite like a twilio script I could host that enables voice-to-voice with ChatGPT over a phone call, but for messaging apps (I'm going to try yours, though I would prefer Signal) I'd
personally prefer to stick with typing and use Apple's transcription (the default microphone on iOS keyboard) for any voice stuff - still wanting text back.
This is (in addition to the fact that Apple's works pretty well for me) mostly because that way I get to see the words appear as I'm speaking, and can fix any problems in real-time rather than waiting until I've finished leaving a voice note to find out it messed up. Bing AI chat, for example, trying to use their microphone button just leads to frustration as it regularly fails to understand me. But maybe Whisper is so good that I'd hardly ever need to care about errors?
I do suspect I'm an outlier in terms of how I use dictation, checking as I go - at least based on family members, they seem to either speak a sentence then look at it, or speak and then send without looking - so for them, off-device transcription would probably be welcome as long as it even slightly improves accuracy rates.
I see my server has restarted a few times! I imagine it's folks here, since I haven't shared Chat Bling anywhere else yet. Sorry to anyone who started generating images but hasn't received a response. The 'jobs' for image generations are stored entirely in memory, so a server restart loses all of that.
Going forward, I'll explore storing image jobs in redis or something, which will be more resilient to server crashes.
As for conversation history, I'll continue to keep that in memory for now (messages are evicted after a short time period, or if messages consume too many OpenAI tokens) - even that's lost during a server restart/crash. Feels like quite a big decision to store sensitive chat history in a persistent database, from a privacy standpoint.
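For what it's worth, a sketch of what a restart-resilient job store might look like. Here a plain dict stands in for Redis so the example is self-contained; a real version would swap the dict accesses for redis-py calls (e.g. `r.set(f"job:{job_id}", json.dumps(job), ex=3600)`) so jobs survive a process restart:

```python
import json
import uuid

class JobStore:
    # Minimal job store sketch. The dict below stands in for the
    # Redis keyspace; swapping it for a redis-py client is what
    # makes jobs survive restarts.
    def __init__(self):
        self._kv = {}

    def create(self, prompt):
        # Register a pending image-generation job and return its id.
        job_id = str(uuid.uuid4())
        self._kv[f"job:{job_id}"] = json.dumps(
            {"prompt": prompt, "status": "pending"})
        return job_id

    def complete(self, job_id, url):
        # Mark a job done and attach the generated image URL.
        job = json.loads(self._kv[f"job:{job_id}"])
        job.update(status="done", url=url)
        self._kv[f"job:{job_id}"] = json.dumps(job)

    def get(self, job_id):
        return json.loads(self._kv[f"job:{job_id}"])
```

Serializing to JSON per key keeps the schema flexible, and a Redis TTL (the `ex=` argument) gives expiry of stale jobs for free.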
You could have a default "will be wiped after <x time>" policy / notification up front, plus an option to change it in either direction: one way to "only store this in RAM, not the DB, and wipe it as soon as I close this window, or maybe after an hour of inactivity", the other way to "please never delete (we reserve the right to delete anyway, but will keep for at least Y days/months/whatever)". Also a "delete now" button to override, and then a cron job checking what's due to be deleted and wiping it from the DB/memory?
Of course, it maybe also adds more pressure to keep the server more secure without private conversations being accessible after a reboot...
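A sketch of what that cron-driven sweep could look like (the record shape and names are hypothetical; `expires_at=None` models the "never delete automatically" choice):

```python
import time

def sweep(records, now=None):
    # records: {conv_id: {"expires_at": unix_ts or None, ...}}
    # expires_at=None means "keep until the user deletes manually".
    now = time.time() if now is None else now
    expired = [cid for cid, r in records.items()
               if r["expires_at"] is not None and r["expires_at"] <= now]
    for cid in expired:
        del records[cid]  # a "delete now" button would hit this path directly
    return expired
```

Run it from cron every few minutes and the per-conversation retention choice is just a matter of setting (or clearing) `expires_at`.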
Agreed, giving the user a choice would be best here. Something tells me most users would not change it from whatever the default is, but yeah still good to expose this as a setting which should be doable. Thanks for the input!
Np - and you're probably right that I'm in the minority of people who'd care about having as much granular control as possible... maybe most people would rather have something closer to a browser's privacy mode: just a toggle between very private and not private at all?
This is very cool. I tested it with a quick reminder request and it seemed to work. I'm a bit terrified by the privacy issue though. Combining OpenAI with WhatsApp seems like a marriage made in hell.
I guess the only solution will be to move to local bots and models on the phone which will interface out only when needed.
It is expensive, so I made a version for myself as a Discord bot https://github.com/alexQueue/GPTBotHost/ (note: code is sketchy. Not really cleaned up for the public)
Nice work.
I have a question though. The example chat window you show has an interaction where the AI explains that it cannot remember the previous question. Isn’t LangChain there for exactly that purpose, or am I missing something?
It seems that you can self host the thing. Apart from that, it seems that you would be sharing info obviously with OpenAI (both GPT and Whisper), Telegram and Google.
Being able to use voice messages as an interface makes a huge difference. I can just ramble on, sharing my thoughts, and then have GPT turn it into something sensible.
Great for brainstorming, getting your thoughts out on "paper", etc.
I’ve been heavily using chatgpt (gpt 4) on my honeymoon/baby moon/vacation in Spain. Everything from itineraries to asking art history questions in museums. I’ve mainly been using the voice input on my iPhone for chatgpt on a mobile browser and I can’t help but think how useful better voice support will be.
I've got an iPhone app in testflight beta that has speech to text and text to speech. Basically a nicer iPhone app for GPT-4, I tried most of the existing ones and none were particularly nice UX.
Pricing model for now is you just pay exactly what we pay (we just pass on the API costs plus Apple's 30%, no markup). We could add a use your own API key thing too to avoid Apple's 30%.
Not as cool, but there for the lazy: install the Bing app on your phone (I guess you need to be accepted into the beta first?). I use it as a slow-thinking alternative to Google Assistant that usually gives much better answers.
The Bing app isn't as responsive as ChatGPT. I asked it a question about my taxes and it "binged" something weird and gave me a generic non-answer.
I’ve noticed that bing chat is better if you instruct it not to search anything, that way it will use the model knowledge. I’ve learned to use the model knowledge or the web search results summary depending on what I want. But ChatGPT is still way better for model knowledge because it has fewer restrictions.
I wish they would make this distinction clearer in the UI. Most of the time it can answer without resorting to search, I think it would be better if the user explicitly specifies that they need web results.
It definitely has a web search bias, no surprise, but that's kind of its superpower too. It lacks the snappy responses that Assistant can give, especially with routine questions like the weather.
OP integrated LangChain and the ability to Google results (and a neat way to integrate more agents). That’s the main draw for me in their implementation.
I will look into it when I'm granted access to GPT-4.
But yeah, I plan to make the switch between GPT-3.5 and GPT-4 accessible right in Telegram.
Using a cloud-hosted AI with a Terms-of-Service as an assistant is a recipe for disaster in the future. I can't wait for the future where everyone is reliant on a corporate spy for everything they do.
Does anyone have a suggestion for doing something similar with SMS? I've been tinkering with it but it seems that there are some regulations that will require me to have a commercial organization registered to allow SMS to 10 digit North America numbers.
I added gpt to a bot of mine for Telegram and Discord a few weeks ago. I'm constantly amazed at how the littlest of things can spawn so many new opportunities for inside jokes and meta humor.
This is being downvoted but it's an important thing to consider. As we move to doing more with these systems we're going to start seeing restrictions on which AI tools we can use at work/school/home.
I was able to do something similar with Siri using the Shortcuts app. You can have Siri transcribe your speech, post it to an endpoint, and then read the response back to you.
It amazes me to no end that some people would feed private conversations and other sensitive data into an experimental chat bot. Don’t these people know that ChatGPT is not a mature technology, that it does not reliably isolate sessions, and that it may even permanently ingest user data for training purposes?
GPT and other LLMs are currently integrated into countless products and hobbyist projects. Expect an avalanche of lawsuits on the grounds of LLMs being structurally incompatible with notorious privacy laws like the GDPR. For instance, how would they implement the GDPR’s “right to be forgotten”? Untrain the model?
(I may have misunderstood your comment to some extent, but I'm going to send this reply anyway even if just to clarify for anyone else who might misunderstand.)
---
I agree with "be careful what you send to the chat bot", but let's clarify some things in case you or someone else reading your comment is misunderstanding.
LLMs aren't immature AI brains that "may even permanently ingest user data for training purposes". They're just models, which are represented by an architecture described in readable source code, and weights derived from training.
There is a very clear delineation between inference and training. Models are static when being used for inference. You don't need to "untrain" the model after you ask it something; you never trained it in the first place. Running inference does not change the trained weights.
If you're talking about OpenAI specifically saving ChatGPT data for later training purposes, they absolutely are doing that; they aren't hiding it. But that's a purposeful "let's take this data and use it for training", not "oh no, our immature tech accidentally ingested prompt data, how do we untrain it"?
Sure. But my point was that it is not an inherent feature in LLMs that they are frozen in time.
Fine-tuning the entire model is very expensive. But fine-tuning a tiny parallel piece using LoRA is cheap in both CPU cycles and storage.
OpenAI could already have implemented an auto-update feature without telling us.
In the future, I can see them selling a premium feature where you have your own LoRA-addon that gets constantly trained on your interactions with it, so you get your own personalized GPT-4.
"Note that this data policy does not apply to OpenAI's Non-API consumer services like ChatGPT or DALL·E Labs. You can learn more about these policies in our data usage for consumer services FAQ."
This particular project is API based, so the above doesn't apply, but I have seen several projects that scrape via ChatGPT, where your data is used:
"Does OpenAI train on my content to improve model performance?
For non-API consumer products like ChatGPT and DALL-E, we may use content such as prompts, responses, uploaded images, and generated images to improve our services."[1]
OpenAI says they aren’t training on ChatGPT conversations. Up to you whether you believe them or not. They also say they have a 30-day data retention policy, which is compatible with GDPR.
On one hand, yes, there is a real threat of these companies misusing personal data, especially if you use the public side of the API (i.e. not the one from within Azure, which has a separate set of privacy guarantees, as far as I can understand).
On the other hand, this is a guardrail like the many others that GPT already has; if I search my name, I get a 'not notable enough' answer already.