Does anyone have an example of agents like AutoGPT doing something useful? Everything I've seen seems to be stuff that GPT could do anyway without the agent cruft. And iterating seems to multiply the opportunities for LLM bugs and mistakes.
The hype seems to be that agents are an emerging form of AGI, but the fine print is always "it's not quite ready for production yet, it makes a lot of mistakes, I had to fix 20 things in the output, but it's fun to watch and I'm sure once we work the bugs out..."
The intention is for them to do things that are more like projects. E.g. build a website or even a business (e.g. an ecommerce shop with marketing, etc). AutoGPT and BabyAGI are only a few weeks old, so they need time to build. So they are being criticized too early in my opinion.
They are also being hyped too early. It is 100% fair--hell: I'd even say, more correct--to criticize something for what it is, instead of trying to predict what it might one day become. If the stuff gets better, it can come back around then and soak up its deserved praise.
(Though--and this is an unrelated issue--in the case of some of this AGI work, if it actually works we are probably going to collectively wish it hadn't been built, as no good is going to come from rapidly empowering an AI to become in any way autonomous, even momentarily.)
I got a very recent version of Auto-GPT to clone a template repository, launch a nextjs dev server, and badly start iterating on a product last night. It took about 3 hours of prompt and setting tweaking and cost about $35.
- using gpt4only mode w/ 8k context for "fast" and "smart" llms
- allow local command execution
My (admittedly, magical and weird) ai.yaml looks like:
ai_name: [app]-founder
ai_role: You are working on an AI SaaS product called [app]. You complete all goals continuously to build [app]. You are a highly autonomous and efficient software engineering agent composed of ChatGPT 4 instances. You are running on macOS via AutoGPT. You rely heavily on memories from your past. You know [app] is an existing app in the current directory. You avoid 'search_files' because it will crash, so use ls instead. You avoid performing the same actions in a loop or repetitively. You manage a fleet of agents that can answer questions up to Sept. 2021.
ai_goals:
- [something to the effect of - git clone template repo from this url if it isn't already cloned]
- continuously build an app- add relevant features by questioning the state of the web app that is located at localhost:3000 && localhost:5556, and improving the code.
- understand and learn about what might make a good [product idea]
- research and build a business (with moat) around [app/biz name] (limit sources to high quality such as news.ycombinator.com, github.com, reddit.com, etc),
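For anyone wanting to reproduce a file shaped like the one above: here is a minimal sketch, not Auto-GPT's own code, that renders the same three keys (ai_name, ai_role, ai_goals) as YAML text. The function name render_ai_settings and the shortened role and goal strings are my own placeholders. It quotes every value with json.dumps, on the assumption that JSON strings are valid YAML double-quoted scalars; as far as I can tell that matters here, because an unquoted value that starts with "[" (like [app]-founder) would be read by a YAML parser as the start of a flow sequence rather than as plain text.

```python
import json


def render_ai_settings(name: str, role: str, goals: list[str]) -> str:
    """Render ai_name / ai_role / ai_goals as YAML text.

    Every value is written as a double-quoted scalar via json.dumps,
    so brackets, colons and apostrophes in the prompts stay literal.
    """
    if not goals:
        raise ValueError("at least one goal is required")
    lines = [
        f"ai_name: {json.dumps(name)}",
        f"ai_role: {json.dumps(role)}",
        "ai_goals:",
    ]
    lines += [f"- {json.dumps(goal)}" for goal in goals]
    return "\n".join(lines) + "\n"


# Placeholder values, shortened from the comment above.
settings_text = render_ai_settings(
    name="[app]-founder",
    role="You are working on an AI SaaS product called [app].",
    goals=[
        "git clone the template repo if it isn't already cloned",
        "continuously build the app at localhost:3000 && localhost:5556",
    ],
)
print(settings_text)
```

Writing settings_text to a file is left out on purpose, since the filename and location the agent expects may differ between versions.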
The only thing being solved is the reddit OP getting paid. They're overhyping everything and playing on FOMO so that you subscribe to their newsletter:
> I'm kinda sad I wrote about like 3-4 of these stories in detail in my newsletter on thursday but most won't read it because it's part of the paid sub
I also have this feeling of "everyone is doing the same thing, a GUI for chatgpt with some prepared prompts".
The thing is, it's too risky and too soon for bigger issues to be tackled. It will take quite some time before LLMs provide medical and financial advice, if they ever will.
Take the GPT-4-based law AI, Harvey, for instance. I doubt many people know what they're up to at the moment, but they already have deals signed with some of the biggest law firms on earth, with revenue quickly growing.
Now realize that GPT-4 isn't any worse in medicine than it is in law. This stuff is far closer than people think.
The hard perhaps uncomfortable truth is that GPT-4 is already proficient enough in several fields to bounce ideas off of as a colleague/equal.
Why is that something to hold against LLMs? They're making a ton of progress and will probably significantly improve the productivity of e.g. paralegals. The fact that they're not legal persons, plus our rules against unauthorised practice of law, means they're not going to be in courtrooms any time soon, even if hypothetically they were twice as good as defense attorneys at that work (they aren't, right now). A technology can be really impressive and indeed revolutionary without making literally every job redundant. That seems like an absurdly high bar to hold anything to.
Yeah, it's kind of ironic that there's all this concern that LLMs will automate content to an extent that there'll be an unfathomable amount of text material on the internet that it'll be impossible to distinguish but in a way... it kind of already happened with crypto. There emerged a class of people who were financially incentivized to just constantly pump out bullshit about crypto- fake projects, fake coins, fake nfts, fake metaverses, all this crud, and because there was nothing underlying it it just raised the noise floor incredibly. Well now crypto is mostly dead and those same people have moved onto noisily bullshitting about AI.
There's genuinely some amazing stuff happening, but it's being damaged by a class of morons who want to run to the front of the crowd and shout "follow me!". The important thing is to just focus on the level-headed reporting of what's really going on - places like Hard Fork, TechMeme, and Pivot are doing great thoughtful reporting on what actually matters. It's amazing to see the contrast between All-In's hosts frothily making absurdly overstated claims based on some research paper that was just published, and then you hear about the same research from actual tech reporters and guess what... you get to actually find out what the research was and what they found, rather than hear some know-nothing gleefully spew about how he's going to make the entire middle class unemployed and sit on his pile of money like a fictional dragon.
I don't know why you're being downvoted, I think this phone service is amazing and I'm saving it to my contacts. I now wish to find a bunch of them with different 'accents'.