Give an LLM the text of a PDF document. Ask the model to extract values from the document or from its tables. Input the values into a spreadsheet. This is, at a minimum, a task which costs companies around the world hundreds of millions of dollars a year.
Having worked on this directly and used basically every other piece of "automation" software available to do this, I can tell you that the GenAI solution is far superior. It's not close.
If you tell a multimodal LLM to extract and structure the contents of a PDF, it will absolutely be able to do that successfully. Further, these models display a surprising capacity for abductive “reasoning” (acknowledging, of course, that they don’t reason at all), and are thus able to make pretty reasonable assumptions given the semantic context of a request. Unlike traditional extraction tools, which require very specific tuning and are fragile to structural changes in layout, LLMs tend to be very resilient to such things.
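To make that concrete, here's a minimal sketch of the kind of call I mean, using the OpenAI Python SDK against a single page rendered to a PNG. The model name, the filename, and the prompt wording are placeholders, not anyone's production setup:

```python
# Minimal sketch: send one rendered PDF page to a vision-capable chat model
# and ask for structured JSON back. Assumes OPENAI_API_KEY is set and that
# "page1.png" was produced by a separate PDF-to-image step.
import base64
import json

from openai import OpenAI

client = OpenAI()

with open("page1.png", "rb") as f:
    page_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract every labeled field and every table from this page. "
                     "Return a single JSON object; use null for anything you cannot read."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{page_b64}"}},
        ],
    }],
    response_format={"type": "json_object"},  # ask for valid JSON back
)

print(json.loads(resp.choices[0].message.content))
```

In practice you loop over pages and validate the JSON against whatever schema your spreadsheet needs, but that's the whole trick.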
> I pay for GPT4 enterprise, try again, only this time answer the question.
When I'm frustrated, I talk to ChatGPT like that.
It works as well for the LLM as it does for the humans in this thread.
What's worse is, I'd been writing some sci-fi set in 2030 since well before Transformer models were invented, and in early drafts I predicted that you'd get better results from AI if you treated it with the same courtesies you'd use for a human, simply because it learns by mimicking us (which turned out to be true). And yet I'm still making this mistake IRL when I talk to the AI…
What step? Do you think we're lying? I literally built an application which takes in PDFs and extracts over two dozen values from them via prompting with Langchain. Do you think I'm a paid OpenAI shill or what?
You also realize that OpenAI has a dedicated Document Assistant which will literally extract information from a document you upload using prompts? Are you just unable to get that to work? I just don't know what you're arguing at this point; it's like watching people walk backwards and then yelling that it's impossible for humans to walk backwards.
Yes, you send a message to the API containing the context you are interested in, along with questions about that context for the LLM to answer. In common parlance that message is called a prompt. I don't know if you're delusional or just completely clueless about how LLMs work.
Honestly, just a skill issue on your part. RAG and one-shot learning can get you incredibly far, and if you can't figure it out you're ngmi. And no one is using ChatGPT for this lol.
Wait, do you not realize what a prompt or a context window is? You literally think GPT just does things on its own?
You understand that if I want to extract the date a letter was sent, who the recipient was, what amount is due on an invoice, etc., I have to send GPT a specific prompt asking for exactly that, with the PDF in the context? Do you just literally not know how this works?
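In other words, something like this sketch, assuming a PDF with a text layer and the OpenAI Python SDK (pypdf, the filename, and the model name are just stand-ins for whatever you actually use):

```python
# Minimal sketch: put the PDF's text in the context window and ask for
# exactly the fields you want, as JSON.
import json

from pypdf import PdfReader   # assumption: text-layer PDF, no OCR step shown
from openai import OpenAI

reader = PdfReader("invoice.pdf")   # placeholder filename
pdf_text = "\n".join(page.extract_text() or "" for page in reader.pages)

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",            # placeholder model
    messages=[
        {"role": "system",
         "content": "You extract fields from documents. Respond with JSON only."},
        {"role": "user",
         "content": "From the document below, return a JSON object with keys "
                    "'date_sent', 'recipient', and 'amount_due' (null if absent).\n\n"
                    + pdf_text},
    ],
    response_format={"type": "json_object"},
)

fields = json.loads(resp.choices[0].message.content)
print(fields)
```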
Yeah, I'd love to see the results... If anything, if there is a multimillion-dollar benefit on the table, one might argue that companies should publish this data in a more useful format. But no, let's bandaid over outdated practices like PDF-only data and burn GPU cycles to do it, with 92% accurate results.
A lot of things don't require 100% accuracy, and raging against the world's outdated practices doesn't solve the problems faced in the immediate present. After spending 30 years of my career ranting at those practices to absolutely no meaningful effect, I'd say a more effective bandaid that has a semantic understanding of the content is probably as good as you'll get. And GPU cycles are meant to be wasted.