
Give an LLM the text of a PDF document. Ask the model to extract values from the document or from its tables. Input the values into a spreadsheet. At a minimum, this is a task that costs companies around the world hundreds of millions of dollars a year.


what you're describing is automation, and companies have been doing it for years.


Having worked on this directly and used basically every other piece of "automation" software available to do this, I can tell you that the GenAI solution is far superior. It's not close.


because apparently somehow ChatGPT magically knows what you're trying to extract.

Oh it doesn't? odd that.

ChatGPT cannot make the kind of judgement calls you're implying it can.

ChatGPT can do some really cool things, but it's not magic.


If you tell a multimodal LLM to extract and structure the contents of a PDF, it will absolutely be able to do that successfully. Further, they display a surprising ability at abductive “reasoning” (acknowledging, of course, that they don’t reason at all), and are thus able to make pretty reasonable assumptions given the semantic context of a request. Unlike traditional extraction tools, which require very specific tuning and are fragile to structural changes in layout, LLMs tend to be very resilient to such things.
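To make the claim concrete, here's a minimal sketch of the request/response shape such an extraction uses. The API call itself is mocked out, and the field names and helper functions are illustrative, not from any particular commenter's setup:

```python
import json

def build_extraction_messages(document_text, fields):
    """Assemble a chat-style request asking the model to return
    the requested fields as strict JSON (field names illustrative)."""
    field_list = ", ".join(fields)
    return [
        {"role": "system",
         "content": "You extract data from documents. "
                    "Reply with a single JSON object and nothing else. "
                    "Use null for any field you cannot find."},
        {"role": "user",
         "content": f"Extract these fields: {field_list}\n\n"
                    f"Document:\n{document_text}"},
    ]

def parse_extraction(reply_text, fields):
    """Parse the model's JSON reply, tolerating missing keys."""
    data = json.loads(reply_text)
    return {f: data.get(f) for f in fields}

# Mocked round trip (no API call): what a typical reply looks like.
fields = ["invoice_date", "recipient", "amount_due"]
mock_reply = ('{"invoice_date": "2024-03-01", '
              '"recipient": "Acme Corp", "amount_due": "$1,200.00"}')
print(parse_extraction(mock_reply, fields))
```

The layout-resilience point falls out of this shape: nothing here depends on where in the page a value sits, only on the model finding it in the text.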


This guy prompts.


what are you extracting?


Just go use GPT-4 and try it. You keep repeating the same rhetoric that it doesn't work, but people have made it work, including myself.


I pay for GPT4 enterprise, try again, only this time answer the question.


> I pay for GPT4 enterprise, try again, only this time answer the question.

When I'm frustrated, I talk to ChatGPT like that.

It works as well for the LLM as it does for the humans in this thread.

What's worse is, I'd been writing some sci-fi set in 2030 since well before Transformer models were invented. In early drafts I predicted that you'd get better results from an AI if you treated it with the same courtesies you'd use for a human, simply because they learn by mimicking us (which turned out to be true), and yet I'm still making this mistake IRL when I talk to the AI…


Why are you paying for enterprise if it’s not useful to you?


it's almost as if you're so hyped over this you take someone cautioning that it's not magic as the enemy.

maybe stop doing that.

Take the PDF example being thrown around in this thread: there's a magical step in the middle that no one is acknowledging.


What step? Do you think we're lying? I literally built an application that takes in PDFs and extracts over two dozen values from them via prompting with LangChain. Do you think I'm a paid OpenAI sponsor or what?

You also realize that OpenAI has a dedicated document assistant that will literally extract information from a document you upload using prompts? Are you just unable to get that to work? I just don't know what you're arguing at this point; it's like you're watching people walk backwards and then yelling that it's impossible for humans to walk backwards.


odd that you keep using that word prompts.


Yes, you send a message to the API containing the context you're interested in, along with questions about that context that the LLM can answer. The common parlance refers to that message as a prompt. I don't know if you're delusional or just completely clueless about how LLMs work.


Honestly, just a skill issue on your part. RAG and one-shot learning can get you incredibly far, and if you can't figure it out you're ngmi. And no one is using ChatGPT for this lol.
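For what it's worth, the "RAG" half of that doesn't have to be exotic. Here's a dependency-free sketch, with crude keyword-overlap retrieval standing in for real embedding search, and one worked example serving as the one-shot; the function names and example are made up for illustration:

```python
def top_chunks(document, query, k=2, size=400):
    """Crude retrieval: split the document into fixed-size chunks and
    rank them by keyword overlap with the query. Real systems use
    embeddings; this keeps the sketch dependency-free."""
    chunks = [document[i:i + size] for i in range(0, len(document), size)]
    words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(document, question):
    """One-shot prompting: show one worked example, then ask the real
    question against the retrieved context."""
    example = ("Context: Invoice #42, due 2023-11-05, total $310.\n"
               "Q: What amount is due?\nA: $310")
    context = "\n---\n".join(top_chunks(document, question))
    return f"{example}\n\nContext: {context}\nQ: {question}\nA:"
```

The retrieval step keeps the prompt short enough to fit a context window; the worked example shows the model the answer format you expect.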


oh snap, mr hot-shit has the skillzors.


Wait, do you not realize what a prompt or a context window is? You literally think GPT just does things on its own?

You understand that if I want to extract the date a letter was sent, who the recipient was, what amount is due on an invoice, etc., I have to send a specific prompt asking for that, with the PDF in the context? Do you just literally not know how this works?


Yeah, I'd love to see the results... If anything, if there's a multimillion-dollar benefit on the table, one might argue that companies should publish this data in a more useful format. But no, let's band-aid over outdated practices like PDF-only data and burn GPU cycles to do it, with 92% accurate results.


A lot of things don’t require 100% accuracy, and raging against the world’s outdated practices doesn’t solve problems faced in the immediate present. After spending 30 years of my career railing at those practices to absolutely no meaningful effect, a more effective band-aid that has a semantic understanding of the content is probably as good as you get. And GPU cycles are meant to be wasted.
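On the accuracy point: a common mitigation is to pair the probabilistic extraction with cheap deterministic checks, so only suspicious rows get human review. A rough sketch, where the field names and expected formats are assumptions for illustration:

```python
import re
from datetime import datetime

def validate_row(row):
    """Flag extracted fields that fail basic sanity checks, so a human
    only reviews suspicious rows (field names and formats illustrative)."""
    problems = []
    # Dates must parse as ISO yyyy-mm-dd.
    try:
        datetime.strptime(row.get("invoice_date", ""), "%Y-%m-%d")
    except ValueError:
        problems.append("invoice_date")
    # Amounts must look like a currency value, e.g. $1,200.00.
    if not re.fullmatch(r"\$?[\d,]+(\.\d{2})?", row.get("amount_due", "")):
        problems.append("amount_due")
    return problems

print(validate_row({"invoice_date": "2024-03-01", "amount_due": "$1,200.00"}))  # -> []
print(validate_row({"invoice_date": "March 1", "amount_due": "N/A"}))  # -> ['invoice_date', 'amount_due']
```

Rows that pass go straight to the spreadsheet; rows that don't go to a review queue, which is how a 92%-accurate extractor can still replace most of the manual labor.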


If it replaces hundreds or thousands of man-hours of expensive labor, then yes, let's do it.


"Should" is just a curse word.

Be the change.


Lol I just don't read those PDFs. They are probably generated by shoddy automated tooling anyway. Garbage in, garbage out.


A lot of garbage encodings encode valuable information, and resilient systems follow Postel's law, even if you personally do not.


This sort of bespoke automation ~~is~~ was expensive.



