
How do you get value from chatting with documents? I can scan and read a pdf faster than I can chat with an AI about it. There must be more to it than I realize.


> There must be more to it than I realize.

PDF material comes with different information density. If you have a loose collection of 100 manuals and you need to find a snippet of information that could be in any of 10 of them, I'm guessing something like this can help you navigate and locate what you need.


That would be a great litmus test for these programs. Dump gigabytes of manuals and ask, "How many pins does the 74LS04 have?" or "What size bolts hold the oil pan on a '73 Porsche?"


A one-page PDF, sure. But if it's a 500-page PDF of a law and/or regulation, then definitely not.


How can you chat with a PDF that doesn't fit in the context window? I mean, with a 500-page PDF you might need 100 context windows to fully grok it.

Basically, it makes no sense to "chat" with a 500-page PDF with today's LLMs.
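
Back-of-the-envelope, assuming roughly 400 tokens per page (that figure is a guess and varies a lot with layout):

    # Rough estimate: how many context windows does a 500-page PDF need?
    # The tokens-per-page number is an assumption; dense legal text runs higher.
    pages = 500
    tokens_per_page = 400
    doc_tokens = pages * tokens_per_page  # ~200,000 tokens

    for window in (4_096, 32_768, 128_000):
        passes = -(-doc_tokens // window)  # ceiling division
        print(f"{window:>7}-token window -> ~{passes} passes over the document")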


That is what the RAG system does: the PDF is chunked and thrown into a vector store, and then when you prompt it, only the relevant bits are retrieved, stuffed into the context, and sent to the LLM.

So yeah, it's kinda smoke and mirrors. In some cases, for some long PDFs, it works really well. If it's a 500-page PDF with many disparate topics, it may do fine.
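
A minimal sketch of that pipeline, using chromadb, pypdf, and the openai client as stand-ins (the chunk size, model name, and file name are arbitrary placeholders, not what any particular product uses):

    # Minimal RAG sketch: chunk a PDF, embed the chunks into a vector store,
    # retrieve only the chunks relevant to a question, and send those to the LLM.
    import chromadb
    from openai import OpenAI
    from pypdf import PdfReader

    client = OpenAI()
    collection = chromadb.Client().create_collection("manuals")

    # 1. Chunk the PDF (naive fixed-size chunks; real systems split more carefully).
    text = "\n".join(page.extract_text() or "" for page in PdfReader("manual.pdf").pages)
    chunks = [text[i:i + 1500] for i in range(0, len(text), 1500)]
    collection.add(documents=chunks, ids=[str(i) for i in range(len(chunks))])

    # 2. Retrieve only the chunks relevant to the question.
    question = "How many pins does the 74LS04 have?"
    hits = collection.query(query_texts=[question], n_results=4)["documents"][0]

    # 3. Stuff the retrieved chunks into the context and ask the LLM.
    prompt = "Answer using the excerpts below.\n\n" + "\n---\n".join(hits) + f"\n\nQ: {question}"
    answer = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    print(answer.choices[0].message.content)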


Indeed. I'd only add that context windows are continually multiplying in size. Who knows how long a Moore's Law-like trend will hold here, but the window keeps improving.


I've found that the longer context windows don't seem to be a linear improvement in responses, though. It's as if the longer the context window, the broader but less sharp or accurate the response. I've been using GPT-4 Turbo with the longer context window for coding tasks, but it doesn't seem to have improved the responses as much as you would think; it seems more "distracted" now, which perhaps makes some intuitive sense.

I can give GPT-4 Turbo many full code files for a complex coding task, but despite the larger window it seems to fail more often, ignore parts of the context window, or just not really answer the question.


That assumes only one part of the PDF, one that fits in the context window, is relevant to the prompt, which seems like a fairly big assumption.


You can screenshot the first page and use GPT-4 with vision.
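
Roughly, using pdf2image (needs poppler installed) to render the page; the model name is a placeholder for any vision-capable model:

    # Render the first PDF page to an image and ask a vision model about it.
    import base64, io
    from pdf2image import convert_from_path
    from openai import OpenAI

    page = convert_from_path("paper.pdf", first_page=1, last_page=1)[0]
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    image_b64 = base64.b64encode(buf.getvalue()).decode()

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder for a vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the title, authors, and year of this document?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)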


You could just load up the doc, take the first 1024 tokens, and almost always get the right authors/title/year, etc., assuming it's there.

But going further, for large bills you might need the first n..m pages to capture the full index, and for research papers you also want to look at the last n2..m2 pages for the bibliography, etc.
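
A sketch of that, using pypdf for the text and tiktoken for the 1024-token cutoff (the page counts here are arbitrary guesses):

    # Pull text from the first and last few pages of a PDF, then keep only the
    # first ~1024 tokens of the front matter for title/authors/year extraction.
    import tiktoken
    from pypdf import PdfReader

    pages = list(PdfReader("bill_or_paper.pdf").pages)
    enc = tiktoken.get_encoding("cl100k_base")

    def pages_text(page_list):
        return "\n".join(p.extract_text() or "" for p in page_list)

    front = pages_text(pages[:3])   # title pages / index
    back = pages_text(pages[-3:])   # bibliography, appendices
    front_1024 = enc.decode(enc.encode(front)[:1024])

    print(front_1024)  # usually enough to recover authors/title/year
    print(back[:2000]) # tail pages for references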


Responding to RFPs springs to mind. You know you've already answered the question in some previous response, but it's a nightmare to lay your hands on it.


I run https://Docalysis.com/ and there are a few use cases. The first is getting information out of various reports and papers by chatting with them, which is faster than reading an entire document. Another is automating data extraction from files, which is part of many business processes.


I use Claude 2.1 to create summaries and TOCs of the magazines on my magazine encyclopedia. There is no way I could do that by hand for several million magazines averaging 100 pages each.
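
At that scale it has to be scripted; each per-issue call is roughly something like this (simplified sketch; the prompt and max_tokens are placeholders, and the OCR'd text is assumed to already exist and fit in the context window):

    # Ask Claude for a summary plus table of contents for one magazine issue.
    import anthropic

    client = anthropic.Anthropic()

    def summarize_issue(issue_text: str) -> str:
        message = client.messages.create(
            model="claude-2.1",
            max_tokens=1500,
            messages=[{
                "role": "user",
                "content": (
                    "Here is the OCR'd text of one magazine issue:\n\n"
                    f"{issue_text}\n\n"
                    "Write a short summary, then a table of contents with page numbers."
                ),
            }],
        )
        return message.content[0].text

    # Running this over millions of issues is then just batching plus rate limiting.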


Also if they're in Spanish?


You mean the ability to translate PDFs into English?


What if you want to extract certain information from 100 PDFs?



