
How do you get value from chatting with documents? I can scan and read a pdf faster than I can chat with an AI about it. There must be more to it than I realize.


> There must be more to it than I realize.

PDF material comes with different information density. If you have a loose collection of 100 manuals and you need to find a snippet of information that could be in any of 10 of them, I'm guessing something like this can help you navigate and locate what you need.


That would be a great litmus test for these programs. Dump gigabytes of manuals and ask, "How many pins does the 74LS04 have?" or "What size bolts hold the oil pan on a '73 Porsche?"


A one-page PDF, sure. But if it's a 500-page PDF of a law and/or regulation, then definitely not.


How can you chat with a PDF that doesn't fit in the context window? I mean, with a 500-page PDF you might need 100 context windows to fully grok it.

Basically, it makes no sense to "chat" with a 500-page PDF with today's LLMs.
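
Back-of-the-envelope, assuming roughly 400 tokens per page (that figure is a guess and varies a lot with layout):

    # Rough estimate: how many context windows does a 500-page PDF need?
    # The tokens-per-page number is an assumption; dense legal text runs higher.
    pages = 500
    tokens_per_page = 400
    doc_tokens = pages * tokens_per_page  # ~200,000 tokens

    for window in (4_096, 32_768, 128_000):
        passes = -(-doc_tokens // window)  # ceiling division
        print(f"{window:>7}-token window -> ~{passes} passes over the document")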


That is what the RAG system does: the PDF is chunked and thrown into a vector store, and then when you prompt it, only the relevant bits are retrieved, stuffed into the context, and sent to the LLM.

So yeah, it's kinda smoke and mirrors. In some cases, for some long PDFs, it works really well. If it's a 500-page PDF with many disparate topics, it may do fine.
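
A minimal sketch of that pipeline, using chromadb, pypdf, and the openai client as stand-ins (the chunk size, model name, and file name are arbitrary placeholders, not what any particular product uses):

    # Minimal RAG sketch: chunk a PDF, embed the chunks into a vector store,
    # retrieve only the chunks relevant to a question, and send those to the LLM.
    import chromadb
    from openai import OpenAI
    from pypdf import PdfReader

    client = OpenAI()
    collection = chromadb.Client().create_collection("manuals")

    # 1. Chunk the PDF (naive fixed-size chunks; real systems split more carefully).
    text = "\n".join(page.extract_text() or "" for page in PdfReader("manual.pdf").pages)
    chunks = [text[i:i + 1500] for i in range(0, len(text), 1500)]
    collection.add(documents=chunks, ids=[str(i) for i in range(len(chunks))])

    # 2. Retrieve only the chunks relevant to the question.
    question = "How many pins does the 74LS04 have?"
    hits = collection.query(query_texts=[question], n_results=4)["documents"][0]

    # 3. Stuff the retrieved chunks into the context and ask the LLM.
    prompt = "Answer using the excerpts below.\n\n" + "\n---\n".join(hits) + f"\n\nQ: {question}"
    answer = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works here
        messages=[{"role": "user", "content": prompt}],
    )
    print(answer.choices[0].message.content)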


Indeed. I'd only add that context windows are continually multiplying in size. Who knows how long a Moore's Law-like trend will hold here, but the window keeps improving.


I've found that the longer context windows don't seem to be a linear improvement in responses, though. It's as if the longer the context window, the broader but less sharp or accurate the response. I've been using GPT-4 Turbo with the longer context window for coding tasks, but it doesn't seem to have improved the responses as much as you would think; it seems more "distracted" now, which perhaps makes some intuitive sense.

I can give GPT-4 Turbo many full code files for a complex coding task, but despite the larger window it seems to fail more often, ignore parts of the context window, or just not really answer the question.


That assumes only one part of the PDF, one that fits in the context window, is relevant to the prompt, which seems like a fairly big assumption.


You can screenshot the first page and use GPT-4 with vision.
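
Roughly, using pdf2image (needs poppler installed) to render the page; the model name is a placeholder for any vision-capable model:

    # Render the first PDF page to an image and ask a vision model about it.
    import base64, io
    from pdf2image import convert_from_path
    from openai import OpenAI

    page = convert_from_path("paper.pdf", first_page=1, last_page=1)[0]
    buf = io.BytesIO()
    page.save(buf, format="PNG")
    image_b64 = base64.b64encode(buf.getvalue()).decode()

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder for a vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What are the title, authors, and year of this document?"},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)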


You could just load up the doc, take the first 1024 tokens, and almost always get the right authors/title/year, etc., assuming it's there.

But going further, for large bills you might need the first n..m pages to capture the full index, and for research papers you also want to look at the last n2..m2 pages for the bibliography, etc.
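
A sketch of that, using pypdf for the text and tiktoken for the 1024-token cutoff (the page counts here are arbitrary guesses):

    # Pull text from the first and last few pages of a PDF, then keep only the
    # first ~1024 tokens of the front matter for title/authors/year extraction.
    import tiktoken
    from pypdf import PdfReader

    pages = list(PdfReader("bill_or_paper.pdf").pages)
    enc = tiktoken.get_encoding("cl100k_base")

    def pages_text(page_list):
        return "\n".join(p.extract_text() or "" for p in page_list)

    front = pages_text(pages[:3])   # title pages / index
    back = pages_text(pages[-3:])   # bibliography, appendices
    front_1024 = enc.decode(enc.encode(front)[:1024])

    print(front_1024)  # usually enough to recover authors/title/year
    print(back[:2000]) # tail pages for references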


Responding to RFPs springs to mind. You know you've already answered the question in some previous response, but it's a nightmare to lay your hands on it.


I run https://Docalysis.com/ and there are a few use cases. The first is getting information out of various reports and papers by chatting with them, which is faster than reading an entire document. Another is automating data extraction from files, which is part of many business processes.


I use Claude 2.1 to create summaries and TOCs of the magazines on my magazine encyclopedia. There is no way I could do that by hand for several million magazines averaging 100 pages each.
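
At that scale it has to be scripted; each per-issue call is roughly something like this (simplified sketch; the prompt and max_tokens are placeholders, and the OCR'd text is assumed to already exist and fit in the context window):

    # Ask Claude for a summary plus table of contents for one magazine issue.
    import anthropic

    client = anthropic.Anthropic()

    def summarize_issue(issue_text: str) -> str:
        message = client.messages.create(
            model="claude-2.1",
            max_tokens=1500,
            messages=[{
                "role": "user",
                "content": (
                    "Here is the OCR'd text of one magazine issue:\n\n"
                    f"{issue_text}\n\n"
                    "Write a short summary, then a table of contents with page numbers."
                ),
            }],
        )
        return message.content[0].text

    # Running this over millions of issues is then just batching plus rate limiting.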


Also if they're in Spanish?


You mean the ability to translate PDFs into English?


What if you want to extract certain information from 100 PDFs?



