
Do you take any measures to prevent link hallucination? And content grounding / attribution verification?

At the moment the measures taken are:

- Full content analysis by the primary LLM (default: Gemini 2.5 Pro), with each source link hard-coded alongside its piece of content and structured output used for more reliable parsing (a rough sketch of this follows below).

- Temperature kept low (0.2), strict instructions to synthesize, and precise prompts to attribute links exactly and without modification.
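
To make that concrete, here is a minimal sketch of the idea, not the project's actual code: each scraped chunk keeps its source URL attached, and the synthesis prompt embeds that URL verbatim so the model never has to reconstruct a link from memory. The call_primary_llm name and its signature are placeholders.

    from dataclasses import dataclass

    @dataclass
    class SourcedChunk:
        url: str      # hard-coded source link, carried with the content
        content: str  # extracted text from that page

    def build_synthesis_prompt(chunks: list[SourcedChunk]) -> str:
        blocks = []
        for i, c in enumerate(chunks, 1):
            # The URL is pinned right next to the content it came from
            blocks.append(f"[SOURCE {i}] {c.url}\n{c.content}")
        instructions = (
            "Synthesize the findings below. Cite sources only by the exact URLs "
            "given in the [SOURCE n] headers; never alter or invent a URL."
        )
        return instructions + "\n\n" + "\n\n".join(blocks)

    # Hypothetical call: low temperature keeps output close to the grounded input.
    # report = call_primary_llm(build_synthesis_prompt(chunks), temperature=0.2)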

What I hope to introduce:

- Hard-coded parsing of the links mentioned in the final report, verified against the link map built up throughout the research run (see the sketch after this list).

- An optional "double-checking" LLM review of the synthesized content to ensure no drift.

- RAG enhancements for token-efficient verification and follow-up user questions (post-research).
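
For the first item, a possible shape of that check (not yet in the repo, just a sketch): pull every URL out of the finished report and compare it against the link map collected during research; anything outside the map is flagged as a likely hallucinated or mangled link.

    import re

    URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")

    def find_unverified_links(report: str, link_map: set[str]) -> list[str]:
        found = URL_PATTERN.findall(report)
        # Strip common trailing punctuation before comparing
        cleaned = [u.rstrip(".,;") for u in found]
        return [u for u in cleaned if u not in link_map]

    # Any URL the model introduced (or altered) shows up here:
    # bad = find_unverified_links(final_report, link_map)
    # if bad:
    #     print("Links needing review:", bad)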

Do you have any further suggestions?

Right now I'm trying to strike a delicate balance: keep token usage efficient by default, with enhanced grounding available as optional settings in the future. I have a big task list, and this is on it; I'll re-prioritize alongside user requests for the different features.

Of course, being open source, contributions are highly welcome. I would love to see large community involvement. Collaboration benefits everyone.

P.S. I have spent hundreds of dollars on tests. I'd say for every hour of building, about three hours have gone into testing, debugging, optimizing quality, and ensuring guard-rails are in place.

If you go to the repo, also check out the config/prompts.py file; it will give you a little more insight into what is going on (there are code checks as well, but it generally gives you the idea).



