
Do you take any measures to prevent link hallucination? And content grounding / attribution verification?

At the moment the measures taken are:

- Full content analysis by the primary LLM (default: Gemini 2.5 Pro), with each source link hard-coded alongside its piece of content and structured output used for more reliable parsing (a rough sketch of this follows below).

- Temperature kept low (0.2), strict instructions to synthesize, and precise prompts to attribute links exactly and without modification.
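
To make that concrete, here is a minimal sketch of the idea, not the project's actual code: each scraped chunk keeps its source URL attached, and the synthesis prompt embeds that URL verbatim so the model never has to reconstruct a link from memory. The call_primary_llm name and its signature are placeholders.

    from dataclasses import dataclass

    @dataclass
    class SourcedChunk:
        url: str      # hard-coded source link, carried with the content
        content: str  # extracted text from that page

    def build_synthesis_prompt(chunks: list[SourcedChunk]) -> str:
        blocks = []
        for i, c in enumerate(chunks, 1):
            # The URL is pinned right next to the content it came from
            blocks.append(f"[SOURCE {i}] {c.url}\n{c.content}")
        instructions = (
            "Synthesize the findings below. Cite sources only by the exact URLs "
            "given in the [SOURCE n] headers; never alter or invent a URL."
        )
        return instructions + "\n\n" + "\n\n".join(blocks)

    # Hypothetical call: low temperature keeps output close to the grounded input.
    # report = call_primary_llm(build_synthesis_prompt(chunks), temperature=0.2)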

What I hope to introduce:

- Hard-coded parsing of the links mentioned in the final report, verified against the link map built up throughout the research run (see the sketch after this list).

- An optional "double-checking" LLM review of the synthesized content to ensure no drift.

- RAG enhancements for token-efficient verification and follow-up user questions (post-research).
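
For the first item, a possible shape of that check (not yet in the repo, just a sketch): pull every URL out of the finished report and compare it against the link map collected during research; anything outside the map is flagged as a likely hallucinated or mangled link.

    import re

    URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")

    def find_unverified_links(report: str, link_map: set[str]) -> list[str]:
        found = URL_PATTERN.findall(report)
        # Strip common trailing punctuation before comparing
        cleaned = [u.rstrip(".,;") for u in found]
        return [u for u in cleaned if u not in link_map]

    # Any URL the model introduced (or altered) shows up here:
    # bad = find_unverified_links(final_report, link_map)
    # if bad:
    #     print("Links needing review:", bad)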

Do you have any further suggestions?

Right now I'm trying to strike a delicate balance: keep token usage efficient by default, with enhanced grounding available as optional settings in the future. I have a big task list, and this is on it; I'll re-prioritize alongside user requests for the different features.

Of course, being open source, contributions are highly welcome. I would love to see large community involvement. Collaboration benefits everyone.

P.S. I have spent hundreds of dollars on tests. I'd say for every hour of building, about three hours have gone into testing, debugging, optimizing quality, and ensuring guard-rails are in place.

If you go to the repo, also check out the config/prompts.py file; it will give you a little more insight into what is going on (there are code checks as well, but it generally gives you the idea).



