Good question — it's pretty straightforward right now:
I pass the collected content chunks (with their original URLs attached) into Gemini 2.5 Pro, asking it to synthesize a balanced report and to include inline citations throughout.
So it's not doing anything fancy like dynamic retrieval or a classic RAG architecture.
Basically:
- The agent gathers sources (webpages, PDFs, Reddit, etc.)
- Summarises each as it goes (using a cheaper model)
- Then hands a bundle of summarised + raw content to Gemini 2.5 Pro
- Gemini 2.5 Pro writes the final report, embedding links directly as numbered citations ([1], [2], etc.) throughout (rough sketch below).
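For a rough idea of the first couple of steps, here's a minimal sketch (hedged: the function names, prompt, and cheap-model choice are placeholders, not the repo's actual code), using the google-generativeai SDK:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_KEY")  # assumes an API key is already set up

# Placeholder prompt; the real ones live in config/prompts.py
SUMMARISE_PROMPT = "Summarise the following source faithfully, in ~200 words:\n\n"

def summarise_sources(sources, model_name="gemini-2.0-flash"):
    """Summarise each gathered source with a cheaper model, keeping its
    original URL attached so nothing loses its provenance."""
    cheap = genai.GenerativeModel(model_name)
    for src in sources:  # each src is e.g. {"url": ..., "content": ...}
        resp = cheap.generate_content(SUMMARISE_PROMPT + src["content"])
        src["summary"] = resp.text
    return sources
```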
Reverse-RAG is something I definitely want to implement, once I can afford a better computer to run this at scale. Even an 8B model takes overnight to summarize an average piece of content for me right now! But I'm also keeping an eye on the pace at which the larger LLM space moves. The size of context windows in the likes of Gemini 2.5 Pro is pretty crazy these days!
What keeps citations grounded right now:
- Full content analysis by the primary LLM (default: Gemini 2.5 Pro), with each source's link hard-coded alongside its content and structured output for better parsing.
- Temperature right down (0.2), strict instructions to synthesize, and precise prompts to attribute links exactly and without modification.
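To make those two points concrete, the synthesis call looks roughly like this (a hedged sketch; `build_source_block` and `prompt_prefix` are stand-ins for the real prompt assembly in config/prompts.py):

```python
import google.generativeai as genai

def build_source_block(sources):
    # Hard-code each URL next to its content so the model can only cite
    # links that are literally present in the prompt.
    return "\n\n".join(
        f"SOURCE [{i}] ({src['url']}):\n{src['summary']}"
        for i, src in enumerate(sources, start=1)
    )

def synthesize_report(sources, prompt_prefix):
    model = genai.GenerativeModel("gemini-2.5-pro")
    resp = model.generate_content(
        prompt_prefix + build_source_block(sources),
        generation_config=genai.GenerationConfig(temperature=0.2),  # low temp
    )
    return resp.text
```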
What I hope to introduce:
- Hard-coded parsing of the links mentioned in the final report, to verify them against the link map built up throughout the research journey (see the sketch after this list)
- An optional "double-checking" LLM review of the synthesized content to ensure no drift.
- RAG enhancements for token-efficient verification and subsequent user questions (post-research)
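For the first of those, the verification pass could be as simple as a regex sweep over the final report, checked against the link map (a minimal sketch; the function name and return shape are just illustrative):

```python
import re

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def verify_report_links(report_text, link_map):
    """Compare every URL that appears in the final report against the
    URLs collected during research; anything outside the link map is
    flagged as a possible hallucination."""
    found = set(URL_RE.findall(report_text))
    known = set(link_map)
    return {
        "verified": sorted(found & known),
        "unknown": sorted(found - known),   # candidates for review/removal
        "unused": sorted(known - found),    # gathered but never cited
    }
```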
Do you have any further suggestions?
Right now I'm trying to strike a delicate balance: token efficiency by default, with enhanced grounding as optional settings in the future. I have a big task list, and this is one item on it; I'll re-prioritize alongside user requests for the different features.
Of course, being open source, contributions are highly welcome. I would love to see large community involvement. Collaboration benefits everyone.
P.S. I have spent hundreds of dollars on tests. I'd say for every hour of building, about three hours have gone into testing, debugging, optimizing quality, and making sure guard-rails are in place.
If you go to the repo, also check out the config/prompts.py file - it will give you a little more insight into what is going on (there are code checks as well, but generally it gives you an idea).