tifa2up · 2026-03-21T20:07:46 1774123666

Interesting project. Curious why Electrobun over Tauri here? Tauri has a much larger ecosystem and is Rust-based for improved performance.

ahmedriad1 · 2026-03-21T20:15:34 1774124134

Thank you! Main reason was DX. Tauri uses Rust for the backend while Electrobun uses TS + Bun. It's honestly been a breeze up till now.

verdverm · 2026-03-21T20:25:34 1774124734

Most people don't write Rust, so being in Rust is a disadvantage for those people. Most users don't care what language is used.

tifa2up · 2026-02-09T12:11:52 1770639112

https://agentset.ai/

Open-source RAG infrastructure. Every team I talk to has the same experience: RAG works in the demo, breaks in production.

We handle ingestion through retrieval with optimizations baked in. 97.9% on HotpotQA vs 88.8% for standard RAG. Model-agnostic, 22+ file types, built-in citations, MCP server. MIT licensed.

https://github.com/agentset-ai/agentset

tifa2up · 2026-01-15T09:45:52 1768470352

https://abdellatif.io

tifa2up · 2025-11-29T09:01:41 1764406901

https://agentset.ai/leaderboard/embeddings good rundown of other open-source embedding models

tifa2up · 2025-11-10T07:09:06 1762758546

I'm building https://github.com/agentset-ai/agentset, RAG as a service that works quite well out of the box.

We achieve this performance by baking in the best practices before any tweaking

cluckindan · 2025-11-10T07:12:54 1762758774

How does it handle retrieval in a multi-turn conversation? Is there an intent graph involved?

Does it summarize past context or keep it all?

tifa2up · 2025-11-10T13:32:41 1762781561

Right now it's single-shot. We're looking into building an "Agentic Retrieval" based on Claude ADK. TBD how it'll work.

cluckindan · 2025-11-11T14:02:46 1762869766

So retrieve once on the first message, and then use that context for the rest of the conversation?

tifa2up · 2025-11-05T08:17:59 1762330679

We tried GPT-5 for a RAG use case, and found that it performs worse than 4.1. We reverted and didn't look back.

sigmoid10 · 2025-11-05T08:24:27 1762331067

4.1 is such an amazing model in so many ways. It's still my nr. 1 choice for many automation tasks. Even the mini version works quite well and it has the same massive context window (nearly 8x GPT-5). Definitely the best non-reasoning model out there for real world tasks.

HugoDias · 2025-11-05T09:11:10 1762333870

Can you elaborate on that? In which part of the RAG pipeline did GPT-4.1 perform better? I would expect GPT-5 to perform better on longer context tasks, especially when it comes to understanding the pre-filtered results and reasoning about them

tifa2up · 2025-11-05T09:14:02 1762334042

For large context (up to 100K tokens in some cases), we found that GPT-5: a) has worse instruction following and doesn't follow the system prompt; b) produces very long answers, which resulted in bad UX; c) has a 125K context window, so extreme cases resulted in an error.

internet_points · 2025-11-05T09:39:42 1762335582

Interesting. https://www.robert-glaser.de/prompts-as-programs-in-gpt-5/ claims GPT-5 has amazing!1!! instruction following. Is your use-case very different, or is this yet another case of "developer A got lucky, developer B tested more things"?

tifa2up · 2025-11-05T13:59:05 1762351145

Think it varies by use case. It didn't do well with long context

Shank · 2025-11-05T09:39:41 1762335581

ChatGPT when using 5 or 5-Thinking doesn’t even follow my “custom instructions” on the web version. It’s a serious downgrade compared to the prior generation of models.

cj · 2025-11-05T13:47:56 1762350476

It does “follow” custom instructions. But more as a suggestion rather than a requirement (compared to other models)

Xmd5a · 2025-11-05T10:14:29 1762337669

Ah, 100K/125K: this is what causes the problems, I believe. GPT-5 scores should go up if you process contexts that are 10 times shorter.

mbesto · 2025-11-05T14:08:44 1762351724

How do you objectively tell whether a model "performs" better than another?

belval · 2025-11-05T14:13:34 1762352014

Not the original commenter, but I work in the space and we have large annotated datasets with "gold" evidence that we want to retrieve, so the evaluation of new models is actually very quantitative.
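As a rough illustration of what a quantitative check against "gold" evidence can look like, here is a minimal sketch that computes recall@k over annotated queries. The choice of recall@k, the function names, and the document IDs are all assumptions made for illustration. The comment above does not say which metric or tooling is actually used.

```python
def recall_at_k(retrieved: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold evidence IDs found in the top-k retrieved IDs."""
    if not gold:
        return 0.0  # no annotated evidence for this query
    hits = len(set(retrieved[:k]) & gold)
    return hits / len(gold)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, gold) pairs, one pair per query."""
    return sum(recall_at_k(retrieved, gold, k) for retrieved, gold in runs) / len(runs)


# Hypothetical annotated queries: ranked retrieved IDs paired with gold IDs.
runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["x"], {"x"}),
]
score = mean_recall_at_k(runs, k=2)
```

Comparing two models then comes down to running both over the same annotated set and comparing the resulting scores.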

mbesto · 2025-11-05T20:33:39 1762374819

> but I work in the space

Ya, the original commenter likely does not work in the space - hence the ask.

> the evaluation of new models is actually very quantitative.

While you may be able to derive a % correct (and hence quantitative), they are by their nature very much not quantitative. Q&As on written subjects are very much subjective. Example benchmark: https://llm-stats.com/benchmarks/gpqa Even though there are techniques to reduce overfitting, it still isn't eliminated. So it's very much subjective.

teekert · 2025-11-05T08:23:41 1762331021

So… You did look back then didn’t look forward anymore… sorry couldn’t resist.

tifa2up · 2025-10-26T17:41:05 1761500465

Don't solve it at the STT level. Get the raw transcription from Gemini, then pass the output to an LLM to fix company names and make other modifications.

Happy to share more details if helpful.

idopmstuff · 2025-10-26T18:45:07 1761504307

Yeah, I've done it with industry-specific acronyms and this works well. Generate a list of company names and other terms it gets wrong, and give it definitions and any other useful context. For industry jargon, example sentences are good, but that's probably not relevant for company names.

Feed it that list and the transcript along with a simple prompt along the lines of "Attached is a transcript of a conversation created from an audio file. The model doing the transcription has trouble with company names/industry terms/acronyms/whatever else and will have made errors with those. I have also attached a list of company names/etc. that may have been spoken in the transcribed audio. Please review the transcription, and output a corrected version, along with a list of all corrections that you made. The list of corrections should include the original version of the word that you fixed, what you updated it to, and where it is in the document." If it's getting things wrong, you can also ask it to give an explanation of why it made each change that it did and use that to iterate on your prompt and the context you're giving it with your list of words.
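A minimal sketch of assembling that kind of prompt, assuming the term list is kept as a plain dictionary of term to definition. The function name, the glossary entry, and the sample transcript are hypothetical, and the call to the actual LLM is left out because it depends on the provider.

```python
def build_correction_prompt(transcript: str, glossary: dict[str, str]) -> str:
    """Assemble a transcript-correction prompt from a transcript and a term list."""
    # One line per known term, with its definition or other useful context.
    term_lines = "\n".join(f"- {term}: {meaning}" for term, meaning in glossary.items())
    return (
        "Attached is a transcript of a conversation created from an audio file. "
        "The transcription model has trouble with company names and industry terms "
        "and will have made errors with those.\n\n"
        "Terms that may have been spoken:\n"
        f"{term_lines}\n\n"
        "Review the transcript and output a corrected version, followed by a list "
        "of all corrections (original word, replacement, location).\n\n"
        "Transcript:\n"
        f"{transcript}"
    )


# Hypothetical example input.
prompt = build_correction_prompt(
    "We moved our search stack to agent set last quarter.",
    {"Agentset": "open-source RAG platform"},
)
```

Keeping the glossary separate from the prompt text makes it easy to add terms as new transcription errors show up.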

dotancohen · 2025-10-26T22:39:45 1761518385

Which specific model do you use?

remus · 2025-10-26T17:48:18 1761500898

I've had some luck with this in other contexts. Get the initial transcript from STT (e.g. Whisper), then feed that into Gemini with a prompt giving it as much extra context as possible. For example "This is a transcript from a YouTube video. It's a conversation between x people, where they talk about y and z. Please clean up the transcript, paying particular attention to company names and acronyms."

flyinglizard · 2025-10-26T18:26:48 1761503208

I've done the same, it works very well.

tifa2up · 2025-10-23T17:42:45 1761241365

Yes, we got 187 self-serve users (all on the free plan), and we're in talks with an enterprise now.

tifa2up · 2025-10-21T07:53:11 1761033191

You typically store a lot of metadata with each chunk's text, both to be able to filter on it and to include it in the citations. Injecting metadata means that you work out which metadata adds helpful context for the LLM, and when you pass the results to the LLM you pass them in a format like this:

Title: ...
Author: ...
Text: ...

for each chunk, instead of just passing the text
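A minimal sketch of that formatting step, assuming each retrieved chunk is a dictionary with a `text` field and a `metadata` dictionary. This shape and the function names are assumptions for illustration, not Agentset's actual data model.

```python
def format_chunk(chunk: dict) -> str:
    """Render one chunk as labeled metadata lines followed by its text."""
    metadata = chunk.get("metadata", {})
    # Each metadata key becomes a "Key: value" line, in insertion order.
    lines = [f"{key.capitalize()}: {value}" for key, value in metadata.items()]
    lines.append(f"Text: {chunk['text']}")
    return "\n".join(lines)


def format_context(chunks: list[dict]) -> str:
    """Join the formatted chunks with blank lines to form the LLM context."""
    return "\n\n".join(format_chunk(chunk) for chunk in chunks)


# Hypothetical retrieved chunks.
chunks = [
    {"text": "Hello", "metadata": {"title": "Doc", "author": "Ann"}},
    {"text": "World", "metadata": {"title": "Other"}},
]
context = format_context(chunks)
```

Only the metadata fields that actually help the model need to go into `metadata` here. Fields used purely for filtering can stay in the index.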

tifa2up · 2025-10-21T07:51:44 1761033104

Quite a decent hit. Local models don't perform very well in long contexts. We're planning to support a local-only offline set-up for people to host w/o additional dependencies