Hacker News | mathis-l's comments

At least when working with local MCP servers, I solved this problem by wrapping the MCP tools in an in-memory cache/store. Each tool output is stored under a unique id, and the id is returned alongside the tool output. The agent can then invoke other tools by passing the id instead of generating all the input. Adding attribute access made this pretty powerful (e.g. pass the content under tool_return_xyz.some.data to tool A as parameter b). This saves token costs and is a lot faster. Granted, it only works for passing values between tools, but I could imagine an additional tool for piping content into the storage layer would solve that.
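A minimal sketch of the idea, with invented names (cache_result, resolve, the tool_return_N id scheme are illustrative, not the actual implementation): tool outputs live in an in-memory store, and references like "tool_return_0.some.data" are resolved via key/attribute access before a downstream tool call.

```python
# Hypothetical tool-output store: results are cached under a unique id,
# and later tool calls pass "<id>.path.to.field" instead of re-generating
# the data through the model.
import itertools

_counter = itertools.count()
_store: dict[str, object] = {}

def cache_result(output: object) -> str:
    """Store a tool output and return the id handed back to the agent."""
    ref = f"tool_return_{next(_counter)}"
    _store[ref] = output
    return ref

def resolve(ref: str) -> object:
    """Resolve 'tool_return_0.some.data' via key or attribute access."""
    ref_id, *path = ref.split(".")
    value = _store[ref_id]
    for part in path:
        value = value[part] if isinstance(value, dict) else getattr(value, part)
    return value

# Usage: tool B receives the reference, not the full payload.
ref = cache_result({"some": {"data": [1, 2, 3]}})
print(resolve(f"{ref}.some.data"))  # [1, 2, 3]
```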

CrewAI (which uses litellm) pinned it to 1.82.6 (the last good version) 5 hours ago, but the commit message says nothing about a potential compromise. This seems weird. Is it a coincidence? Shouldn’t users be warned about a potential compromise?

https://github.com/crewAIInc/crewAI/commit/8d1edd5d65c462c3d...



I’ve worked on a code base that was about 15 years old and had gone through many team changes. It was a tricky domain with lots of complicated business logic. When making changes to the code, the commit history was often the only way to figure out whether certain behavior was intended and why it was implemented that way. Documentation about how the product should behave often lacked that level of detail. I was always thankful when a dev long gone from the team had written commit messages that communicated the intent behind a change.


I’m 190cm and tested luna rail’s prototypes. I was amazed how much space I had, even in the smallest cabin. Definitely much better than any night train experience I’ve ever had.


Hi Mathis-l, thanks for the shout out!


You might want to take a look at https://github.com/segment-any-text/wtpsplit

It uses a similar approach, but the focus is on sentence/paragraph segmentation in general rather than specifically on RAG. It also has some benchmarks. Might be a good source of inspiration for where to take Chonky next.


This is the library that I use, mainly on very noisy IRC chat transcripts, and it works pretty well. OP, I'd love to see a paragraph-segmentation benchmark comparing Chonky against wtpsplit to see how well it stacks up.


Haystack [1] is another good option. It’s modular, doesn’t get in your way, and is particularly strong at retrieval. People like the documentation too.

Disclaimer: I work at deepset

[1] https://github.com/deepset-ai/haystack


You might want to give Haystack a try (disclaimer: I work at deepset, the company behind Haystack).

Haystack lets you pre-process your documents into smaller chunks, calculate embeddings, and index them into a document store. You can wrap all of that in a modular pipeline if you want.

Next, you can query your documents using a retrieval pipeline.
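To make the flow above concrete, here is a framework-agnostic, toy sketch of chunk → embed → index → query. The hash-based "embedding" is a stand-in for a real sentence-transformer, and none of these function names are Haystack's actual API; in Haystack these steps map onto pipeline components and a document store.

```python
# Toy indexing and query "pipelines": chunk documents, embed the chunks,
# index them in memory, then retrieve by cosine similarity.
import math
from collections import Counter

def chunk(text: str, size: int = 30) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy bag-of-words hashing 'embedding', L2-normalized."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

index: list[tuple[str, list[float]]] = []

def add_documents(text: str) -> None:
    """Indexing pipeline: chunk, embed, store."""
    for c in chunk(text):
        index.append((c, embed(c)))

def query(q: str, top_k: int = 3) -> list[str]:
    """Query pipeline: embed the query, rank chunks by dot product."""
    qv = embed(q)
    ranked = sorted(index, key=lambda d: -sum(a * b for a, b in zip(qv, d[1])))
    return [doc for doc, _ in ranked[:top_k]]

add_documents("Haystack lets you index documents and query them with a retrieval pipeline")
print(query("query documents")[0])
```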

Regarding document store selection: replacing your document store is easy, so I would start with the simplest one, probably an InMemoryDocumentStore. When you want to move from experimentation to production, you’ll want to tailor your selection to your use case. Here are a few things that I’ve observed:

You don’t want to manage anything and are fine with SaaS -> Pinecone

You have a very large dataset (500M+ vectors) and you want something that you can run locally -> maybe Qdrant

You have metadata that you want to incorporate into your retrieval, or you want to do hybrid search -> OpenSearch/Elasticsearch

Regarding model selection:

We’ve seen https://huggingface.co/sentence-transformers/multi-qa-distil... work well as a good semantic search baseline with fast indexing times. If you feel the performance is lacking, you could look at the E5 models. What also works fairly well for us is a multi-step retrieval process where we retrieve ~100 documents with BM25 first and then use a cross-encoder to rank those by semantic relevance. Very fast indexing times are a benefit, and you also don’t need a beefy vector db to store your documents. Latency at query time will be slightly higher though, and you might need a GPU machine to run your query pipeline.
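The two-step retrieval described above can be sketched as follows. Both scorers here are toy stand-ins (a term-frequency score in place of real BM25, and a word-overlap function in place of a cross-encoder model); the point is the shape of the pipeline: a cheap lexical stage narrows the corpus, then a more expensive scorer reranks the survivors.

```python
# Two-stage retrieval sketch: cheap lexical recall, then expensive reranking.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "dogs are loyal pets",
    "semantic search ranks documents by meaning",
    "BM25 is a classic lexical ranking function",
]

def lexical_score(query: str, doc: str) -> float:
    """Simplified BM25-flavoured score: term frequency of shared words."""
    doc_tf = Counter(doc.lower().split())
    return sum(doc_tf[t] for t in query.lower().split())

def cross_encoder(query: str, doc: str) -> float:
    """Stand-in reranker: fraction of query words present in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def retrieve(query: str, first_stage_k: int = 2, top_k: int = 1) -> list[str]:
    # Stage 1: fast lexical retrieval over the whole corpus.
    candidates = sorted(corpus, key=lambda d: -lexical_score(query, d))[:first_stage_k]
    # Stage 2: rerank only the candidates with the expensive scorer.
    return sorted(candidates, key=lambda d: -cross_encoder(query, d))[:top_k]

print(retrieve("lexical ranking with BM25"))  # ['BM25 is a classic lexical ranking function']
```

In a real setup, stage 1 would be an Elasticsearch/OpenSearch BM25 query and stage 2 a sentence-transformers cross-encoder, which is where the GPU cost at query time comes from.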

Retrieval in Haystack: https://docs.haystack.deepset.ai/docs/retriever

Cross-Encoder approach: https://docs.haystack.deepset.ai/docs/ranker

Blog Post with an end-to-end retrieval example: https://haystack.deepset.ai/blog/how-to-build-a-semantic-sea...

