I made https://uithub.com 2 months ago. Its specialty is that seeing a repo's raw extract is just a matter of changing the 'g' to a 'u' in the URL. It also works for subdirectories, so if you just want the docs of Upstash QStash, for example, go to https://uithub.com/upstash/docs/tree/main/qstash
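For anyone who wants to script the 'g' to 'u' swap, here is a minimal sketch. The helper name `to_uithub` is made up for illustration, and it assumes the path after the host can be carried over unchanged, as in the QStash example above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_uithub(github_url: str) -> str:
    """Swap the github.com host for uithub.com, leaving the path untouched.

    Hypothetical helper, not part of any uithub tooling.
    """
    parts = urlsplit(github_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a github.com URL: {github_url}")
    # Only the host changes; owner/repo/tree/branch/subdir stay as they are.
    return urlunsplit(parts._replace(netloc="uithub.com"))


print(to_uithub("https://github.com/upstash/docs/tree/main/qstash"))
# prints https://uithub.com/upstash/docs/tree/main/qstash
```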
I wonder why nobody uses the JSONL format to represent an entire codebase? It's what I do, and LLMs seem to prefer it. In fact, an LLM suggested this strategy to me. It uses fewer characters, too.
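A minimal sketch of what a JSONL dump of a codebase could look like, assuming one JSON object per file. The `path` and `content` keys and the function name `codebase_to_jsonl` are illustrative choices, not something specified in this thread.

```python
import json
from pathlib import Path


def codebase_to_jsonl(root: str) -> str:
    """Serialize every UTF-8 text file under root as one JSON object per line."""
    lines = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 (likely binary)
        record = {"path": path.relative_to(root).as_posix(), "content": text}
        # json.dumps escapes newlines, so each file stays on a single line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Calling `codebase_to_jsonl(".")` returns the dump as a string. A real tool would probably also want to skip directories such as `.git`, which this sketch does not do.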
Are you suggesting that there's a correlation between what input formats provide best performance for an LLM input, and what sequence of tokens the same LLM outputs when prompted about what input formats provide best performance? Why would that be?
Why wouldn't that be? We've had several generations of LLMs since ChatGPT took the world by storm; current models are very much aware of LLMs that came before them, as well as associated discussions on how to best use them.
You can get JSON by using the accept parameter of the API. The URL structure remains the same. It also supports YAML, and I found that's the easiest for LLMs to read.
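A rough sketch of building such a request, under a loud assumption: it treats the "accept parameter" as a standard HTTP `Accept` header with the media type `application/json`. The thread does not confirm how the API actually expects the format to be passed, and the helper name `build_request` is made up. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_request(path: str, media_type: str = "application/json") -> Request:
    """Build (but do not send) a request for a repo path in a given format.

    ASSUMPTION: the format is selected via the HTTP Accept header; the real
    API may use a different mechanism or different media type names.
    """
    return Request(f"https://uithub.com/{path}", headers={"Accept": media_type})


req = build_request("upstash/docs/tree/main/qstash")
print(req.full_url, req.get_header("Accept"))
```

Passing the result to `urllib.request.urlopen` would actually perform the call.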
Is there any reason to prefer JSONL besides it being more efficient to edit? I'm happy to add it to my backlog if you think it has any advantages for LLMs.
Great to see this keeps being worthwhile!