For example: I'm mostly using Claude Code plus the Gemini CLI. Gemini CLI isn't as powerful as CC, and it also won't work with some MCPs. So I keep separate *.md files for each.
Nvidia hardware is cheap as chips right now. If you got 2x 3060 12GB cards (or a single 24GB 4090), you'd have 24GB of CUDA-accelerated VRAM to play with for inference and finetuning. That's plenty to fit the smaller SOTA models like Qwen3 30B A3B (quantized), and enough to start offloading layers of the 100B+ options like GLM-4.5 Air and Llama Scout.
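If you go that route, here's a minimal sketch using llama-cpp-python, assuming you've already downloaded a GGUF quant; the model path and layer count are placeholders you'd tune for your own hardware.

```python
# Minimal llama-cpp-python sketch: partial GPU offload on 24GB of VRAM.
# The model path is a placeholder -- point it at whatever GGUF quant you grab.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-30B-A3B-Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,   # all layers on GPU; lower this number for 100B+ models
    n_ctx=8192,        # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what a MoE model is."}]
)
print(out["choices"][0]["message"]["content"])
```

The knob that matters is `n_gpu_layers`: for the bigger models you set it to however many layers fit in VRAM and let the rest run on CPU, trading speed for size.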
In my opinion? Qwen3 does live up to the benchmarks; it leaves Sonnet 4 in the dust quality-wise if you can get a fast enough tok/s to use it. I haven't tried GLM or Llama Scout yet, nor do I have a particularly big frame of reference for the quality of Opus 4.
You might be able to try out Qwen3 via API to see if it suits your needs. Their 30B MoE is really impressive, and the 480B one can only be better (presumably).
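If you just want to kick the tires first, here's a minimal sketch of calling Qwen3 through an OpenAI-compatible endpoint; the base URL, model name, and API key env var are placeholders for whichever provider you end up picking.

```python
# Minimal sketch: calling Qwen3 through an OpenAI-compatible endpoint.
# Base URL, model name, and env var are placeholders -- swap in your provider.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://example-provider.com/v1",  # hypothetical endpoint
    api_key=os.environ["QWEN_API_KEY"],          # hypothetical env var
)

resp = client.chat.completions.create(
    model="qwen3-30b-a3b",  # or the 480B variant if the provider offers it
    messages=[{"role": "user", "content": "Write a Rust function that reverses a string."}],
)
print(resp.choices[0].message.content)
```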
It is generated in its entirety by GPT. Well, 98% is more like it. By the time it's ready, it will have a README. I will announce it on Reddit /r/rust if you are interested.
Something I want to test is how much documentation is needed for the machine to infer the rest. Something like: given one sentence of human documentation plus the code, how much can the LLM infer, and how accurately can it describe the code? Does it need two sentences? Three? We'll see.
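The experiment loop itself is pretty simple; here's a rough sketch of how I picture it (the model name, doc sentences, and file path are all made up for illustration):

```python
# Rough sketch of the experiment: feed the model N sentences of human docs
# plus the code, ask it to describe the code, and compare results as N grows.
from openai import OpenAI

client = OpenAI()  # assumes a configured OpenAI-compatible client

def describe_code(doc_sentences: list[str], code: str) -> str:
    """Ask the model to describe `code` given only `doc_sentences` as context."""
    prompt = (
        "Human documentation:\n" + " ".join(doc_sentences) + "\n\n"
        "Code:\n" + code + "\n\n"
        "Describe what this code does as accurately as possible."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

docs = [
    "Parses a config file and returns the merged settings.",
    "Unknown keys are ignored rather than raising an error.",
    "Environment variables override values from the file.",
]
code = open("src/config.rs").read()  # hypothetical file under test

for n in range(1, len(docs) + 1):  # 1 sentence, then 2, then 3...
    print(f"--- with {n} sentence(s) of docs ---")
    print(describe_code(docs[:n], code))
```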
p.s: looking forward to crush :3