Is there a way to use this within VSCode like Copilot, meaning having the "shadow code" appear while you code instead of having to go back and forth between the editor and a chat-like interface?
For me, a significant component of the quality of these tools resides on the "client" side: being able to engineer a prompt that will get the model to generate accurate code. The prompt needs to find and embed the right chunks from the user's current workspace, or even from their entire org's repos. The model is "just" one piece of the puzzle.
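To make that concrete, here's a toy sketch of that client-side step in pure standard-library Python. Real tools use an embedding index and smarter chunking; this ranks chunks by naive keyword overlap just to illustrate the idea:

    import pathlib
    import re

    def chunks(root=".", size=40):
        """Split every Python file in the workspace into ~size-line chunks."""
        for f in pathlib.Path(root).rglob("*.py"):
            lines = f.read_text(errors="ignore").splitlines()
            for i in range(0, len(lines), size):
                yield f, "\n".join(lines[i:i + size])

    def score(query, chunk):
        """Naive relevance: how many words of the query appear in the chunk."""
        words = lambda s: set(re.findall(r"\w+", s.lower()))
        return len(words(query) & words(chunk))

    def build_prompt(query, root=".", k=3):
        """Embed the k most relevant workspace chunks above the actual task."""
        top = sorted(chunks(root), key=lambda fc: score(query, fc[1]), reverse=True)[:k]
        context = "\n\n".join(f"# {f}\n{c}" for f, c in top)
        return f"{context}\n\nTask: {query}"

Everything else (which files to index, how big the chunks are, how many fit in the context window) is exactly the client-side engineering being described.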
Not using Codestral (yet), but check out Continue.dev[1] with Ollama[2] running llama3:latest and starcoder2:3b. It gives you locally running chat and edits via llama3, and autocomplete via starcoder2.
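For reference, wiring that pairing up in Continue's config file (~/.continue/config.json) looks roughly like this. Sketched from memory, and the schema changes between versions, so check the Continue docs:

    {
      "models": [
        { "title": "Llama 3", "provider": "ollama", "model": "llama3:latest" }
      ],
      "tabAutocompleteModel": {
        "title": "StarCoder 2",
        "provider": "ollama",
        "model": "starcoder2:3b"
      }
    }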
It's not perfect but it's getting better and better.
Having the chats in Obsidian lets me save them to reference later in my notes. When I first started using it in VSCode for Python programming, it felt like a lot of noise: it kept generating useless recommendations. But recently it has been super helpful.
I think my only gripe is that I sometimes forget to turn off my ollama systemd unit and get noticeable video lag when playing games on my workstation. For my next video card upgrade, I think I'll build a new home server that can fit my current NVIDIA RTX 3090 Ti and use it as a dedicated ollama server.
I created a simple CLI app that does this in my workspace. The workspace is under source control, so after the LLM runs all the changes are highlighted by the diff, and the LLM also creates a COMMIT_EDITMSG file describing what it changed. Now I don't use ChatGPT anymore, only this CLI tool.
I never saw something like this integrated directly into VSCode, though (and it isn't my preferred workflow anyway; the command line works better for me).
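Not the actual tool, but a minimal sketch of the same workflow, assuming a local Ollama endpoint and a single target file (the model name and prompt format are placeholders):

    import json
    import pathlib
    import sys
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint
    MODEL = "llama3:latest"                             # placeholder model

    def ask(prompt):
        """Send a prompt to the local model and return its text response."""
        req = urllib.request.Request(
            OLLAMA_URL,
            data=json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"]

    def main():
        # usage: edit.py <file> "<instruction>"
        path, instruction = pathlib.Path(sys.argv[1]), sys.argv[2]
        edited = ask("Rewrite this file per the instruction. Output only the code.\n"
                     f"Instruction: {instruction}\n---\n{path.read_text()}")
        path.write_text(edited)
        # Draft a commit message so the diff and COMMIT_EDITMSG tell the story.
        msg = ask(f"Write a one-line commit message for this change: {instruction}")
        pathlib.Path("COMMIT_EDITMSG").write_text(msg + "\n")
        print(f"edited {path}; review with `git diff`")

    if __name__ == "__main__":
        main()

Since the tree is under git, `git diff` makes review trivial and `git checkout -- <file>` undoes a bad edit.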