Quick testing on vscode to see if I'd consider replacing Copilot with this.
The biggest showstopper for me right now is that the output length is quite short. The default length is set to 256 tokens, but even if I raise it to 4096, I'm not getting any larger chunks of code.
Is this because of a max latency setting, or the internal prompt, or am I doing something wrong? Or is it only really made to autocomplete lines, and not blocks like Copilot does?
- Generation time exceeded (configurable in the plugin config)
- Number of tokens exceeded (not the case since you increased it)
- Indentation - stops generating if the next line has a shorter indent than the first line
- Low probability of the sampled token
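To illustrate the indentation criterion, here is a rough sketch of how such a check could work (the function names are mine, not the plugin's actual code):

```python
def indent_width(line: str) -> int:
    """Number of leading whitespace characters on a line."""
    return len(line) - len(line.lstrip())

def should_stop_on_indent(first_line: str, next_line: str) -> bool:
    """Stop once a non-empty generated line is less indented than the
    line where the completion started, i.e. we likely left the block."""
    if not next_line.strip():
        return False  # blank lines don't end a block
    return indent_width(next_line) < indent_width(first_line)
```

So a completion started inside an indented block would be cut off at the first line that dedents back out of it.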
Most likely you are hitting the last criterion. It's something that should be improved in some way, but I am not sure how yet. Currently it uses a very basic token-sampling strategy with custom threshold logic that stops generating when the probability of the sampled token is too low. That logic is likely too conservative.
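As a rough sketch of what that threshold logic amounts to (the 0.10 cutoff and names here are hypothetical, not the plugin's actual values):

```python
import math

PROB_THRESHOLD = 0.10  # hypothetical cutoff, not the plugin's real value

def stop_on_low_probability(token_logprob: float,
                            threshold: float = PROB_THRESHOLD) -> bool:
    """Convert the sampled token's log-probability back to a probability
    and stop generation if the model was 'unsure' about that token."""
    return math.exp(token_logprob) < threshold
```

With a fixed cutoff like this, generation ends as soon as the model emits one uncertain token, which is why longer multi-line completions get truncated even when the token budget is far from exhausted.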
Thanks :)