
I've not done fine-tuning on code bases, but I have done other fine-tuning.

You will generally get better results when you fine-tune the base model on your data.

Since you still want to use it with the chat template in the end, you fine-tune the base model on your specific data formatted with the chat template.
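To make the "formatted with the chat template" step concrete, here's a minimal sketch. The ChatML-style markers below are a hypothetical template for illustration; in practice you'd render examples with the actual chat model's template (e.g. the tokenizer's apply_chat_template):

```python
# Sketch: render training examples in the chat layout the final model
# will see at inference time, before fine-tuning the base model on them.
# The ChatML-style tokens here are illustrative, not a specific model's template.

def format_example(system, user, assistant):
    """Render one domain-specific Q/A pair in a chat-style layout."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

sample = format_example(
    "You are a helpful assistant.",
    "What does module X do?",               # hypothetical domain question
    "Module X handles request routing.",    # hypothetical domain answer
)
print(sample)
```

The point is that the base model sees your data wrapped in the same scaffolding the chat model expects, so the later merge lines up.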

From there you'll have a lora that knows your data alright, but still doesn't really work for chatting.

You take that lora, merge it with the base model. Let's call this the stage model.
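Merging the lora into the base is just adding the low-rank update back into each weight matrix. A minimal numpy sketch of the arithmetic (shapes and the alpha/r scaling convention are illustrative; peft's merge_and_unload does the equivalent per weight):

```python
import numpy as np

# Sketch of merging a LoRA adapter into base weights:
#   W_merged = W_base + (alpha / r) * B @ A
# Dimensions here are toy values for illustration.

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 32, 4, 8

W_base = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = rng.standard_normal((d_out, r))     # LoRA up-projection

W_merged = W_base + (alpha / r) * B @ A

# The merged matrix has the same shape as the base weight, so the
# "stage model" is a drop-in replacement for the base checkpoint.
assert W_merged.shape == W_base.shape
```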

Then you use mergekit to merge the base model with both the stage model and the chat model. I used the TIES merge method in the past. Now you have your final model.
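In case the TIES method itself is unfamiliar: it trims each model's delta from the base to its largest-magnitude entries, elects a sign per parameter, and averages only the entries that agree, which is what keeps the stage and chat tunes from canceling each other out. A toy sketch on flat vectors (real mergekit operates per-tensor over full checkpoints, driven by a YAML config):

```python
import numpy as np

# Toy sketch of the TIES merge method:
# 1) trim each delta (model - base) to its top-k fraction by magnitude,
# 2) elect a sign per parameter from the summed trimmed deltas,
# 3) average only the entries whose sign agrees with the elected sign.

def ties_merge(base, deltas, keep=0.2):
    trimmed = []
    for d in deltas:
        k = max(1, int(keep * d.size))
        thresh = np.sort(np.abs(d))[-k]              # magnitude cutoff
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    trimmed = np.stack(trimmed)
    elected = np.sign(trimmed.sum(axis=0))           # elected sign per parameter
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)        # avoid divide-by-zero
    merged_delta = (trimmed * agree).sum(axis=0) / counts
    return base + merged_delta

base = np.zeros(6)
stage_delta = np.array([1.0, 0.0, 0.5, -0.2, 0.0, 0.0])  # "your data" tune
chat_delta  = np.array([0.8, 0.3, 0.0,  0.4, 0.0, 0.1])  # chat tune
merged = ties_merge(base, [stage_delta, chat_delta], keep=0.5)
```

Where the two deltas disagree in sign (index 3 above), only the elected side survives instead of being averaged toward zero.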

I use vLLM for inference, and needed access to multiple fine-tunes on a single set of hardware. So from that point I take the base model and my final model and extract a new lora. I also take the base model and the chat model and extract another lora for that. Then I load up vLLM with the base model and as many of the fine-tune loras as I need, plus the chat lora.
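The extraction step works because the difference between two merged checkpoints can be approximated by a low-rank factorization, which is exactly a lora's shape. A sketch on a single weight matrix using a truncated SVD (mergekit ships a utility that does this across a whole checkpoint; names and dimensions here are illustrative):

```python
import numpy as np

# Sketch of extracting a LoRA from two full checkpoints:
# take the weight delta (final - base) and keep a rank-r truncated SVD,
# giving B @ A factors that approximate the delta.

def extract_lora(W_base, W_final, r):
    delta = W_final - W_base
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :r] * S[:r]          # (d_out, r) up-projection
    A = Vt[:r, :]                 # (r, d_in) down-projection
    return A, B

rng = np.random.default_rng(0)
d_out, d_in, r = 16, 32, 4
W_base = rng.standard_normal((d_out, d_in))
# make the "final" model differ from base by an exactly rank-4 update
true_delta = rng.standard_normal((d_out, r)) @ rng.standard_normal((r, d_in))
W_final = W_base + true_delta

A, B = extract_lora(W_base, W_final, r)
# a rank-r SVD recovers an exactly rank-r delta up to numerical error
assert np.allclose(B @ A, W_final - W_base)
```

Real merge deltas aren't exactly low-rank, so the extracted lora is an approximation; the rank you choose trades fidelity against adapter size.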

The only time this hasn't worked is when the chat model adds a bunch of new tokens on top of the base model. If I remember right, there was an issue with that.

This has worked well for me in the past.




Yes!! The trick is the merging of model weights!!


Thank you, this was a great explanation!


Very welcome, I wish you luck!



