
Are people fine-tuning LLMs on their local machines with a single GPU? What are people using to scale their training to multiple nodes / GPUs? I've been playing around with Hugging Face Estimators in sagemaker.huggingface, but I'm not sure if there are better options for this?
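For context, this is roughly the sagemaker.huggingface Estimator flow I mean - a minimal sketch, not a recipe; the entry-point script name, instance type, and framework versions below are placeholder assumptions (they have to match a supported SageMaker DLC combination):

    # Sketch of launching a Hugging Face fine-tuning job on SageMaker.
    # train.py, the instance type, and the version strings are assumptions.
    import sagemaker
    from sagemaker.huggingface import HuggingFace

    role = sagemaker.get_execution_role()  # IAM role with SageMaker permissions

    estimator = HuggingFace(
        entry_point="train.py",            # your training script (hypothetical name)
        source_dir="./scripts",            # directory containing train.py
        instance_type="ml.g5.2xlarge",     # single-GPU instance; pick per budget
        instance_count=1,                  # raise for multi-node data parallelism
        role=role,
        transformers_version="4.28",       # must match a supported container combo
        pytorch_version="2.0",
        py_version="py310",
        hyperparameters={"epochs": 3, "per_device_train_batch_size": 4},
    )

    # Data lives in S3; the "train" channel shows up as SM_CHANNEL_TRAIN in the job.
    estimator.fit({"train": "s3://my-bucket/my-dataset/"})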



It takes a significant amount of time (a few hours) on a single consumer GPU, even a 4090 / 5090, on a personal machine. I think most people use online services like RunPod, Vast.ai, etc. to rent high-powered H100s and similar GPUs for a few dollars per hour, run the fine-tuning / training there, and just use local GPUs for inference on the fine-tuned models produced on those cloud-rented instances.


It used to be that way! Interestingly, I find people in large orgs and the general enthusiast crowd don't mind waiting - memory usage and quality are more important factors!


Google Colab is quite easy to use and has the benefit of not making your local computer feel sluggish while you run the training. The linked Unsloth post provides a notebook that can be launched there, and I've had pretty good luck adapting their other notebooks to different foundation models. As a sibling comment noted, if you're using LoRA instead of a full fine-tune, you can create adapters for fairly large models with the VRAM available in Colab, especially on the paid plans.
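The core of those notebooks boils down to something like this - a rough sketch, where the model name, rank, and sequence length are assumptions you'd adjust:

    # Sketch of the Unsloth LoRA setup used in their Colab notebooks.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # assumption: a pre-quantised 4-bit base
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Wrap the base model with LoRA adapters; only the adapters get trained.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        lora_dropout=0.0,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        use_gradient_checkpointing=True,
    )
    # From here the notebooks hand the model to trl's SFTTrainer, train as usual,
    # then save just the adapter with model.save_pretrained("lora_adapter").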

If you have a Mac, you can also do pretty well training LoRA adapters using something like LLaMA-Factory and letting it run overnight. It's slower than an NVIDIA GPU, but the larger effective memory (if you have, say, 128GB of unified memory) can give you more flexibility.


Take a look at the hardware requirements at https://github.com/hiyouga/LLaMA-Factory?tab=readme-ov-file#...

A 'LoRA' is a memory-efficient type of fine-tuning that only tunes a small fraction of the LLM's parameters. And 'quantisation' reduces an LLM to, say, 4 bits per parameter. So it's feasible to fine-tune a 7B parameter model at home.
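To make that concrete, here's a minimal QLoRA-style sketch with transformers + peft + bitsandbytes - the model name, rank, and target modules are assumptions; any ~7B causal LM works the same way:

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Load the base model with 4-bit (NF4) quantised weights.
    bnb = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",   # assumption: stand-in for any ~7B model
        quantization_config=bnb,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Attach low-rank adapters; only these (a tiny fraction of weights) are trained.
    lora = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # assumption: typical attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of total parameters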

Anything bigger than 7B parameters and you'll want to look at renting GPUs on a platform like Runpod. In the current market, used 4090s are selling on eBay right now for $2,100, while Runpod will rent you a 4090 for $0.34/hr - you do the math ($2,100 / $0.34/hr ≈ 6,200 hours of rental before buying breaks even).

It's certainly possible to scale model training to span multiple nodes, but generally scaling through bigger GPUs and more GPUs per machine is easier.
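If it helps, the usual single-node, multi-GPU route with the Hugging Face stack is just an ordinary Trainer script launched with torchrun - a sketch under assumptions (the script name, model, and data file are placeholders):

    # Launch on one node with 4 GPUs:
    #   torchrun --nproc_per_node=4 train.py
    # Multi-node adds e.g. --nnodes=2 --node_rank=0 --rdzv_endpoint=<host>:29500
    # (train.py is a placeholder name; the Trainer reads the env torchrun sets up.)
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    model_name = "gpt2"  # placeholder; swap in the model you're actually tuning
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    raw = load_dataset("text", data_files={"train": "train.txt"})["train"]
    dataset = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=["text"],
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="out",
            per_device_train_batch_size=4,  # per GPU; effective batch = 4 x world size
            num_train_epochs=1,
        ),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The same script runs unchanged on one GPU or several; torchrun just spawns one process per GPU and the Trainer handles the data-parallel wiring.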


For experimentation and smaller models, a single GPU is the way to go! Tbh I normally find most people spend the majority of their time on datasets, training loss convergence issues, etc.!

But if it's helpful, I was thinking about spinning up a platform for something like that!


For experimentation? Absolutely. It can often be done overnight for smaller models and reasonably sized GPUs (24GB+).

It'd become a lot less practical with huge datasets, but I'd guess that a lot of fine-tuning tasks aren't really that large.





