FeepingCreature on March 23, 2024 | on: Emad Mostaque resigned as CEO of Stability AI
LoRA training/merging is basically "crank up the batch size ridiculously high" in a nutshell, right? What actually breaks when you do that?
brrrrrm on March 23, 2024
Cranking up the batch size kills convergence.
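To see why merging looks like one big batch in the first place: to first order, averaging the weight deltas from single SGD steps taken on disjoint data shards gives the same update as one SGD step on the union of those shards. A toy sketch of that equivalence (the loss, shard sizes, and names here are made up for illustration, not from the thread):

    import torch

    torch.manual_seed(0)
    w = torch.randn(4, requires_grad=True)          # shared starting weights
    shards = [torch.randn(8, 4) for _ in range(3)]  # 3 equal-sized data shards
    lr = 0.1

    def grad_on(shard, weights):
        # toy loss: mean squared output of a linear layer
        loss = (shard @ weights).pow(2).mean()
        g, = torch.autograd.grad(loss, weights)
        return g

    # each "node" takes one SGD step on its own shard; merging = averaging the deltas
    deltas = [-lr * grad_on(s, w) for s in shards]
    merged_delta = torch.stack(deltas).mean(0)

    # one SGD step on the combined batch (3x the per-node batch size)
    big_batch_delta = -lr * grad_on(torch.cat(shards), w)

    print(torch.allclose(merged_delta, big_batch_delta, atol=1e-6))  # True

The equivalence only holds for a single first-order step; over many local steps the per-node updates drift apart, and the large effective batch with an unchanged learning rate is one way to see where the convergence trouble comes from.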
FeepingCreature on March 23, 2024
I wonder if that can be avoided by modifying the training approach. Ideas offhand: group the data by topic and train a subset of weights per node; or figure out which layers diverge the most and reduce the lr on only those.
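A rough sketch of the second idea, per-layer lr scaling driven by how much the nodes disagree. The divergence metric (relative spread of the per-node deltas) and the scaling rule are invented for illustration, not anything the commenter specified:

    import torch

    def per_layer_lr_scale(node_deltas, base_lr, floor=0.1):
        # node_deltas: list (one per node) of {layer_name: weight_delta_tensor}
        # Shrink the lr on layers whose deltas disagree most across nodes.
        scales = {}
        for name in node_deltas[0]:
            stacked = torch.stack([d[name].flatten() for d in node_deltas])
            mean = stacked.mean(0)
            # relative disagreement: spread of per-node deltas vs. size of the mean delta
            divergence = stacked.std(0).norm() / (mean.norm() + 1e-8)
            # more divergence -> smaller lr, clamped at a floor
            scales[name] = base_lr * max(floor, 1.0 / (1.0 + divergence.item()))
        return scales

    # toy example: two nodes agree on layer "a" but pull layer "b" in different directions
    node1 = {"a": torch.ones(4), "b": torch.tensor([1.0, 1.0, 1.0, 1.0])}
    node2 = {"a": torch.ones(4), "b": torch.tensor([-1.0, 1.0, -1.0, 1.0])}
    print(per_layer_lr_scale([node1, node2], base_lr=1e-4))
    # layer "a" keeps ~1e-4, layer "b" gets scaled down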
brrrrrm on March 25, 2024
A provable way to recover convergence is to calculate the Hessian. It's computationally expensive, but there are approximation methods.
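The comment doesn't say which approximation methods are meant; a common building block is the Hessian-vector product via double backprop (Pearlmutter's trick), which never materializes the full Hessian. A minimal sketch:

    import torch

    def hvp(loss_fn, params, vec):
        # Hessian-vector product H @ vec via double backprop (Pearlmutter's trick);
        # avoids ever forming the full Hessian.
        loss = loss_fn(params)
        grad, = torch.autograd.grad(loss, params, create_graph=True)
        hv, = torch.autograd.grad(grad @ vec, params)
        return hv

    # sanity check on a quadratic loss 0.5 * p^T A p, whose Hessian is exactly A
    A = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
    p = torch.randn(2, requires_grad=True)
    v = torch.randn(2)
    print(hvp(lambda q: 0.5 * q @ A @ q, p, v))
    print(A @ v)  # matches the line above

Averaging v * hvp(..., v) over random +/-1 vectors gives a cheap Hutchinson-style estimate of the Hessian diagonal, which is one of the standard ways the "approximation methods" are built in practice.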