
> You could also crowd source this, some distributed application where everyone puts their home machines towards training.

That's how Leela Chess Zero (Lc0) replicated AlphaZero's performance. In fact, this is not that difficult, assuming you have the means to orchestrate it: each volunteer machine loads the current weights and a batch, computes backprop, and submits the gradient to a central system, which aggregates the gradient updates, applies them to the network, and pushes out new weights (kinda how Bitcoin creates a new block).
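A minimal sketch of that loop, using a toy linear model and numpy in place of a real network and real networking (`worker_gradient`, `server_update`, and all the sizes here are illustrative assumptions, not Lc0's actual protocol):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a linear model's weight vector. In practice this would
# be the full parameter vector of the neural net.
d = 4
true_w = rng.normal(size=d)
w = np.zeros(d)

def worker_gradient(w, X, y):
    """One volunteer machine: take the current weights, run backprop on
    its local batch, and return only the gradient (never the data)."""
    err = X @ w - y
    return X.T @ err / len(y)

def server_update(w, grads, lr=0.5):
    """Central system: average the submitted gradients and push new
    weights -- exactly plain gradient accumulation, just remote."""
    return w - lr * np.mean(grads, axis=0)

for _round in range(50):
    grads = []
    for _ in range(3):  # three volunteer machines per round
        X = rng.normal(size=(16, d))
        y = X @ true_w
        grads.append(worker_gradient(w, X, y))
    w = server_update(w, grads)
```

After enough rounds the aggregated weights converge to the target, the same as if one machine had processed every batch itself.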

This is no different from gradient accumulation, just "distributed". In fact, the system could offload very large batches to each worker, because the returned update is O(1) in the batch size n: a worker sends back a gradient the size of the network, no matter how much data it processed; it's just that the O(1) constant is the size of the network.
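To make the O(1)-in-n point concrete, here's a small check (again with an assumed toy linear model): the gradient a worker returns has the shape of the weights, whether its batch held 8 examples or 4096.

```python
import numpy as np

d = 10            # parameter count of the toy model
w = np.zeros(d)

def batch_gradient(w, X, y):
    # MSE gradient for a linear model: its shape matches w, not the batch
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(1)
small = batch_gradient(w, rng.normal(size=(8, d)), rng.normal(size=8))
large = batch_gradient(w, rng.normal(size=(4096, d)), rng.normal(size=4096))

# Both uploads are d floats: the cost is fixed at the network size,
# independent of how many examples the worker crunched locally.
```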



