Hacker News

How else could/should it be done?



I would have assumed they could just upload it to GitHub. If there are restrictions on file size, I'm sure they could split it into multiple compressed parts.
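A minimal sketch of that "multiple parts" idea in plain Python (the 95 MB default and the `.partNNNN` naming are just illustrative, chosen to stay under GitHub's 100 MB per-file limit):

```python
def split_file(path, chunk_size=95 * 1024 * 1024):
    """Split `path` into numbered parts, each at most chunk_size bytes."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            part_path = f"{path}.part{index:04d}"
            with open(part_path, "wb") as dst:
                dst.write(chunk)
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts, out_path):
    """Reassemble the parts in order (equivalent to `cat file.part* > file`)."""
    with open(out_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                dst.write(src.read())
```

In practice you'd run the output through a compressor first; the split itself is format-agnostic.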

Torrents can unfortunately die after a while if no one keeps seeding them, or if there's no permanent web-based seeder, which doesn't appear to be the case here.


GitHub has a soft repository size limit of 5GB, documented here: https://docs.github.com/en/repositories/working-with-files/m...

"Soft" size limit means: "If your repository excessively impacts our infrastructure, you might receive an email from GitHub Support asking you to take corrective action." I know people who have received such emails.

Most model releases happen through Hugging Face which does not have such a size limit.


They'd probably just charge you for it. They sell "data packs" for LFS.

https://docs.github.com/billing/managing-billing-for-git-lar...


It would be super expensive to use LFS to distribute this:

> Each pack costs $5 per month, and provides 50 GiB of bandwidth and 50 GiB for storage

So they would need to pay for 6 data packs (or $30) for every 300 GB download.

(https://docs.github.com/en/billing/managing-billing-for-git-...)
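Back-of-the-envelope check of that math (prices taken from the quote above; the function name is just illustrative):

```python
import math

PACK_PRICE_USD = 5       # $5 per data pack per month
PACK_BANDWIDTH_GIB = 50  # bandwidth included in each pack

def lfs_bandwidth_cost(download_gib):
    """Packs needed (and monthly dollars) to cover this much LFS bandwidth."""
    packs = math.ceil(download_gib / PACK_BANDWIDTH_GIB)
    return packs, packs * PACK_PRICE_USD

# A single ~300 GiB download already consumes 6 packs ($30);
# the cost scales linearly with every additional download.
```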


I'd bet Hugging Face would be happy to host these canonically too, so I'm not sure why that doesn't happen more often.


The model is also at https://huggingface.co/xai-org


The great thing about torrents is that you (or anyone else who cares) can single-handedly solve the problem you're complaining about by seeding the torrent.


No, git would be impossible. I've never seen a repo even a few GB in size; if you are uploading non-code files you really should not be using git. Git is version management software for code. I often see repos with images and even videos checked in. Please don't; there are far better and more performant solutions out there.

The other approach would be to use AWS S3 or another cloud provider, which would cost them money every time someone downloads the files, something they shouldn't have to pay for when they are releasing it for free. Torrents seem like the only good solution, unless someone hosts this on the cloud for free for everyone.


Scott Chacon (GitHub cofounder) mentioned in a recent talk that the Windows repo is 300GB: https://youtu.be/aolI_Rz0ZqY?si=MOo2eS6dsKKAxmsP


Interesting, I had no idea git had a VFS or that MS used a monorepo. I guess git is much more capable than I thought, but the average user really should just be uploading code to GitHub.


Hugging Face would disagree with "impossible", as their models are available via git, sometimes broken up into .pth files.

Still, as far as sentiment goes, yeah git for model weights is an impedance mismatch for sure!


> No git would be impossible. I’ve never seen a repo even a few GB in size, if you are uploading non code files you really should not be using git

It's not actually a limitation in git itself, especially if you use Git LFS. People use Git for Unreal projects and big ones can be half a terabyte or more in size.


Others have pointed out that GitHub doesn't allow that, but

> Torrents can unfortunately die after a period of time if no one continues seeding it or if they don't use a permanent web based seeder, which doesn't appear to be the case.

So too can web links, especially when they are 300 GB and egressing out of AWS at $0.09/GB or worse (in non-US regions). Each full download would cost $27 at that rate; 10,000 downloads would cost $270,000.

Sure, you could go for something with a better cost model like R2, but you can't beat using one or two unmetered connections on a VPN to constantly seed over BitTorrent. The pricing would be effectively free, and reliability would be higher than if you just exposed an HTTP server to the Internet like that.
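The egress arithmetic above, spelled out (the $0.09/GB rate is the one cited in the comment; the function name is illustrative):

```python
EGRESS_USD_PER_GB = 0.09  # the AWS internet egress rate cited above

def egress_cost_usd(size_gb, downloads=1):
    """Total egress bill for serving `downloads` full copies of a file."""
    return size_gb * EGRESS_USD_PER_GB * downloads

# A 300 GB file costs roughly $27 per full download at this rate,
# so popular releases get expensive fast.
```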


> and egressing out of AWS at $0.09/GB

There are a lot of seeders on the torrent that are actually AWS IPs too, all with similar configurations, which makes me believe it's probably xAI running them.

> on a VPN

That's unnecessary; you don't need a VPN.


No you don't, but if you wanted to host it from your gigabit office IP, you probably would want to.


Why?


GitHub may choose to throttle downloads or remove the files simply because they're taking up too much bandwidth.

A torrent is less likely to go down in the short term.


This is not some crappy DVD rip on The Pirate Bay. It will be seeded as long as it's relevant.

Twitter/X has their own massive infrastructure and bandwidth to seed this indefinitely.


Yeah, they can just leave some server running somewhere and just let it seed forever



