Oh no! I didn't know their IPFS initiative didn't pan out. What happened to it? ...

nikisweeting · on Oct 9, 2024

There are people still working on making it happen, but it's just a colossal amount of data and filesystems are notoriously hard, so it's very slow going.

From my own personal experience doing distributed archiving with no relation to Archive.org, Filecoin/IPFS's UX isn't quite there yet. They still don't let you serve data to the network from a normal filesystem; you have to let their system ingest all of your stuff, so you either end up double-storing data or have to give in to everything being stored as inscrutable binary blobs.

That's why I still haven't integrated ArchiveBox with IPFS/Filecoin/Storj. Let my data live in a normal filesystem, dammit!

Aachen · on Oct 9, 2024

> They still don't let you serve data to the network from a normal filesystem; you have to let their system ingest all of your stuff, so you either end up double-storing data or have to give in to everything being stored as inscrutable binary blobs.

I don't understand this part. What data would you have to give them? Why can't it just live next to your stuff on your OS's filesystem?

dannyobrien · on Oct 10, 2024

For IPFS, I'm fairly sure you can now serve from your normal filesystem rather than load it into their blockstore -- or at least the blockstore holds pointers to real data blocks that are part of your existing files. It's the --nocopy option[1]; it's marked as experimental, so there may be some sharp edges.

For Filecoin, if you want fast access, you do need to keep a second hot plaintext copy, as well as the sealed Filecoin copy. But that works for the backup case for IA, because the hot copy would be served from the archive's existing infrastructure (and/or a distributed IPFS hot cache) -- you'd just use Filecoin for the proven safe backup.

The project to back up IA to Filecoin is still ongoing. The IA dashboard that shows the current state is (perhaps predictably) down at the moment, but it crossed the 1PiB line last year[2], and they've been optimising the onboarding flow recently.

[1] https://docs.ipfs.tech/reference/kubo/cli/#ipfs-add

[2] https://blog.archive.org/2023/10/20/celebrating-1-petabyte-o...

(Disclosure: I work at the Filecoin Foundation/Filecoin Foundation for the Decentralized Web, which partners with the Archive on this project, as well as supporting other Internet Archive backup projects.)

nikisweeting · on Oct 10, 2024

Keeping a separate hot copy at 220PiB is already ~$7M/yr, and multiples of that once you factor in labor and redundancy. The --nocopy option looks great though. I didn't see it last time I was looking around for an MFS/FUSE solution; I'll try it.
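For a sense of scale, the ~$7M/yr figure can be turned into an implied unit price. This is only back-of-the-envelope arithmetic on the two numbers quoted above; the `annual_cost` helper and the idea that extra replicas cost the same per TiB are assumptions for illustration, not numbers from the Archive.

```python
# Back out the storage price implied by "220PiB is already ~$7M/yr".
TIB_PER_PIB = 1024

capacity_pib = 220            # figure quoted in the comment
annual_cost_usd = 7_000_000   # figure quoted in the comment

capacity_tib = capacity_pib * TIB_PER_PIB
usd_per_tib_month = annual_cost_usd / capacity_tib / 12


def annual_cost(copies: int) -> int:
    """Yearly cost for `copies` full replicas, assuming (hypothetically)
    that every replica costs the same per TiB as the first one."""
    return annual_cost_usd * copies


if __name__ == "__main__":
    print(f"{capacity_tib} TiB at ${usd_per_tib_month:.2f}/TiB/month")
    print(f"3 replicas: ${annual_cost(3):,}/yr")
```

Labor is not modeled at all here, so the real multiple would be higher than the replica count alone suggests.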

I appreciate your effort and I hope the project continues.

nightpool · on Oct 9, 2024

They're saying that the client software (the servers that speak the IPFS protocols) has to load the files to be served into its own local storage database; it can't just keep a "metadata file" and read the existing files off disk. Presumably somebody could write a client that spoke the IPFS protocol and did this, or fork the main Go or JS one, but until someone does that they're stuck with the software that's already been written.
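To make the "metadata file" idea concrete, here is a minimal sketch of an index that records where each block lives inside an existing file instead of copying the bytes. It is not how any real IPFS client stores things: `BlockRef`, `index_file`, `read_block`, the fixed-size chunking, and the bare SHA-256 hex keys are all assumptions for illustration (real IPFS uses CIDs and a Merkle DAG).

```python
import hashlib
from dataclasses import dataclass

CHUNK_SIZE = 256 * 1024  # illustrative chunk size, chosen for this sketch


@dataclass(frozen=True)
class BlockRef:
    """Pointer to a block that still lives inside an ordinary file."""
    path: str
    offset: int
    length: int


def index_file(path: str, chunk_size: int = CHUNK_SIZE) -> dict:
    """Map sha256(block) -> BlockRef without copying any file data."""
    index = {}
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            index[digest] = BlockRef(path, offset, len(chunk))
            offset += len(chunk)
    return index


def read_block(index: dict, digest: str) -> bytes:
    """Serve a block straight from the original file, re-checking its hash."""
    ref = index[digest]
    with open(ref.path, "rb") as f:
        f.seek(ref.offset)
        data = f.read(ref.length)
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("file changed on disk since it was indexed")
    return data
```

The hash check in `read_block` is where the sharp edge shows up: if the file is edited after indexing, the pointer still resolves but the block no longer matches its address, so the sketch refuses to serve it.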

pshc · on Oct 10, 2024

IPFS is all content-hash-addressed, so my guess is the IPFS service spirits the files away to a (hopefully) immutable store for the sake of sanity.
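A toy version of content-hash addressing, to show why an immutable store keeps things sane: the address is derived from the bytes, so the same content always lands at the same address and any edit produces a different one. The `ContentStore` class and its use of plain SHA-256 hex digests are assumptions for illustration, not IPFS's actual CID format or storage layer.

```python
import hashlib


class ContentStore:
    """Toy immutable store: the key is always the SHA-256 of the value."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Storing the same bytes twice is a no-op, which gives deduplication.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self) -> int:
        return len(self._blobs)


if __name__ == "__main__":
    store = ContentStore()
    first = store.put(b"hello")
    edited = store.put(b"hello!")
    # An "edit" never overwrites anything; it just creates a new address.
    print(first != edited, len(store))
```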