Backing up a Linux system to Usenet (arstechnica.com)
80 points by Tomte on Feb 10, 2017 | 79 comments


>With access to a Usenet news server, you can simply upload your backup there, and it will be stored redundantly in news servers all over the world. Best of all, this approach typically costs considerably less than a cloud backup service.

I saw no mention of "retention policies" in the article. For binary newsgroups, other Usenet peer servers can choose to carry only text and ignore the binaries, or to hold binaries for only 30 days or whatever discretionary period they choose based on available disk space.

With AWS Glacier, Backblaze, etc., the data retention is explicitly specified.

[1] example of limited retention period: http://www.giganews.com/blog/labels/retention


Most paid Usenet services these days advertise retention times of more than 1,000 days.

Some of the big providers in the business right now:

https://www.easynews.com/usenet-plans.html 2600+ days

http://www.supernews.com/ 2300+ days

http://www.news.astraweb.com/ 3000+ days

Easily enough for some backups. Of course, there's no guarantee of integrity, but it might work for some people.


>Of course, there's no guarantee of integrity,

Exactly. It's been a decade since I've used Usenet but I remember there were always tons of messages complaining, "part43.rar and part62.rar is missing please reupload!!!"

As another anecdote, I personally posted a huge list of AWK one-liners 20 years ago to Usenet (comp.lang.awk) and I can't even retrieve that message. That's just pure ASCII text (probably less than 2k) and even the Google Groups (dejanews acquisition) archives don't have it.


> As another anecdote, I personally posted a huge list of AWK one-liners 20 years ago to Usenet (comp.lang.awk) and I can't even retrieve that message. That's just pure ASCII text (probably less than 2k) and even the Google Groups (dejanews acquisition) archives don't have it.

This is actually another in a long list of known search bugs on Google Groups: https://productforums.google.com/forum/#!topic/apps/kej8-gpV...

Broken search has been a constant problem since Google acquired Deja: https://motherboard.vice.com/en_us/article/google-a-search-c...

Google broke the Deja archive after acquiring it in 2001, and it has never worked properly since. The UI and the search both suck. The archives are still there, though. I wish Google would release them to the public and set up a non-profit to manage them.

Compare this to Gmane, essentially a one-man project, where the community stepped up to bring it back on the web after only a couple of months: https://lars.ingebrigtsen.no/2016/09/06/gmane-alive/


If you posted something in the last 6-8 years or so, it'll be readily available. Around that point, most of the big commercial Usenet providers just kept growing their storage without ever expiring anything. Drives got that cheap.

With something like backups, though, you're expected to update them regularly, so retention won't be a huge issue anyway; you'd probably prune old ones even if you were paying to store them.


Maybe you haven't read the article closely. You normally have a bunch of .par2 files (my knowledge is also 10 years old) which can repair missing files/parts, as long as the amount of data lost is smaller than the amount of recovery data in the intact .par2 files.
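
For anyone whose memory of it is equally rusty, the par2cmdline workflow is roughly this (file names are placeholders):

    # create ~20% worth of recovery blocks for a split archive
    par2 create -r20 backup.par2 backup.tar.gz.*

    # later, after pulling down whatever articles survived:
    par2 verify backup.par2   # reports how many blocks are damaged or missing
    par2 repair backup.par2   # rebuilds them, provided enough recovery blocks exist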


My comment was more about the reliability of replication across geographies (author writes, "and it will be stored redundantly in news servers all over the world.")

The par2 files would be more of a sure thing if you're using your own subscribed news server in a closed-loop fashion for backups. However, par2 doesn't mean NNTP will guarantee that those parity files actually got replicated to other servers.[1] If you're looking for safety beyond your own news server, you have to account for a higher probability of failure. That's fine for sharing rips of Blu-ray movies. For the irreplaceable backup photos of your children, though, hoping that par2 files survive on other Usenet servers you don't subscribe to may be too risky.

[1] http://ask.metafilter.com/237447/How-do-files-on-Usenet-deca...


> You normally have a bunch of .par2 files

vol003+04.par2 and vol031+32.par2 is missing please reupload!!!

Before any flippancy haters downvote, this is exactly what happened on Usenet. Shitty servers are shitty, whatever the file type.


par2 data could be uploaded but also kept on local backup or traditional cloud backup providers.


Parchives. Great for adding recovery to any file. :) I use them [now par2cmdline-tbb] religiously for archives. I started doing this years ago after CD-Rs burned on one drive (a Plextor, no less) turned out to be hard to read on any other drive.


Yep. I use par2 for my local backups.


> Exactly. It's been a decade since I've used Usenet but I remember there were always tons of messages complaining, "part43.rar and part62.rar is missing please reupload!!!"

Just use an error-correcting scheme, such as used by RAID.


That's what the Par2 in the fine article is.


Someone just posted a link to "Handy one-line scripts for awk":

https://news.ycombinator.com/item?id=13619124

Is that it?


Thanks for the heads-up, but that's not the one. I posted mine in 1998 or 1999, and I tried to find the exact Usenet archive link, similar to the direct link for Larry Page's famous 1996 post on comp.lang.java [1].

To go back to the article, the author mentions posting the files to the newsgroup "alt.binaries.backup". With Usenet, there isn't exactly a contractual SLA (Service Level Agreement) for that group. It's a gentlemen's agreement between the commercial Usenet providers (and non-commercial ones like universities) to replicate messages. Maybe because I posted the message to my GTE/Verizon ISP's Usenet server, it only got replicated to a few peers and then "died".

If my tiny text-only post, which is two years newer than Larry Page's, can't be recovered today, that doesn't give me a lot of confidence in using Usenet as a backup solution. I have over 1 terabyte of photos, home videos, and TIFF scans of all my paper files. It's not appealing to chop that 1 TB into a thousand PAR2 files with an extra 20% of redundant parity and post it to alt.binaries.backup. That seems extremely fragile. Another commenter suggested Amazon's new personal "unlimited" cloud for $60/year; that seems much more reliable.

[1] https://groups.google.com/forum/#!msg/comp.lang.java/aSPAJO0...


> It's not appealing to chop that 1 TB into a thousand PAR2 files with an extra 20% of redundant parity and post it to alt.binaries.backup.

For a 1 TB archive with 20% redundancy, you're looking at a block size of at least 32 MB in each par2 file (due to the maximum block count of 32767 [1] in the official implementation). Given that the article size limit for many news servers is roughly 1 MB, even a single block gets split across about 32 article posts. par2 programs will typically generate a series of files where the smallest contain a single block and the largest contain 50 or more blocks; the 50-block files would each get split into roughly 1,600 articles.

For par2 recovery to work even when articles are missing, you really want the recovery block size to be less than the article size limit, so that if one or more articles are lost, the par2 program can still read the remaining blocks from the incomplete recovery file and use them for recovery. That means the maximum archive size would be roughly 32 GB if you want to keep the block size under the article size limit.

Going beyond that size makes it less likely that a recovery file will be usable when some of its articles are missing. At 32 GB, if one article is missing from a 3-block recovery file, the software can still find 2 blocks in that file. But if the archive were 100 GB, the block size would be at least 3 MB, and losing just 3 of the roughly 9 articles that make up a 3-block recovery file could render the whole file unusable.

[1] https://en.wikipedia.org/wiki/Parchive#Parity_Volume_Set_Spe...
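
To put rough numbers on the above (back-of-the-envelope only, assuming the 32767-block cap and a ~1 MB article limit):

    # minimum recovery-block size for a 1 TB archive capped at 32767 blocks
    echo $(( 1024**4 / 32767 ))            # 33555456 bytes, i.e. ~32 MiB per block

    # at ~1 MB per article, each recovery block spans about this many articles
    echo $(( 1024**4 / 32767 / 1024**2 ))  # ~32 articles; losing any one loses the block

    # largest archive whose blocks still fit in a single ~1 MB article
    echo $(( 32767 * 1024**2 ))            # 34358689792 bytes, the ~32 GiB ceiling above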


Common Usenet binary retention is north of 3 years today, shared across many providers. Yes, you will need an account at one of those providers to download the backup in the future. And yes, nothing is guaranteed about retention.


This has to be a joke, surely? A good example of the Tragedy of the Commons...

As ProfessorGuy succinctly commented: > Why not an article on how you can get a bed for yourself in the local hospital so you won't have to pay rent? Hey, if they're going to build a public institution like a bunch of suckers, they deserve to be taken advantage of!


Amusingly, there's alt.binaries.backup https://www.binsearch.info/browse.php?server=2&bg=alt.binari.... I found out about its existence through the following two earlier threads regarding this approach:

https://newsgroupdirect.com/blog/2012/04/07/usenet-for-file-...

https://www.linkedin.com/pulse/non-scalable-anti-social-back...

Does anyone know what the deal with alt.binaries.backup is? Why would anyone mirror?


This would be a drop in the bucket compared to the terabytes of warez uploaded daily. https://en.wikipedia.org/wiki/Usenet#Usenet_traffic_changes


While there are different, and maybe bigger ethical issues there, at least it is using the system as designed.


This is why we can't have nice things...


Encrypting your backup with a passphrase and making it public is a bad idea. There's no way to take it back, and anyone can try to brute-force it offline. You'd better use a keyfile and back that up somewhere else, such as on your other computers, external drives, etc. And even then, publishing your backup is a bad idea because of possible breakthroughs in cryptography. No crypto is secure forever; you'd better assume it will be broken within the next 5-10 years.
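
For example, a minimal keyfile-based sketch with stock tools might look like this (exact gpg flags vary a bit between 1.x and 2.x, so treat it as illustrative):

    # generate a high-entropy keyfile; copy it to other machines/drives you control
    head -c 64 /dev/urandom > backup.key

    # encrypt the archive symmetrically, using the keyfile as the passphrase
    gpg --batch --symmetric --cipher-algo AES256 \
        --passphrase-file backup.key backup.tar        # writes backup.tar.gpg

    # restoring later needs the same keyfile
    gpg --batch --passphrase-file backup.key \
        --decrypt backup.tar.gpg > backup.tar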


Related-ish-ly, I've had a notion bouncing around in my head that it should be possible to build a fairly complete social-networking interface on top of email and iCal (or similar) plus some kind of contact-management standard, such that the whole thing (or at least enough of it) could be rebuilt from, e.g., a (possibly dedicated) email account when necessary, possibly in stages: a new installation of the application builds what it can from the last 30 days in a few seconds, displays a populated interface, then keeps digging back to some arbitrary point in the past (until it exhausts whatever caching resources it has been allotted on a given platform, basically).

An initial search has revealed a couple abandoned partially-complete efforts, but that's it.


I believe that p≡p is aiming to implement something like that in due time. They had a talk at FOSDEM this year: https://fosdem.org/2017/schedule/event/pep/


> For most users, however, you’ll probably find it costs you around $10 per month to maintain the Usenet service in this manner.

Which is more than a dedicated cloud backup service like Backblaze or Crashplan. Genius.


If you read the whole article, the author mixes in caveats throughout and says a special set of circumstances is required for the Usenet backup to be worthwhile. At one point he addresses your point by noting that in some cases (possibly requiring careful coordination between all of the servers) you could back up many different servers with the same Usenet account, whereas services like Backblaze and Crashplan usually restrict the number of machines.

That said, I still think the point you make is the most important critique. Online storage/backup is such a crowded market (Google, Amazon, Dropbox, Box, Backblaze, Crashplan, Carbonite, Mozy, Tarsnap, SpiderOak, rsync.net, OneDrive, etc.) that if you go with something significantly cheaper than all of the mainstream options, you are probably going to get what you pay for.


>Bored with ho-hum cloud backups?

Well I can't say that's something I've ever thought.


Another idea:

Split your data into ping-packet-sized chunks and throw them out onto the Internet. Keep all your data bouncing around the Internet and use that as your storage provider.

Not really practical, but a neat idea. Saw it in the excellent "Silence on the Wire": https://www.nostarch.com/silence.htm


Point a laser at the empty sky, modulate it with your backup file.

It's gonna last forever.


My personal favourite was sending data in outgoing emails to invalid destinations. Your data would get queued, and a few days later you'd receive it back in an undeliverable-message bounce.

However, this stopped being effective when mail servers stopped sending back the complete body of the bounced mail and instead sent just a "Your mail couldn't be delivered, blah blah". You could deal with that by encoding your data in the subject lines, though, I guess...


It's analogous to delay-line memory, and a common joke people make when they encounter that idea of storage (which is rather clever).



Some time ago, I implemented a little tool for myself which backs up folders incrementally to Usenet (deterministic Message-ID creation from a secret key, append-only style with metadata, parity, encryption, etc., so you only have to remember one unique key to access all of your data, even as new data is added).

It can mount the current state of the Usenet backup with FUSE, so it's possible to browse through the files, listen to music, etc. I understand it might not be a good idea to store all your data only on Usenet, but I thought it was an interesting concept and a fun little project to work on :)


Open source or didn't happen.


At the moment it's very tailored to macOS and has plenty of rough edges because it's for personal use. But I'll gladly clean it up a bit and put it on GitHub if enough people are interested, and/or I'll write up how its "protocol" works. Just let me know (Twitter PM or something) if you're interested.

EDIT: Basically it's one deterministic stream of messages journaling everything, plus one recursive stream of folders and linked raw files. Deterministic lookup works roughly like HMAC(type|index|revision|replication, key_for_locating), and it iterates through that.

EDIT2: And one more stream for parity of the metadata and raw files.
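
Not the parent's actual code, just a sketch of what deterministic Message-ID derivation along those lines could look like from the shell (the stream name, domain, and key here are made up):

    key="one-secret-locating-key"
    stream="journal"; index=42; rev=0; rep=0

    # HMAC(type|index|revision|replication, key) -> predictable article ID
    id=$(printf '%s|%s|%s|%s' "$stream" "$index" "$rev" "$rep" \
         | openssl dgst -sha256 -r -hmac "$key" | cut -d' ' -f1)

    echo "<${id}@backup.invalid>"   # post to / fetch from this Message-ID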


Reminds me of

"Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" - Linus Torvalds


From Amazon Lightsail I can get this for $5 a month:

* 512 MB Memory

* 1 Core Processor

* 20 GB SSD Disk

* First 1 TB of outbound transfer

That's a whole VPS I can put a git repo on and upload encrypted files to using ssh. So I get gigabytes of storage, secure transfer, a commit log, and can do some other light work on it. Or I could just rsync or scp them there.
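
The whole "just rsync or scp" pipeline is only a few lines, something like this (host name, paths, and keyfile are placeholders):

    # bundle, compress, and encrypt locally, then ship the result over ssh
    tar czf - ~/photos \
      | gpg --batch --symmetric --passphrase-file backup.key \
      > photos-$(date +%F).tar.gz.gpg

    rsync -av photos-*.tar.gz.gpg lightsail-box:/backups/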

GitLab is even one of their preinstall-able images, or you can pick some other application/stack (MEAN, LAMP, Nginx, Drupal, Node.js, Redmine) or a bare OS (Amazon Linux or Ubuntu) and initialize your own bare repo.

For the $20 mentioned in the post, I could set up four boxes in four AWS zones. I could configure my backup scripts to keep only two full copies and have roughly 40 GB of stuff backed up, fully duplicated, in physically disparate data centers.

This isn't even getting into elastic storage offerings or paid cold storage which is dirt cheap by comparison. This is just simplified point-and-click deployment of predefined images, and it's already far more reliable for the money.

If one really wants to spend monthly to rig up their own backup solution rather than just using Amazon Drive, Google Drive, or Dropbox or something similar, this at least makes the stuff easy to find, redundant, and simple enough without polluting the public forums.

If one really needs two or three years of offsite data retention with multiple copies, and wants to pay only for the upload and download without paying for storage in between, just buy some 32 GB USB flash drives outright for $8 a pop and mail them to three friends. There, you've paid $30ish in one month for hardware, postage, and taxes, and will pay one friend $1 in postage to get the data back.


>For the $20 mentioned in the post, I could set up four boxes in four AWS zones.

And for the $2.50 also mentioned, you could set up zero. Or you could buy a bandwidth-based plan, where $20 will get you at least 250 GB of bandwidth.


It's actually quite possible to get an unmanaged VPS for under $2 a month if you shop around. I'm currently renting one for use as a personal email server for £0.002 per hour, which works out to around $1.85 a month for 1 core, 1 GB RAM, 20 GB storage, and 2 TB bandwidth.

https://lowendbox.com/ is a good resource for finding cheap VPS offerings to mess around with.


Idea: An open-source peer-to-peer backup service where data is encrypted and backed up in pieces across countless systems around the globe. No one person would hold your data, but in some way it would be guaranteed that your data is always available. Everyone who uses the backup service would also be required to accept data pieces from others for storage.


It is called BitTorrent. Generate the backup file as described in the parent article, give it a saucy name involving a (female) celebrity and let it out into the wild. As disk space is effectively free and people tend to hoard files, I suspect you'd have no problems recovering your file a year or two down the line.

(I tried this way back when using WinMX (anyone remember THAT?) to see whether it would work. It did.)

We dubbed it 'hardcore backups.'


You could even throw a little steganography into the mix, so those who check for the content they want but pay little attention to file size continue to participate in storing your data!


Was thinking the exact same thing. Win-Win


That is the dream yes.

Maybe with: https://ipfs.io


IPFS, Freenet, ZeroNet... and maybe more significantly, MaidSafe is trying to do this with a cryptocurrency, so you get paid a bit to store files. It's quite a clever idea.


There was a paid version of something like this called Digital Lifeboat. IIRC it ran on top of the BitTorrent protocol, and users would store pieces of other users' backups. It didn't really catch on and shut down a few years ago.


That sounds just like Freenet or IPFS.


Ethereum developers are working on Swarm [1][2], which is p2p storage with incentives and smart contracts!

[1] https://blog.ethereum.org/2016/12/15/swarm-alpha-public-pilo...

[2] https://github.com/ethersphere/swarm


There was something like this for years called Symform but they shut down recently: http://www.backupreview.com/symform-to-discontinue-peer-to-p...

I guess this was more of a commercial solution though.


Seems like someone got the task of writing something controversially stupid to get traffic, because if you proposed such a scenario to any real-life company, you'd be laughed out of the room instantly, "20 years of experience" notwithstanding.


While interesting, this seems like a bad idea. You're uploading your backups, no matter how encrypted, to a place where they will be publicly available to download.


Most cloud backup services are worse - they do no client-side encryption, so your files are freely available to the service provider or to anyone who can break in.

I'd be much more comfortable with this personally. Trust the math, not the people.


Absolutely, but having another layer blocking access to your data is definitely a good thing. It's a good idea to encrypt your files yourself before uploading them to a public cloud.


agree!


Exactly. I'd rather trust well-proven math than people or infrastructure. One famous example nowadays is Bitcoin ... nobody has been able to break the fundamental math behind it.


> Exactly. I'd rather trust well-proven math than people or infrastructure. One famous example nowadays is Bitcoin ... nobody has been able to break the fundamental math behind it.

Well, there was the integer overflow bug years ago where someone could essentially create money out of thin air. But that's the only one I know of and it's a pretty amazing security track record for such a high-profile and lucrative target.

That said, this is just me being pedantic; I agree I'd much rather trust solid crypto than a promise from a person somewhere, even if that promise is in writing.


Depends on how long you want your data to be private, though. There's no guarantee that the encryption won't be broken in a decade or three. And, even if it's not mathematically broken, increased computing power (quantum?) could make brute-forcing fairly trivial.


We are all doomed if this quantum computer works and can break stuff. I also say never ever ;)


> quantum

irrelevant to symmetric crypto


Not irrelevant; my understanding is that it can still cut the effort required considerably.

If you are using only a 128-bit key, a quantum computer running Grover's algorithm could cut the brute-force effort to roughly 2^64 operations, a work factor that is already considered within practical reach.


Or you could just use Tarsnap, where you can trust the math and have private backups for cheap.


True, Tarsnap is pretty on-point there, but it's also not cheap. $0.25/GB is much more than S3 ($0.023) or B2 ($0.005) - the Tarsnap dev says it's because he does blocking (deduplication) and that makes it so much more valuable. But there are other tools, like Duplicati, that can do encrypted, deduplicated backups and be used with cheaper services. With that considered, Tarsnap is 50x the price of B2 - and that's without counting bandwidth.

Or if you're a cheap fuck like me, you want to go even lower, so you use OVH's hubiC, which is $50 for 10 TB for a year with no additional bandwidth cost.
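
For anyone who wants the arithmetic behind that 50x, here's the storage-only cost of 1 TB per month at the per-GB prices above:

    echo '1024 * 0.25'  | bc    # Tarsnap: $256.00
    echo '1024 * 0.023' | bc    # S3:      ~$23.55
    echo '1024 * 0.005' | bc    # B2:      ~$5.12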


Just as a heads-up, I believe the $60/year all-you-can-store Amazon Cloud Drive is currently the cheapest offering for large amounts of personal data.


Does the "personal data" bit restrict me from doing automated backups of business data?


Previously: business use typically comes with expectations that tend not to align well with consumer grade products (specifically: availability and performance).

Edit: Turns out the answer is yes, no commercial use.

https://www.amazon.com/gp/help/customer/display.html?nodeId=...

1.2 Using Your Files with the Services. You may use the Services only to store, retrieve, manage, organize, and access Your Files for personal, non-commercial purposes using the features and functionality we make available. You may not use the Services to store, transfer, or distribute content of or on behalf of third parties, to operate your own file storage application or service, to operate a photography business or other commercial service, or to resell any part of the Services.


True, I guess it's seemed cheap to me because I store relatively little data on Tarsnap. (In fact, I don't think I've added any funds to my Tarsnap account in like 2 years.) If you're dealing with larger quantities of data I could see how other options would be the way to go.


Nobody cares about data junk, especially your personal data junk if it's all encrypted. I don't think many people will look at your data there. If the kind of strong encryption you'd use for public cloud backups ever breaks, we'll have much bigger problems than exposed backups.


Saying "nobody cares" about your data is not a good security policy.


Hm. Security policies are something you can break one way or another. You cannot break mathematics that way.


You want math? How many combinations are afforded by your "long, carefully chosen password" in a symmetric system? How many core-seconds per hour does a typical botnet scriptmonger control? Cryptanalysis of GPG doesn't even matter if Eve has enough time to brute-force your symmetric key.
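
To put a very rough number on that (assuming truly random characters, which human-chosen passwords rarely are):

    # bits of entropy in a random 12-character password over 94 printable ASCII chars
    echo 'l(94^12)/l(2)' | bc -l    # ~78.7 bits

    # compare: a 6-word passphrase drawn from a 7776-word Diceware list
    echo 'l(7776^6)/l(2)' | bc -l   # ~77.5 bits

Anything shorter or less random lands well below that, and a public archive gives an offline attacker unlimited time to grind through it.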


The encryption tools you are using are written by people, and can have bugs. Being careless and blindly trusting them can get you into trouble.


Who knows? Your private backups might even end up being conveniently available test data for nascent quantum decryption software.


I've often thought that Usenet binaries would be perfect for podcasts.


This is one of those ridiculously pointless exercises that keep me coming back to Hacker News for more. What a waste.

http://www.tarsnap.com/


Does anyone have experience with this in practice? What are the data limits on this kind of backup?


It works well enough. You can find binary providers that charge by download quantity and use that style of plan to upload for free. (And upload is usually free -- Usenet providers want more content, after all.)

Name your backups something inconspicuous and have your client generate an .nzb file for later recovery (it contains the unique Message-IDs of each component article).

It's a little risky in that providers might delete your backups if they identify a pattern, or, if you name them wrong, some studio may issue a (bogus) DMCA takedown against them.

Tarsnap is cheap enough that I wouldn't bother with any of this unless you're already using a binary Usenet account anyway.


this kills the usenet


Also, if you ever get bored putting junk mail in your recycling bin, why not mail it to the author of this article?



