Spammers would have to host a page (permanently) that links to your post, and even then they don't get to control what (if anything) from that page gets displayed on your site.
I guess one danger is that they serve the page containing your link only to the Webmention-validating request. That way they get a backlink without having to keep a public outgoing link. They'd have to know that a given request is that validation, though, and I'm not sure that'd be very easy.
Webmention receivers can filter on whatever parts of a URL they want to. Maybe a WordPress implementation limits this to the domain? But as far as the spec goes, the receiver just gets a `source` parameter that's a URL. They can then decide to allow that (based on the domain, or any other characteristic they want) and at that point they check that URL to see if the document there contains the link that it's supposed to.
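A minimal sketch of that receiver-side check, using only the Python standard library. The allowlist (`ALLOWED_HOSTS`), the function names, and the injected `fetch` callable are all hypothetical. This is a simplified link check, not a full implementation of the Webmention spec: a real receiver would also resolve relative links, follow redirects, and handle non-HTML sources.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlist: the receiver may filter on any URL property it likes;
# this sketch only looks at the scheme and hostname.
ALLOWED_HOSTS = {"example.org", "friend.example.net"}


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value in the source document."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.add(value)


def source_allowed(source: str) -> bool:
    """Decide from the source URL alone whether it's worth fetching."""
    parts = urlsplit(source)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


def source_links_to_target(source_html: str, target: str) -> bool:
    """Check that the fetched source document really links to the target (exact match)."""
    collector = LinkCollector()
    collector.feed(source_html)
    return target in collector.urls


def verify_webmention(source: str, target: str, fetch) -> bool:
    """fetch(url) -> HTML string; injected so the network call is easy to swap out."""
    if not source_allowed(source):
        return False
    return source_links_to_target(fetch(source), target)
```

Keeping the URL filter separate from the fetch means disallowed sources are rejected before any request is made.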
That's all true, and probably a better system overall, but burning an optical disk, labelling it, and putting it on a shelf does feel like a more accessible backup regime for many people. :-)
Fair enough! The danger with disks, however, is that burning them is an entirely manual operation that's easy to forget. Something set-up-once-and-forget (a local server, or a cloud-based one like Backblaze) is more likely to actually have the latest data when you need it.
(Another reason is that the disks do suffer bit rot, and you'll never know until it's too late. Meanwhile, my ZFS file server sends me an email every weekend saying it's scrubbed all the disks and found no errors; this warms my heart :) )
I don't know about the best way to split things (I mostly do it topically, e.g. each website backup goes on a separate disc). But hashdeep is a great little tool for producing files of checksums for every file written to the disc, and also for auditing against those checksum files.
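hashdeep has its own file format and options, which aren't reproduced here. This is only a sketch of the same idea in Python's `hashlib`: record a checksum per file when the disc is written, then re-hash later and compare. The manifest layout mirrors `sha256sum`'s `<hash>  <path>` lines, and the function names are hypothetical. Keep the manifest outside the tree being hashed, or it will show up as a "new" file.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX-style) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_manifest(manifest: dict[str, str], out: Path) -> None:
    # One "<hash>  <path>" line per file, like sha256sum (not hashdeep's format).
    out.write_text("".join(f"{h}  {name}\n" for name, h in manifest.items()))


def audit(root: Path, manifest_file: Path) -> list[str]:
    """Re-hash the tree and report missing, changed, or unexpected files."""
    expected = {}
    for line in manifest_file.read_text().splitlines():
        h, name = line.split("  ", 1)
        expected[name] = h
    actual = build_manifest(root)
    problems = []
    for name, h in expected.items():
        if name not in actual:
            problems.append(f"missing: {name}")
        elif actual[name] != h:
            problems.append(f"changed: {name}")
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"new: {name}")
    return problems
```

An empty list from `audit` means every file still matches the checksum recorded when the disc was burned.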
I've done this sometimes with hosting images and other large files on a combination of Flickr, Wikimedia Commons, Internet Archive, and Zenodo. Flickr costs money, but it feels like it's worth it given I'm using Netlify and all the others for free.
I know some people use S3 services for hosting images, but then you have to worry about generating your own thumbnails etc. and it's trickier.
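Part of the "thumbnails etc." work when self-hosting on S3-style storage is deciding what sizes to generate. Here is a sketch of just that arithmetic, assuming a hypothetical set of bounding boxes (`THUMBNAIL_BOXES`). The actual resizing would be done by an image library; Pillow's `Image.thumbnail`, for example, fits an image within a box in a similar aspect-preserving, no-upscaling way.

```python
# Hypothetical sizes a site might generate for each uploaded image.
THUMBNAIL_BOXES = {"small": (150, 150), "medium": (640, 640), "large": (1280, 1280)}


def fit_within(width: int, height: int, max_w: int, max_h: int) -> tuple[int, int]:
    """Scale (width, height) to fit inside max_w x max_h, keeping the aspect ratio.

    Images already smaller than the box are left at their original size.
    """
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


def thumbnail_plan(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Target dimensions for every configured thumbnail size."""
    return {name: fit_within(width, height, w, h) for name, (w, h) in THUMBNAIL_BOXES.items()}
```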
> I know some people use S3 services for hosting images, but then you have to worry about generating your own thumbnails etc. and it's trickier.
This is one reason why I started asking, “Is this image really needed for this article?” As for thumbnail generation, I used to have a few Photoshop actions that I'd just click and be done with.
Now I just manually optimize the few images I use so that they sit somewhere in the middle: CSS can still shrink them to thumbnail size, but the originals aren't too large either. Something like that.
If you maintain a website popular enough that you need to worry about images that much, I would try out Cloudflare's image service.
True, although the Ziploc bags can just be left a bit open, which is quite sufficient. The good thing about food-safe bags is that they're usually polypropylene, and so suitable for archival use (and much cheaper than anything from a preservation-supplies shop).
> food-safe bags is that they're usually polypropylene
Interesting.
I've just checked the biggest grocery site in Norway (oda.com), and two of the three bags they sell are low-density polythene; the third is polypropylene.
Not a big sample, I'll admit. I'm pretty sure that the very thin bags provided in supermarkets here for loose fresh produce are also polythene.
Oh right, I've never seen resealable polythene ones here in Australia. (The floppier 'freezer bags' are, I think, but they're less useful for archiving.)
I think it's mainly PVC that's to be avoided for archiving, and office supply shops are full of the stuff (document sleeves, etc.).
An archivist once told me that if you burn a bit of plastic and it doesn't give off any smoke then it's likely polypropylene or similar, and so good to use. That's never felt like a particularly robust test though (but I'm not a chemist).
The horizontal vs. vertical storage thing is interesting. I've often wondered what the rationale for the difference is, and it seems to be largely cultural.
One thing that wasn't covered, which sometimes matters for non-institutional collections, is that cardboard is thicker than plastic and can add considerably to the number of boxes a collection needs. Polyester or polypropylene sleeves (open at the top and stored vertically, to allow gas exchange) can be just as cheap and are sometimes the better option, at ~0.08 mm thick vs ~0.5 mm for cardboard.
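A rough worked example of how enclosure thickness adds up, using the ~0.5 mm and ~0.08 mm figures above. Everything else is an assumption for illustration: one item per enclosure, 0.1 mm paper items, and a hypothetical 120 mm of usable box depth. Thicknesses are in micrometres so the integer arithmetic stays exact.

```python
CARDBOARD_UM = 500  # ~0.5 mm cardboard enclosure
SLEEVE_UM = 80      # ~0.08 mm polyester/polypropylene sleeve


def boxes_needed(items: int, item_um: int, enclosure_um: int, box_depth_um: int) -> int:
    """Boxes required if each item gets its own enclosure (an assumption)."""
    per_item = item_um + enclosure_um
    per_box = box_depth_um // per_item
    if per_box == 0:
        raise ValueError("a single enclosed item doesn't fit in the box")
    return -(-items // per_box)  # ceiling division


# Hypothetical collection: 2,000 sheets of ~0.1 mm paper, 120 mm of usable depth.
cardboard_boxes = boxes_needed(2000, 100, CARDBOARD_UM, 120_000)
sleeve_boxes = boxes_needed(2000, 100, SLEEVE_UM, 120_000)
```

With those numbers, the collection needs 10 boxes in cardboard folders but only 4 in plastic sleeves.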
As others have mentioned, Transkribus works pretty well for handwritten text recognition. You can also train your own model if you have enough source material.
If your documents can be made public, you could upload them to Wikimedia Commons and use https://ocr.wmcloud.org/, through which you can also use Transkribus. (Disclosure: I'm an engineer working on the Wikimedia OCR project.)
I'm also interested in this. I've been using SpiderOak for years but am currently trying to migrate away (to rsync.net, coincidentally). It's not that I've ever had any issues with SpiderOak, but they don't seem to be a very engaged company either (e.g. I've never heard of anyone from SpiderOak posting here on HN, whereas @rsync is never far away and is always friendly). It does sound like their efforts are going in other directions.