Introducing Google Cloud Storage Nearline (googlecloudplatform.blogspot.com)
366 points by paukiatwee on March 11, 2015 | 178 comments



The detailed pricing is at https://cloud.google.com/storage/pricing#nearline-pricing

In short, if I'm understanding correctly:

  $0.01 per GB per month storage
  $0.01 per GB retrieval
  Normal egress fees on top, so additional ~$0.10 per GB if you want to retrieve outside of Google Cloud.
  Early deletion fee. Effectively just a minimum storage charge of 1 month.
It seems this is cheaper than Glacier and quite a bit simpler. The speed restrictions are interesting though.
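If I'm reading those numbers right, a rough sketch of a monthly bill looks like this (the per-GB rates are just the figures summarized above, not an official calculator):

    # Rough Nearline cost sketch using the per-GB rates summarized above (assumptions, not official figures).
    STORAGE_PER_GB_MONTH = 0.01  # $/GB/month storage
    RETRIEVAL_PER_GB = 0.01      # $/GB retrieval fee
    EGRESS_PER_GB = 0.10         # ~$/GB egress out of Google Cloud

    def nearline_monthly_cost(stored_gb, retrieved_gb, egress_gb):
        """Estimate one month's bill: storage + retrieval + egress."""
        return (stored_gb * STORAGE_PER_GB_MONTH
                + retrieved_gb * RETRIEVAL_PER_GB
                + egress_gb * EGRESS_PER_GB)

    # Example: keep 1 TB stored, pull 100 GB back out to your own machines.
    print(nearline_monthly_cost(1000, 100, 100))  # ~ $10 + $1 + $10 = $21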


Unless Glacier just changed their pricing, how is it cheaper and simpler? Glacier is also $.01/GB storage, but only $.09/GB retrieval, which is cheaper than google's $.12/GB. Glacier also comes with 1st GB retrieval free/mo.


Glacier pricing is surprisingly complicated, and the actual cost can be much higher than $0.01 per GB-month if you don't read the fine print.

The biggest gotcha is that you can only access 0.17% of the data you've stored in any given day without extra charges. So if you've stored 1000 GB, you can only access 1.7 GB per day for free.

The cost for going over your daily "retrieval allowance" can be large, because cost is driven by "peak retrieval rate". They find the highest hourly retrieval rate over the entire month, and charge the whole month's retrieval at the peak rate.

This can get expensive fast. Again, with 1000 GB stored, if you retrieve 200 GB over the course of 4 hours, your hourly retrieval rate is 200 GB / 4 hours = 50 GB per hour. Your free retrieval allowance is 1.7 GB per day / 24 hours = 0.07 GB / hour. So your excess retrieval is 49.93 GB per hour.

Amazingly, that 49.93 GB per hour for 4 hours is charged for all 720 hours in the month, so that's 49.93 * 720 * 0.01 = $359.49.

That's an astonishing $1.79 per GB just to retrieve one-fifth of the data from Glacier storage.
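For anyone who wants to poke at the numbers, here's that arithmetic in a few lines of Python (a quick sketch, assuming the 5%-of-stored-data monthly free allowance and $0.01 per GB-hour peak pricing described in the FAQ):

    # Reproduce the 1000 GB example above (my reading of the Glacier FAQ, not official code).
    stored_gb = 1000.0
    free_daily_gb = stored_gb * 0.05 / 30        # 5% per month, pro-rated daily -> ~1.67 GB/day
    free_hourly_gb = free_daily_gb / 24          # -> ~0.07 GB/hour

    retrieved_gb, window_hours = 200.0, 4.0
    peak_hourly_gb = retrieved_gb / window_hours          # 50 GB/hour
    excess_hourly_gb = peak_hourly_gb - free_hourly_gb    # ~49.93 GB/hour

    hours_in_month = 720
    fee = excess_hourly_gb * hours_in_month * 0.01        # billed for the whole month at the peak rate
    print(round(fee, 2), round(fee / retrieved_gb, 2))    # ~359.5 total, ~1.8 per GB retrieved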

So Glacier only makes sense for true cold storage, where you are very unlikely to touch more than 0.00007 of it in any given hour.

Google's Nearline is much simpler and faster for the same storage cost -- and far cheaper if you need to access more than a tiny fraction of the data.

Source: http://aws.amazon.com/glacier/faqs/#How_much_data_can_I_retr...


Thanks for the detailed breakdown, that's really interesting! Sounds like you were bitten by this? :)


Don't know about the OP, but we were bitten by this - badly - after our logic to throttle Glacier retrieval failed.

AWS has now added the ability to set a spending limit to avoid runaway retrievals. Nonetheless, from my point of view the ultra-complicated pricing scheme is nothing short of a disaster for Glacier as a product. I think it will continue to seriously impact its uptake, and Google is smart to exploit the opportunity to offer a simple, straightforward pricing scheme.


The use case is people who have a requirement to store data and will probably never need to access it, whether for regulatory requirements, legal requirements, etc. In which case paying a couple of hundred bucks for retrieval is fine, especially if they can access it at a high level of granularity.

edit: Also as others have pointed out, Nearline is limited in retrieval time as well, so the cost difference isn't nearly as large.


Ok. I have to do this. I'm sorry.

I guess he would say he was... sunglasses FROSTBITTEN YEAH!


I understand, but please go stand in the corner there and think about what you've done.


Why does Amazon charge like this? It seems like on the rare occasion that you need to send some person to go grab the tape/disk from storage and bring it online, Amazon would want you to get all the data you need and put it back in storage.

Incentivizing users to bring the data online once a day to trickle it out seems bad for all involved.


I think it makes most sense to think of it as paying for access to the tape robot.

If Amazon store your data in a tape archive (I don't know if they do, but they at least seem to have similar constraints), they can only access a small portion of the stored data at a time, so they need to control how often people request data.

They could just rate limit everyone, but this way allows people to pay for priority in an emergency while still discouraging everyday read requests.

The pricing makes more sense if you're a large user with data spanning several tapes than if you're in the single-terabyte range, but the low limit still discourages you from making requests casually, which helps them keep their SLA.

If they predict that you'll trickle out your small file, they can just read out everything on the first access and cache it online, so there's no extra trips to the archive for them.


Note: the following is speculation, as best I know. I'm recalling from memory something I read on the internet, written by someone who did not have a direct source.

Glacier uses low-speed (5400 RPM) consumer drives, which they then clock down lower to save energy. Any given 'rack' only has enough power to power a few drives on that rack, the rest is powered down.

To prevent multiple customers from all trying to pull their data out they needed to introduce a rate limiting system, which they did with this exorbitant pricing.


You assume Glacier uses tape storage... But the same principle still applies, they're trying to prevent people clogging up the network so when somebody does need to do an emergency restore there is lots of spare network capacity.


Except for the Glacier retrieval fees which can be insane if you choose to retrieve more than 0.17% of your stored data per day.

https://aws.amazon.com/glacier/faqs/#How_much_data_can_I_ret...


But if you retrieve the same 1000 GB from Glacier in the same 3 days that Google takes, you would pay a $98 retrieval fee. And Amazon would charge $300 less in traffic costs ($0.09 vs $0.12/GB for 1000 GB), so in the end Glacier is still $202 cheaper AND it allows you to retrieve the data much faster if you really have to, at a certain price.

So Google looks cheaper/simpler, but the amazon model is actually quite good for disaster recovery where you may be in a situation where you "need it now at any cost"


Internet egress for 1000 GB is $120 for Google and $90 for Amazon, a difference of $30 not $300.


Glacier has additional fees depending on how fast you retrieve the data. Honestly, I'm not sure what they are because I've never had to use it, but my impression is that these fees could be quite high.


They can be. It's possible to set a retrieval policy on your bucket, either at the free tier or at a fixed price point, so you can control the cost of your final bill if you want to retrieve faster than the free tier allows.


At least with Glacier I get the option of retrieving faster. Nearline is going to take more than 3 DAYS to retrieve my data if I only have 1TB stored (3MB/s).


Let's assume that with Glacier you can retrieve your 1000 GB in 4 hours instead of 3 days.

You will be surprised by a retrieval fee of 1000 / 4 * 720 * 0.01 = $1,800.

If you want to get your 1000 GB for free, you'd have to request no more than 69 MB per hour. It will take you about 20 months to get your data!

With Glacier, limit retrievals to less than 5% per month / 30 days / 24 hours = 0.007% of total data stored per hour to avoid these ugly surprises.

Source: http://aws.amazon.com/glacier/faqs/#How_much_data_can_I_retr...
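A small sketch pulling those figures together, if you want to play with other retrieval windows (same assumptions: 5% of stored data free per month, $0.01 per GB-hour of peak retrieval):

    # Glacier retrieval fee for pulling `retrieved_gb` out over `window_hours`,
    # and how long a fully "free" retrieval would take. My reading of the FAQ, not official code.
    def glacier_retrieval_fee(stored_gb, retrieved_gb, window_hours, hours_in_month=720):
        free_per_hour = stored_gb * 0.05 / hours_in_month
        peak_per_hour = retrieved_gb / window_hours
        excess = max(peak_per_hour - free_per_hour, 0.0)
        return excess * hours_in_month * 0.01

    def months_to_retrieve_free(stored_gb, retrieved_gb):
        return retrieved_gb / (stored_gb * 0.05)  # 5% of stored data per month for free

    print(glacier_retrieval_fee(1000, 1000, 4))  # ~ $1,800 for 1000 GB in 4 hours
    print(months_to_retrieve_free(1000, 1000))   # 20 months to get it all out for free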


For many people waiting 3 days is more attractive than paying the ~$1000 that it would cost to retrieve it in one day from Glacier.


For disaster recovery 3 days might mean the end of your business.


For disaster recovery $1000 is peanuts, but that's a company thing; GP mentioned 'people', not companies - a different use case.


I wrote a backup program that uses Glacier, and the retrieval policies are nearly impossible to manage and explain. But the thing I really don't get is that you can use Amazon Import to read a hard drive into Glacier, but you can't use Amazon Export to get a drive full of data back out. You can with S3.

In a disaster situation, a company is going to want hard drives sent to them next day. As others have mentioned, this isn't a money thing, it's a time issue. But it isn't available with Glacier (probably not with Google either...)


If you're willing to pay more for faster retrieval I guess you could store e.g. 5TB of random data to get a fast "base speed"- cutting those 3 days down to 12 hours.


Or use Nearline for archival data as well as disaster recovery. You won't need to transfer the archival data for normal disaster recovery scenarios, so it'll be -- for that scenario -- extra data that boosts your speed, but it's still, on its own, a useful use of the service.


That would be more expensive than just storing 1TB with the normal, full speed Cloud Storage service.


"Unless Glacier just changed their pricing, how is it cheaper and simpler? Glacier is also $.01/GB storage, but only $.09/GB retrieval, which is cheaper than google's $.12/GB. Glacier also comes with 1st GB retrieval free/mo."

As long as we are comparing effective pricing, which is the only number that matters with Glacier and "Google Nearline" (and, to some degree, S3), it should be noted that rsync.net's PB-scale offering is 3.0 cents with no additional charges.

That is, the effective price, no matter what your use-case or traffic/usage, is 3.0 cents per GB.[1]

Of course, you do have to buy a petabyte of it ...

[1] http://www.rsync.net/products/petabyte.html


Not to nitpick, but your CEO page is a bit misleading.

    > "a two year contract is required." [1]
    > "There are no contracts, overages, fees, or license charges at rsync.net." [2]
[1] http://www.rsync.net/products/petabyte.html

[2] http://rsync.net/products/ceopage.html


Quote: "This is a Beta release of Nearline Storage. This feature is not covered by any SLA or deprecation policy and may be subject to backward-incompatible changes."

So should I believe in Google's goodwill? I would be fine trying out some services which are in Google Beta. But my valuable data? They should have an SLA right from the start to gain users' trust.


If we place too many restrictions on how companies should offer preview releases, they'll just stop entirely. Are you suggesting that even those comfortable taking on risk to get a sneak peek should be forced to wait for general availability?


Nobody is placing restrictions, but it's understandable if people are skeptical.


Praise be the skeptical! It allows the courageous easy advantages. SLAs are almost worthless. And, really, has Google ever retired something like this (i.e., not an acquisition, not Google Reader, etc)?


ReplyTo aros: The two main criteria I would have are: 1) not an acquisition and 2) important development/developer service. I can't think of any and didn't see any after a cursory look at the list.

I think SLAs are literally worthless since I don't think they encourage even slightly more effort in minimizing whatever issues the customer is concerned about. No one wants servers to go down, no one wants to shut down a service. That a few Google bucks might be on the line would have zero impact.


Easy there Ayn Rand.


No, one simply should wait for companies to offer services with guarantees for one's own data, if the data is important. Just that.


If they lose your data, yes, that's one thing.

But if the 3 seconds becomes 6 seconds... Or the price goes up... Or they announce they're end-of-life'ing the product...

Just move your data somewhere else.

Sure, it'd be inconvenient (and maybe expensive) to move. So, you balance all of that out in your mind, and maybe this is the right service for you, and maybe it's not.


I'm not even sure it'd be "expensive to move". Are people _really_ considering using this (or Glacier/rsync.net/whatever) as their _only_ copy of their data? I can't imagine looking my boss/customers in the eye and saying "We're going to have multiple terabytes of mission critical business data, and it's all going to live _only_ on AWS/Google/cloud-service-du-jour!"

If I lose my AWS Glacier stored data (or Amazon bump the prices intolerably), I'll upload it to a competitor _from my local copies_...

Admittedly, I've only had to deal with storage topping out in the tens of terabytes range, so I've never needed to go beyond a dozen or two consumer-grade drives to keep a pair of rsynced copies locally - but I think that same kind of techniques scale all the way out to building your own Backblaze style storage pod if needed.


In a word: yes.

You're thinking about data that can't be lost, or your customers are screwed. Not all data is like that.

Log files come to mind. They're _nice_ to archive for a long time, but in many businesses, they're certainly not _critical_ to archive for a long time.

Intermediate files, too. You retain the original files in secured storage. But because the intermediate files are large and expensive to re-create, you keep them here in AWS.


think of the poor poor companies!


In beta, you build the tools to use the service with your data, but don't use it for mission critical data until you have an SLA, etc., that you are comfortable with for that purpose.

And if you aren't comfortable even investing development efforts against a beta, don't. Different customers have different risk tolerances, and the fact that an early access product doesn't meet yours doesn't mean it shouldn't be available for those whose risk tolerances it is suitable for.


> But my valuable data?

Surely the whole point of not having an SLA (and indeed the whole point of calling it a "beta") is so that you don't trust it with your valuable data. I'm sure it will have an SLA soon enough.


I don't have a whole lot of worthless data.


I do. It's called test data. I use it when I develop tools to interact with a service before it reaches maturity and before it gets an SLA.

Once it has an SLA, I can use my tool to work with valuable data.


Then it sounds like you should wait for the proper release, as this beta test is not for you.


Do you have data you would like to have an extra copy of, but it wouldn't kill you to lose that single copy?


It seems like this is why they have the clause that you quoted. They aren't yet comfortable guaranteeing reliability or backwards compatibility (i.e. what "beta" means). If these things are crucially important to you, there's a pretty good chance you shouldn't be using it for your "valuable data".


It seems most useful as a place for redundant, encrypted backup. You don't really need it to be reliable (or safe) to store an encrypted online version of ~500gb of family photos that you have on an external hard disk.

You just want cheap and fast.


It's probably something that could be deployed in the "nice to have" category: use it as a cache for your offline data to provide quick recovery of things you would previously have needed to go all the way to offline to retrieve. But if something goes wrong, you still have the offline data to recover from.


You shouldn't be trusting your production data to anything in beta. The whole point of open beta testing is to pool the technical risk. So use it, but don't rely on it. If you can't afford to use it in parallel with some existing system, then don't use it.


If your data really is that valuable, any compensation promised by an SLA is likely to be meaningless.

Far better to use multiple redundant solutions, and although Nearline is only beta* it offers us an easy and cheap way to increase storage diversity.

*How long did we all rely on Gmail while it was beta?


No. Also, you shouldn't believe in ANY single backup solution.

Don't place all your eggs in the same basket.


I added GCS nearline to my object storage comparison:

http://gaul.org/object-store-comparison/


Just a correction: the city name is Sao Paulo, with a 'u'. It has a tilde on 'São', but if you don't want to use it you can omit it. The thing is that 'Paolo' with an 'o' isn't a Portuguese name (it's Italian). In English it's written the same as in Portuguese. I see this mistake sometimes, so I've gotten into the habit of reminding people; I wasn't calling you out, just trying to prevent future mistakes.


Corrected; thanks! Note that GitHub hosts this project for any further issues or corrections:

https://github.com/andrewgaul/object-store-comparison


Actually there are not 7 regions for Google Cloud Storage but 3: US, EU, ASIA. You can get more specific, but that's still Alpha: ASIA-EAST1, US-CENTRAL1, US-CENTRAL2, US-EAST1, US-EAST2, US-EAST3, US-WEST1.

Note that there are separate zones for Google Compute Engine: https://cloud.google.com/compute/docs/zones#available


Corrected; thanks! Note that GitHub hosts this project for any further issues or corrections:

https://github.com/andrewgaul/object-store-comparison


You are missing RunAbove. Edit: also missing constant.com.


Didn't know they offered object storage, thanks. $0.01/GB/month for online content, replicated 3 times... Wow.

Any experience with them? How is their stability and speed?


No, I haven't used them. But their parent company (OVH) has about the lowest prices you can find.


I have not previously encountered these providers; could you submit a pull request here:

https://github.com/andrewgaul/object-store-comparison


1c/GB/mo for data stored, ~3 seconds response times (cf. tape/Glacier at multiple hours), 11 9s durability, same API as Google Cloud Storage online.

(edit: add /month)


There's no mention of durability numbers anywhere in the docs. S3 and Glacier mention 11 9s durability in their FAQ though.


Google Cloud Storage Nearline offers the same durability as the Standard-priced storage offering. See the first paragraph of https://cloud.google.com/storage/docs/nearline-storage


That pricing is per month. (I had to look it up).


Isn't the price a little high? $10/TB/mo is the same as Dropbox, and this has a lot fewer features than Dropbox.


I understand the downvotes (they're very different services, and Dropbox prices can be very low because you're paying for capacity, not usage; case in point, I'm currently using 25 gigs but paying for a terabyte), but honestly, that's an interesting point... anyone want to write an API?


It's a bit of an apples-to-oranges comparison. Dropbox is a consumer product, like Google Drive. Nearline is targeted at app developers and enterprises.


Exactly, so why would the lower-level service aimed at big slow transfers be more expensive than a full-featured product?


There's a few reasons:

* Dropbox doesn't guarantee 11 9s like this service does - when I'm backing critical data up, I want to make sure it's _there_.

* Dropbox likely wouldn't take kindly to me storing 10PB, whereas that is what this service is designed for.

* I've got SLAs and guaranteed speeds with this; Dropbox isn't designed for me to suddenly download 10PB very quickly.


Where are you seeing these durability claims? I can't find them. And what would 12 9s durability even mean? You lose one byte out of every TB?


It's 11, not 12, I misspoke.

Those claims are from glacier: http://aws.amazon.com/glacier/faqs/ which is priced the same per GB of data.

In this case, the durability refers to the loss of data/objects stored per year - if you're sending multiple PBs of data off to Glacier, you want to be able to retrieve them many years later. Even 5 9s would mean that 1 object out of 100,000 is lost every year, which is quite poor.
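To put rough numbers on that (treating the durability figure as an annual per-object survival probability, which is just my reading of the FAQ):

    # Expected objects lost per year for a given annual durability figure (illustrative only).
    def expected_losses_per_year(num_objects, durability):
        return num_objects * (1.0 - durability)

    objects = 1000000
    print(expected_losses_per_year(objects, 0.99999))        # five 9s   -> ~10 objects/year
    print(expected_losses_per_year(objects, 0.99999999999))  # eleven 9s -> ~0.00001 objects/year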


I'm thinking (unrecoverable data loss / total data stored) in their system per year.


12 9s is new to me too. Not worked on anything better than 5 9s (5 mins downtime/year), and Wikipedia only goes to 9. 9 9s comes to 35ms downtime a year. I can't think of anything that needs more than 5 9s, let alone 9.

http://en.wikipedia.org/wiki/High_availability#Percentage_ca...


This is a durability metric, not an availability one. This typically tells you the likelihood of losing a given object in a year. (5 9's would be quite poor for this, implying a loss of 1 object out of 100,000 every year)


It was 11, not 12. I misspoke: http://aws.amazon.com/glacier/faqs/

In this case, it's not as much service uptime as it is data retention. If you're storing 5+ PB of data, even 5 9s of data loss per year can have a measurable impact.


Do you have any information to suggest that the Google Nearline service can handle 10PB?


If you'd like to put 10PB in storage, let's talk offline :)

source: SE at Google Cloud


It's safe to assume this is going to be just like Glacier and S3/Google Storage, i.e. unlimited.

Also, retrieval speeds increase with your data set size. From the docs: "You should expect 4 MB/s of throughput per TB of data stored as Nearline Storage. This throughput scales linearly with increased storage consumption. For example, storing 3 TB of data would guarantee 12 MB/s of throughput, while storing 100 TB of data would provide users with 400 MB/s of throughput."
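A quick sketch of what that linear scaling means for a full restore (using the 4 MB/s per TB figure quoted above; decimal units assumed):

    # Time to read back everything you have stored, at 4 MB/s of throughput per TB stored.
    def nearline_full_restore_days(stored_tb):
        throughput_mb_s = 4.0 * stored_tb      # scales linearly with data stored
        total_mb = stored_tb * 1000 * 1000     # decimal TB -> MB
        return total_mb / throughput_mb_s / 86400

    # The stored amount cancels out, so a *full* restore always takes ~2.9 days.
    print(nearline_full_restore_days(1))    # ~2.89
    print(nearline_full_restore_days(100))  # ~2.89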


I was just interested in what these storage systems are capable of supporting. While I'm confident they could all store 10 TB of data (that's just barely Tier 2 of 6 for Amazon), I'm wondering if they have the back-end capability to store 10 PB of data.


The issue with all the 9s is that it will take 3 hours to find out there is an issue, and another 3 hours to make another request. I bet Dropbox could fix their shit in less time.


The full-featured product is $10 regardless of how much storage you're using. If my requirements are for 1GB, then my price is $0.01 a month.


Hm yeah, I must admit I have a hard time understanding the high pricing on cloud storage. I always end up comparing to something like OVH storage servers, and cloud storage seems way overpriced..


Based on the prices I can see on OVH's website, you're not going to be able to hit 1c/gigabyte with many nines of durability. And that's ignoring the cost of operating the service yourself.


SLAs on availability/durability & guaranteed throughput..


I only have 100GB data to store, which is $1 a month. A lot cheaper than Dropbox.


Total apples/oranges. You are comparing a consumer system focused on folder synchronization to a system that is meant to be used at scale (and is scalable).


Isn't Dropbox a bit expensive? Check out hubiC.com


This is a nail in the coffin of the SaaS backup industry (Backblaze, RescuePlan, etc.).

With storage becoming essentially free the last excuse for not using self-hosted, secure backup tools like Arq[1] disappears.

[1] http://www.haystacksoftware.com/arq/


As a former Arq user, I'm happy to pay BackBlaze $5/month for not ever thinking about the backup problem again, on a bunch of family machines (mostly Macs but a couple of Windows machines as well).

No thinking about installing software (e.g., Arq, and finding something equivalent for Windows), keeping that software up to date, checking that it's up and running properly, etc. (BackBlaze tells me when it hasn't been able to reach a given machine after some period.)

So, no, I think the SaaS backup industry has nothing to fear from cheap online storage, at least for ordinary folk. (Hackers are a vanishingly small segment of that market.)


I'm adding support for creating nearline buckets in Arq right now.


Update (5 days later): It's added! If you're an Arq 4 user, pick "Check for Updates" from Arq's menu to get it. Go to Preferences, Destinations tab, and click the + button. Pick "Google Cloud Storage" for the destination type.

It's really great. I'm using it instead of Glacier for all my personal backups.


This is exactly what I want for Linux (headless), but I haven't been able to find anything like it. The closest thing I've found is attic, which, while amazing, requires a daemon to be running on the remote end (which is what makes checking, deduplication and deltas fast).

Does anyone know an Arq alternative for linux?



Unfortunately duplicity is very inefficient: it needs to re-upload the full set every so often, which takes up a lot of bandwidth. I guess attic is my best bet in the foreseeable future.


Haven't used it, but Duplicati is another one: http://www.duplicati.com/


Looks very interesting, thanks! I'm investigating it now.


I came in here just to ask/comment: I hope Arq implements this. It's great to see such a prompt response. Thanks!


> the last excuse for not using self-hosted, secure backup tools like Arq[1] disappears.

Hardly. I pay for support. I pay for liability. If something goes haywire with my backups, I have a phone number I can call. A number that's not a software developer / sysadmin who has no idea why I'm calling about my mom's missing holiday photos.


"I pay for support. I pay for liability. If something goes haywire with my backups, I have a phone number I can call"

Google "{service} lost all of my data" and see what kind of support these people got. I get that it gives you warm fuzzies, but when these companies lose all of your data, you have almost no recourse.


The free services have no warranty on anything. The services you pay for (eg: with money, not just data) have a warranty.

If they lose any data, they're liable for that as well. All the way up to taking them to court on it.


Sort of. I've called DropBox about missing data before. They actually were quite helpful and even helped recover most of it, despite the fact that I wasn't even a paying customer at the time.

A more likely scenario is that the data is not lost, but my mother (or whoever is calling them) simply has things misconfigured. These companies _do_ provide useful support. Support that they're not going to get if they backup to a developer oriented service directly.


Backblaze is $5/mo per computer. If I have 4TB on a computer, it's still just $5/mo.


"Backblaze is $5/mo per computer. If I have 4TB on a computer, it's still just $5/mo."

Beware flat rate service providers - it's not a consumer-friendly model:

http://blog.kozubik.com/john_kozubik/2009/11/flat-rate-stora...

Your interests and the interests of your provider should be aligned, not opposed.


Well, I would guess the number of people who need to backup 4T is rather small.

My thinking is that most people likely max out well below the 0.5T that the same $5 buys you on Google now, making that variant actually cheaper for them than Backblaze.

For those people it means paying less per month and getting privacy and better control (data retention!) in return. Should be a no-brainer.


Backblaze lets you add a personal encryption key. I guess they could log that key when you try to decrypt and restore, although I trust they don't. I suppose the NSA probably gets the key when it gets transmitted, but I don't really care. Am I missing something here? If Backblaze's encryption implementation is substantially worse, I may switch.


> I suppose the NSA probably gets the key when it gets transmitted [...] Am I missing something here?

I'm not sure if you're being serious or sarcastic. ;)

Anyway, if you don't care who has access to your backups then obviously my argument has no relevance to you.


"Well, what do I have to hide?" is a bad argument with respect to whether the NSA should be doing the things it does, but with respect to whether I am going to spend money and effort to hide my photos and documents from them, I think it's a fair argument.

I care very much, however, whether hackers will have access to my files, as they can cause havoc with things like tax returns that the NSA won't (I mean, the government already has my tax returns....).

So I was mainly curious if there was some significant flaw in Backblaze's encryption that should worry me from the perspective of a non-nation state adversary.


Can anyone comment how Crashplan stacks up vs Backblaze here? I'm leaning towards it as it'll save the data for more than 30 days after deletion, and security's another factor in my choice.


It's much more flexible in terms of external drives, and deleted files.

You can also set multiple backup locations, by sharing space on your other computers.

They have a proprietary Java client, but it runs cross platform. It's completely painless to add your own encryption key to all backups.


Backblaze can probably afford to be $5/mo per computer because the average computer using it has less than 4TB; OTOH, if there is a more cost effective option for the low end, it won't make sense to use Backblaze on the low end, which will drive up Backblaze's sustainable per computer price.


Yev from Backblaze here -> Fair point! Our costs keep decreasing as well -> https://www.backblaze.com/blog/150-petabytes-of-cloud-storag... and we're able to grow quickly (https://www.backblaze.com/blog/vault-cloud-storage-architect... ), so even if avg. user size goes up, decreasing costs make up for it overall.


Yes, but would you spend weeks uploading your 4TB?


Well... yeah. But it doesn't cost me any money to restore either, so it's much cheaper. I don't think they're going to feel threatened by this.


For disaster recovery Nearline's speed restriction can be prohibitive. Not everyone can afford to wait 3 days for their backups to download.


I'm considering switching from Backblaze to Arq to back up personal files (photos, etc.) from my rMBP. Anybody have any experience doing this and can recommend one vs. the other? I have a 500GB drive, probably only 300GB to back up. Backblaze is only $5/month and lets you encrypt, so I'm wondering if it's even worth switching.


I'm using Arq with DreamObjects currently (hope Google support will be added soon), backing up two laptops and a desktop, total backup size ~3T.

I can hardly recommend it highly enough.

The only minor niggles that I've run into were:

- If you have a very large directory (hundreds of thousands of files) then opening the backup-catalog can give you the beachball for minutes. However this really only happens for ginormous directories and is easily resolved by splitting the job into a few smaller ones.

- If you're low on disk space then the temp files that Arq creates during a backup can drive you over the edge and into a disk-full situation.

Other than that it has been rock-solid for me, the author really knows what he's doing.


How is using Arq+Nearline better than Backblaze (client and storage)? Not really "self hosted" if you're still dumping the data to a cloud provider. Even if you configure Arq to use your own "cloud" (cost of that notwithstanding), I don't think it's an overall net improvement.


(I'm the developer behind Arq) With Arq you keep control of your stuff because it's encrypted before it leaves your computer. With Backblaze you can choose your own encryption key, but to restore your files you have to give them your key; they decrypt your files and leave them (unencrypted!) on their servers for you to download, which I think makes choosing your own encryption key pointless. https://news.ycombinator.com/item?id=8169040 https://www.backblaze.com/backup-encryption.html


arq (which we love) + rsync.net = success.

We have a "HN readers" discount which is fairly substantial.


= success

rsync.net is 20x more expensive than Google Nearline ($0.20/GB).

Why would anyone choose to use it for Arq when Arq supports both?


While I personally don't mind Google having all my backup data, we must appreciate the fact that in this post-Snowden world some people will justifiably refuse to go with the big USA-corp players like Amazon or Google.

rsync.net just saying: "Hey, we're not Google or Amazon" is already a big selling point for some.


I encourage you to email us and ask about the HN readers discount.

Further, Amazon S3 is a better comparison, as far as pricing goes - we're fully live, online, random access storage - not nearline or weirdline like google/glacier.


Arq would be exactly what I wanted if it wasn't Mac-only, and preferably open source.


Windows version coming soon! (I'm the developer behind Arq)


Fantastic! Also, will there be support for Windows servers? And how do you handle lots of files and 50 TB of data?


It's also extremely expensive. I can put some effort for 5 USD per month for storage, but 40 USD upfront is prohibitive for me.


The cost of one dinner out is prohibitive? Just testing a backup solution alone would cost more than that, much less rolling your own, unless your time is worth considerably less than $40/hour.


I spend ~1.5USD per meal (for two people). We don't all live in first world economies with high-paying jobs.


So is this a blatant dig at Amazon's Glacier? "Fast performance... unlike competitors"

As they generally do tit for tat on price wars, it will be interesting to see what Amazon responds with.


> So is this a blatant dig at Amazon's Glacier?

Glacier is the other big offering from a comparable cloud vendor in this space, so, yeah. The performance claim is directed largely at Glacier, as is the consistent access one.


So how is this implemented? Spun-down disk drives?


My guess would be disk drives that are otherwise used for I/O intensive purposes, and where there's tons of unused/wasted space.

4MB/s is only around 4% of the bandwidth of a modern 1TB hard disk.


Why would they want to risk the performance of the I/O intensive disks?


With a 3s average response time and presumably replication across multiple disks, you've plenty of scope to schedule requests in a way that they won't affect performance.

In 3s on one disk you can do roughly 5k seeks and read up to 300MB. They only need to do 1 seek and read 4MB.


You can store 1 PB and get 4000 MB/s.


And at 3MB/s per TB it will always take over 3 days to retrieve all of your data. The 2-5 second retrieval latency is irrelevant when you're going to be waiting for 3 days...


FWIW it's nominally 4MB/s/TB:

> You should expect 4 MB/s of throughput per TB of data stored as Nearline Storage.

But still, you are correct in that it will take about 1TB / 4MB/s = 2.9d to retrieve all your data.

If you need the ability to do a restore faster than this, then you need to pay more for storage, is all. For many of us, waiting 3d to recover from a catastrophic failure isn't a big deal.


Is 3 days particularly slow for restoring a petabyte of data from an offsite backup?


s/over 3 days/no more than 3 days/


Presumably distributed across 1000 times more disks.


In my experience, spun-down disk drives take in the ballpark of 10-15 seconds to spin back up.


I own a big wholesale telco that does tons of data center business and bandwidth. Of course, margins are super thin, so we need great pricing. We are no Google, but we can achieve the same pricing, including the cost of bandwidth (that is, 9 9s of durability and no spin-up time, which is where the 3 seconds come from).

Maybe I should start my own service to harness all this infrastructure with something like swift!


from the documentation: "You should expect 4 MB/s of throughput per TB of data stored as Nearline Storage. This throughput scales linearly with increased storage consumption."


That would mean that if you had about 1TB stored it would take more than 3 days to retrieve it (with an initial 3 second delay before it starts).


And since bandwidth scales linearly with used storage it would presumably always take more than 3 days to retrieve all of your data.


That's a good observation. I wonder how it works for multiple objects that are significantly smaller than 1TB. If I'm streaming one at 4 MB/s and then, after a bit, decide to start another object download, does the original slow to 2 MB/s?


There it is. Still, someone like 'cperciva could pool a lot of their customers' data together and offer high speeds even for small amounts of data.


http://xkcd.com/1494/

The unusual thing about this market is that by constantly improving the price point of their product but keeping their profit margins the same, AWS has forced every competitor in it to actually compete at the limits of their capability - so any scheme like this that got any serious traction would presumably be self-undermining. The economists will have considered all of the unused provided capacity in the model before they priced it. Sorry to be miserable ;)


I don't know why you think so. AWS is not very price competitive. I make good money consulting on setup of more cost-effective alternatives to things like S3.

The large cloud providers sell on brand recognition, convenience and trust, not cost. They may compete with each other on cost to try to cannibalise each others markets, but they're not even close to pushing the envelope on cost efficiency in terms of the prices they offer.


Interesting - I'm speaking from general market sentiment (e.g. http://www.techrepublic.com/article/aws-is-playing-chicken-w...) but I'd never heard the "Mountain of Margin" argument (http://www.forbes.com/sites/mikekavis/2014/08/08/how-niche-c...) before you called it in to question - thanks for bringing it up. I now don't know what to believe - time to go and do my homework on this one.


Just wondering, what is more cost effective than S3 with the same reliability?

If you have your own servers I can see running zfs and replicating snapshots every few minutes to a remote machine but if you aren't, what else is competitive?

I do agree that a lot of AWS services aren't cost effective though, especially at any sort of non trivial scale.


Setting up your own with any of Ceph, Gluster, Riak, Swift or similar depending on your specific needs. Heck, I've got setups where we've been served well for years with a combination of inotifywait, grep and rsync (combined to trigger rsync instantly on modification events). It really depends on your access patterns, and there are lots of potential savings from making use of domain specific knowledge for your specific system.

In general, you can beat AWS with 3x replication across multiple data centres even with renting managed servers, especially as your bandwidth use grows as AWS bandwidth prices are absolutely ridiculous (as in, anything from a factor of 5 to 20 above what you'll pay if you shop around and depending on your other requirements). Lease to own in a colo drops the price even further.

Anything from 1/3 to 1/2 of AWS costs is reasonable with relatively moderate bandwidth usage, with the cost differential generally increasing substantially the more you access the data.

Most people also don't have uniform storage needs. If you use your own setup, people tend to be able to cut substantially more in cost by reducing redundancy for data where it's not necessary etc.

See also the Backblaze article that mentioned Reed-Solomon - if your app is suitable for doing similar you can totally blow the S3 costs further out of the water that way.



Google Cloud Storage is also massively overpriced.


Compared to what? Honest question - not trying to be cheeky.

If you are comparing against a self managed solution, are you factoring all costs into that equation (fully burdened labour costs, disposal, cooling, power, etc.).


For a reference point: The hardware cost for storage servers with full redundancy is in the order of $0.10 per GB (plus electricity, rack space etc.). Buying your own hardware and throwing it away every year is price competitive with Nearline (and you get full online storage, no three second delays and slow retrieval).

Of course there's good reasons not to roll your own solution until you reach a certain scale, but it's certainly possible to compete with Google and AWS on price.


> For a reference point: The hardware cost for storage servers with full redundancy is in the order of $0.10 per GB (plus electricity, rack space etc.). Buying your own hardware and throwing it away every year is price competitive with Nearline (and you get full online storage, no three second delays and slow retrieval).

Well, it would be price competitive if you considered only the hardware costs and not, e.g., the labor cost involved.


Consider that there are labor costs around operating services on AWS as well, and it's generally more expensive by the hour, and labor cost of managing your own servers is fairly low.

I have plenty of servers sitting in racks that haven't been touched in 5+ years; if you were to operate on a "buy and throw out in a year" practice, my typical labour costs would average a couple of hours per new server for setup. Or if you rent managed servers, you don't ever need to touch (or see) the hardware.

Conversely, you can get bandwidth at a tiny fraction of the cost of AWS bandwidth, so the moment you actually transfer data to/from your setup, AWS gets progressively more expensive.


The same egress and data transfer charges as for Google Cloud Storage Standard / DRA apply. You can transfer data from a Standard to a Nearline bucket for free until June 11 - https://cloud.google.com/storage/pricing#network-regions


That promotion is talking about transferring data between geographical regions, not storage classes.


Slightly off topic here, but when I tried to set up an account, which should get me $300 free, it auto-selects my country (Ireland) and then tells me:

Tax status Business

This service can only be used for business or commercial reasons. You are responsible for assessing and reporting VAT.

and then makes me enter a VAT number... well, I will stick with AWS and Azure, since neither of those "REQUIRE" a VAT number or business status.

PS: I know a business name costs about EUR 20 to set up, but the VAT requirement is the pain in the left one...


Where did you read about $300 for free?


Our story with Amazon Glacier (50 TB): we are a company with 50 TB of data spread across multiple servers, NAS boxes, and some external drives. Most of the data needs to be kept/archived for at least 10 years, and it was becoming very hard to manage locally, so Glacier looked like a great solution for us. Well, not exactly.

We tried using it manually, via APIs, scripts, etc. It was impossible; it was very hard.

So we tried CloudBerry to help us upload. It will upload, but it doesn't work for huge data sets either. With Glacier you can't easily search, list, or find the file you want to download, and it's not practical to manage all the backups and millions of files manually. We also have millions of photos and needed a way to find them easily, e.g. via thumbnails. So Glacier had many restrictions: the 3-5 hour restore time, the 5% restore quota that is hard to use even with utilities like CloudBerry, and no listing or search. It's not usable as-is, but the price was attractive.

We considered Seagate EVault, but it was expensive, had many hidden fees, and was complicated for our case. Then we tried another solution called Zoolz (http://www.zoolz.com). Zoolz does not use your own AWS account but their own, and from what I heard from them they have about 5 petabytes stored, so they have a massive restore quota (around 5% of the 5 petabytes). It's also simple to use and they offer zero restore cost. It's like Mozy or CrashPlan, but for business and on Glacier: they internally create thumbnails for your photos and store them on S3 so you get instant previews, and you can search and browse your data easily and instantly. They have servers and policies and a reasonable price - everything we wanted. We got the 50 TB for $12,000/year. That's more expensive than using Glacier by itself, which would cost around $6,000, but Glacier isn't practical for a company to use as standalone storage.

The only disadvantage is that when/if we need a file we have to wait 4 hours to get it, which is fair; it's faster than when we used to use tapes :). We tried restoring 1 TB with 1.2 million files and it took around 10 hours to complete, which was okay.


Can you do rsync style differential backups with this?


gsutil (the Google Cloud Storage command-line tool) has an rsync option: https://cloud.google.com/storage/docs/gsutil/commands/rsync
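For example, something like this works as a nightly differential backup (a minimal sketch that just shells out to gsutil; the bucket name is made up, and gsutil needs to be installed and authenticated):

    # Differential backup by shelling out to gsutil's rsync command.
    # The bucket name is hypothetical; only new/changed files get copied on each run.
    import subprocess

    def backup(local_dir, bucket="gs://my-nearline-backups"):
        # -m parallelizes transfers, -r recurses into subdirectories.
        subprocess.check_call(["gsutil", "-m", "rsync", "-r", local_dir, bucket])

    backup("/data/photos")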


Does Azure have a product like this? Usually the cloud war starts with Amazon, then Google, and then Azure is the last one.


I'd still rather use a cloud service provider like Zoolz, as it would be more efficient to benefit from other plans. Amazing idea though!


Great to finally see a cloud service competing against Glacier. Hopefully this will lead to some sort of response from AWS.


I wonder what the underlying hardware is?


Does anyone know of any online-storage offering that allows paying in bitcoin and allows API access? Thanks


I'm sure there are various FTP storage providers that accept Bitcoin.


tarsnap accepts bitcoins


"nearline, still online, but we could make a graph that is supposed to make you believe its not"


The transport for this service is HTTP? I imagine most competitors use that as well, right? How does encoding factor in here? I have to transcode my data to base64 in order to put it in or take it out, I assume? I "know" I'd only get billed for the data stored in the original octet/binary encoding. But what about the egress fees? Is encoded or decoded data the input for the billing?


HTTP can transfer arbitrary binary data just fine. There is no need to encode it.

I don't know which they use, but HTTP traditionally uses a `Content-Length` header or an empty chunk to represent the end of the data.
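For what it's worth, a tiny sketch of putting raw bytes over HTTP with no base64 step anywhere (the URL is a placeholder, not a real storage endpoint):

    # PUT raw binary data over HTTP; the server receives the original bytes unencoded.
    import requests

    with open("backup.tar.gz", "rb") as f:
        resp = requests.put("https://storage.example.com/bucket/backup.tar.gz", data=f)
    resp.raise_for_status()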


There is no need to base64 encode the data on the way in or way out. I cannot think of any APIs that use base64 encoded requests or responses.


Some of these Google/Amazon/Apple "Introducing Product X..." stories are starting to strike me as intellectually devoid, creeping into "free advertising" territory.

This particular story is about a Google service that is an extremely late entrant into the market (Amazon Glacier, etc) and offers almost nothing new. I'm not saying it's a bad product or not useful to folks, but it's not, by any means, groundbreaking. I'd much rather see the top spot of HN occupied by some startup's new idea or a researcher's new findings.

I also don't mean to imply that "big corporate" == bad. Certain products -- self driving cars, Space X automated landings, etc -- are absolutely worthy of our attention and discussion. I just hope that people would think twice before upvoting a story merely because it's from Google.


I suggest you read, for example, this article from TC:

http://techcrunch.com/2015/03/11/google-launches-cloud-stora...

This information should challenge your argument that Nearline is not groundbreaking, at least in the "cloud archival storage" space.



