[dupe] AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks (amazon.com)
142 points by chang2301 on Dec 1, 2016 | hide | past | favorite | 112 comments



    Never underestimate the bandwidth of a station wagon full of
    tapes hurtling down the highway.
—Tanenbaum, Andrew S. (1989). Computer Networks. p. 57.


In the same vein, a South African carrier pigeon beat a popular ADSL connection in 2009. It carried a 4GB memory stick 60 miles in about an hour, and it took another hour to upload the data to their system. In the same amount of time, the ADSL connection had completed only 4% of the transfer.

http://news.bbc.co.uk/2/hi/africa/8248056.stm

It was an obvious publicity stunt meant to draw attention to the slow connection speeds, but it illustrates the issue well.



It's even an official RFC:

https://www.ietf.org/rfc/rfc1149.txt


Never underestimate the street value either. I wonder if they have an armed escort for this truck -- the hardware must cost on the order of $10-20 million, and the data itself could be worth many multiples of that. Could make a great heist movie.


The Snowmobile product page covers this:

"Snowmobile is protected by 24/7 video surveillance and alarm monitoring, GPS tracking and may optionally be escorted by a security vehicle while in transit."


The hardware might be expensive but the data (if there's value to it) would likely be encrypted.


I'm sure you're correct, but encryption has never stopped movie villains. Kidnapping and coercion are usually the plot devices used.


Well, I'd say it's 50/50 kidnapping and coercion vs. a hacker spending an hour breaking into the systems.


An hour and a half if it's double encrypted.


> Could make a great heist movie.

Actually, 'in the movie' the plot twist comes at the end, when you find out the government was behind the heist all along, in order to get hold of a large amount of end-user data it needs for surveillance purposes.


I wonder if Amazon pays for the re-do if the truck ends up in a spectacular accident.


I would assume this is where insurance becomes involved, on one side or the other.


Can't one simply mirror a Snowmobile to a second Snowmobile prior to transport?


Redundant Array of Independent Trucks?


we use RAIT 50 in prod


As other commenters noted, it's fascinating that no matter how far networking technology progresses, we'll always have a variation of "sneakernet"[1] to bypass the limitations of the network. The sneakernet just evolves from floppies to 45-foot shipping containers.

If humans later colonize Mars and want the full 50-terabyte copy of Wikipedia in the biosphere, it's faster to send some hard drives as a rocket payload on a 6-month journey than to transfer it over the ~32 kbps uplink[2], which would take on the order of 500 years.

[1] https://en.wikipedia.org/wiki/Sneakernet

[2] http://mars.nasa.gov/msl/mission/communicationwithearth/data...
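
For a rough sanity check of that figure, here's a minimal back-of-the-envelope sketch in Python (the 50 TB and ~32 kbps numbers are the ones from the comment above; real link rates vary with orbiter passes and distance):

    # Back-of-the-envelope: 50 TB of Wikipedia over a ~32 kbps Mars uplink.
    WIKI_BYTES = 50e12            # ~50 TB
    UPLINK_BPS = 32e3             # ~32 kbit/s average relay rate (from the comment)
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    years = WIKI_BYTES * 8 / UPLINK_BPS / SECONDS_PER_YEAR
    print(f"~{years:.0f} years")  # ~400 years at the raw bit rate; protocol overhead
                                  # and link downtime push it toward the ~500-year mark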


A screenshot of Tanenbaum's "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway" argument with the context around it from his book.

http://imgur.com/a/qi4BP


One thing bothers me about that calculation: shouldn't the time it takes to write the data to all those tapes be taken into account? I.e., if you're using tapes to transport data, your real bottleneck is your tape drive's write speed.


Perhaps that's an artifact of the time period. In the '90s you likely already did daily backups to tape (LTO/Ultrium and the like), so the write time was already a sunk cost.



Even with X-rays theoretically providing a future 1 Gbps link to Mars, the underlying issue is that the growth of data vastly outpaces advancements in transfer speeds.

In other words, by the time X-ray communication is a reality for the Mars colony, we'd want to copy 50 petabytes (~15 years of transfer[1]) or 50 exabytes (~15,000 years). The 6-month rocket journey is still faster in those scenarios.

[1]http://www.wolframalpha.com/input/?i=(50*10%5E15)+%2F+((1*10...
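
The same arithmetic at 1 Gbps, as a quick sketch assuming the link runs continuously at the full rate:

    # Transfer time for 50 PB and 50 EB over a sustained 1 Gbps Mars link.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600

    for label, size_bytes in [("50 PB", 50e15), ("50 EB", 50e18)]:
        years = size_bytes * 8 / 1e9 / SECONDS_PER_YEAR
        print(f"{label}: ~{years:,.0f} years")
    # 50 PB: ~13 years, 50 EB: ~12,700 years -- the same order of magnitude
    # as the ~15 / ~15,000 year figures above.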


I keep thinking this will stop because there's a human limit to what we can generate.

But then I install another 60 GB game and realize not any time soon...


Yep. And as processors reach their ceiling, I imagine more effort being put into precomputed scenarios, so that they become more or less glorified lookup machines with instructions stringing them together.


I think the better way of phrasing this is that the growth of data vastly outpaces our ability to implement advancements in speed technologies.


Multiple laser beams? That would scale.


Adding a laser beam scales linearly, both in speed and cost. The amount of data we generate, store, and use scales exponentially.


Yes, but as long as the laser beam can carry absurd amounts of data, that linear coefficient is just fine. Nobody needs to sync trillions of cat videos (the source of the exponential growth) between Mars and Earth.


No, the nature of technology building upon itself is the source of exponential growth. As long as storage and processor technology continues to improve - and it will - we will find uses for that space, and network technology will need to keep up.


I just realized how much of a limited resource the Mars -> Earth internet connection is going to be in the future. It'll need to be limited to only absolutely necessary communications for a long time.

Assuming we get enough people on Mars, they'll have their own internet over there.


Startup Idea: Interstellar Intermodal Container Services. If you're the first, you'll make the money.



I feel odd when I see articles like this. I deploy "workloads" that require instances, auto scaling, Multi-AZ, etc. It makes my projects feel minuscule compared to the scale of companies that actually use something like this! I wonder how many companies will actually use this in any given year.


I imagine surprisingly many. I have operations that are not remotely on that scale: 8 employees with total data on the order of tens of terabytes. I found that to be a surprisingly high density of data per employee. A 1,000-employee company with the same density is at the petabyte scale.


I wonder why this makes sense. Isn't it more practical to get a few hundred Snowballs and ship them via FedEx? You can transfer in parallel and should reach the same speed as with Snowmobile. It's at the DC the next day, and the data will be in S3 faster than by truck. Also, the economies of scale will never pay off for Snowmobile; they're more likely to for Snowball.

At the same time, logistics (including insurance and security) would be handled by companies that are very good at it. FedEx, DHL and the like offer physical security services for goods if you need them, in addition to encryption.

I think it's mostly a PR move. They will probably find a few clients to somehow utilize one truck, but I don't think it's more efficient than Snowballs.


Installing, powering and cabling "a few hundred" of anything in a datacenter is a big deal. You probably don't have room. You may not have power. You have to deal with hundreds of boxes, cardboard isn't allowed on the datacenter floor (ideally), and just mucking around on the loading dock wrangling stupid stuff like shipping labels is going to suck up a ton of time.

[I'm a C++ dev who likes to help design and build datacenters. It's fun.]


But isn't that the same problem with the truck? You also need to keep it connected to the datacenter for a week. And since that connection has to leave the building, I'd guess it's even harder?


Just run a conduit of fiber out to the parking lot temporarily.


The Snowball ships bare, without packing materials. You receive it, plug it in, transfer data / run code, and ship it.

Still, you're right that there's a logistic issue of physically locating, networking, and powering these devices.


"One Snowmobile can transport up to one hundred petabytes of data in a single trip, the equivalent of using about 1,250 AWS Snowball devices." -- https://aws.amazon.com/snowmobile/

You'd have to find a thousand 1GbE ports in your data center (unless Amazon shipped an expensive switch along with the Snowballs) -- that's about two server cages' worth (10 racks × 42U). You would also have to find a lot of power, while a Snowmobile can bring its own generator.

I doubt there will be a lot of demand for Snowmobiles, though.
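
The arithmetic behind those numbers, as a rough sketch (80 TB per Snowball is the figure AWS quotes; the 1 GbE-per-device rate is an assumption following the comment above):

    import math

    SNOWMOBILE_BYTES = 100e15    # 100 PB per trip
    SNOWBALL_BYTES = 80e12       # 80 TB usable per Snowball
    SNOWMOBILE_BPS = 1e12        # 1 Tb/s aggregate on the truck
    PORT_BPS = 1e9               # 1 GbE per Snowball (assumption from the comment)

    devices = math.ceil(SNOWMOBILE_BYTES / SNOWBALL_BYTES)
    ports_to_match = int(SNOWMOBILE_BPS / PORT_BPS)
    print(devices, ports_to_match)   # 1250 devices, 1000 ports to match 1 Tb/s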


If you don't have the ports for Snowballs, you won't have them for a Snowmobile either. And instead of copying for a week, shipping, and then copying for another week on the other end, you can have the first data over after a day. That can be crucial if you want to start integrating data before everything has arrived.

I understand the scale of a Snowmobile, but unless you somehow utilize it at close to 100%, I don't think building your own logistics solution is cheaper than renting logistics as a service from companies that specialize in it. Basically AWS, but in the logistics world.


The advantage with Snowmobile is you only need a dozen or so ports, not hundreds.

Also, think about the person having to do this: if I had to move 100 petabytes of data to AWS, I really wouldn't want to be messing around with hundreds of appliances -- tracking them, figuring out what we uploaded onto which devices, figuring out how to fragment the data to fit properly across all the devices, etc.

Definitely want to go the Snowmobile route.


I'm not sure that's true. Thinking about our own (small) datacenter, we can find ten 10-gig ports much more easily than we could scrounge up a hundred 1-gigabit ports that are in the right place on the network to do a big data transfer like this (i.e., we're not going to open cabinets to plug Snowballs into 10 different top-of-rack switches). Sure, we could buy more switches to fan out our spare 10-gig ports into 1-gig ports, but why bother when AWS has them built into the truck?

It seems that any datacenter big enough to have a petabyte of data is going to be able to find 25 × 40 Gbit ports more easily than 1,000 × 1 Gbit ports.

Even if you only utilize 25% of the Snowmobile, that's 250 individual Snowballs you don't have to handle.

(Just bounced this off one of our network engineers: if we did have to handle 100 Snowballs, we would buy switches, set them up in the big conference room next to the datacenter, and run the 10-gig fiber drops over to that conference room where the Snowballs would be. He said we're probably looking at $30K in hardware costs to set it up, so the Snowmobile might be cheaper even for 100 Snowballs' worth of data.)


> However, customers with exabyte-scale on-premises storage look at the 80 TB, do the math, and realize that an all-out data migration would still require lots of devices and some headache-inducing logistics.


Lots of devices, certainly. But headache-inducing logistics? I think getting Snowballs via FedEx/DHL, copying to them, and sending them back is easier than figuring out how to connect the truck to your in-house datacenter. Most won't have a spare 1 Tbit/s connection in the parking lot.


If you're looking to transfer 100s of petabytes, I imagine you'd find a way to get a 1Tbps connection to the parking lot, rather than connect and disconnect thousands of Snowballs. It's clearly intended for a different use case than transferring a few petabytes.


> Isn't it more useful to get a few hundred snowballs and ship them via Fedex?

Corporate types, spending loads of company money, are more interested in convenience and turnkey solutions than in rigs that cost less but give them more to think about.


I'd wager this is already basically a trailer full of racked snowballs, perhaps without the ruggedized enclosures.


Atomic transaction FTW.


Unless my math is wrong, storing 100PB in S3 is something like $2,750,000 per month.

Then what happens in 5 years if local storage costs have dropped by a factor of 10, but S3's cost has not?

Big risk, no?
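
A sanity check on that number, assuming a blended S3 Standard rate of roughly $0.0275/GB-month (late-2016 list pricing was tiered and certainly negotiable at this scale):

    # Rough monthly S3 Standard bill for 100 PB at an assumed ~$0.0275/GB-month.
    GB_STORED = 100e15 / 1e9           # 100 PB in decimal GB
    PRICE_PER_GB_MONTH = 0.0275        # assumed blended rate

    print(f"${GB_STORED * PRICE_PER_GB_MONTH:,.0f} per month")   # ~$2,750,000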


If somehow a technology is developed to allow local storage cost to drop by a factor of 10, don't you think S3 would make use of the same technology to stay competitive?

Cloud storage is a commodity these days. The market saves you in this case -- if one cloud provider didn't use the 10x technology and pass along the 10x savings to the customer, another company would do it and steal all of their customers.


You'd still need to get your data out of Amazon's cloud and into the competitor's. That's neither cheap nor easy.


My point is that all cloud competitors would necessarily switch for economic reasons. If not to keep you from leaving, then to keep a competitor from capturing all new growth.


That's not how it works. People still use IBM mainframes, despite Linux being 10x cheaper or more. If you're locked in, sometimes you just stay locked in.

Many AWS services have no easy migration path off it.


That's absolutely how it works.

If a 10x cost reduction storage technology comes along, cloud providers will necessarily adopt it and will reduce their prices by approximately 10x.

Here's why:

- If they don't, it will become more cost effective for potential customers to run their own datacenters rather than put data into the cloud, so their growth will basically stop.

- Even if potential customers don't want to run their own datacenters, both potential and existing customers will put new data with a competitor who did pass along the 10x savings. So again, their growth would basically stop.

This is the nature of a commodity product in a free market. Basically all cloud providers offer an S3-compatible API, and costs and performance are in the same ballpark. There are tons of open source compatibility layers that abstract away which provider you're writing to. If one of them starts costing 10x less, you just flip the switch and all new data goes there.

The ability for customers to completely cut off growth of their service if prices don't fall is a supreme motivator. The only case this wouldn't be true is if all of the cloud storage services formed a cartel to fix prices. But given that every time Google or Amazon lowers prices, the other one follows to maintain parity, we have evidence that that isn't the case.

Addendum:

I don't understand your analogy at all. IBM mainframes are a specific type of hardware that excel at highly available batch and transactional processing. Linux is an operating system. Linux is free, so it's infinitely cheaper. Also, IBM mainframes run Linux.


> If a 10x cost reduction storage technology comes along, cloud providers will necessarily adopt it and will reduce their prices by approximately 10x.

Here are my two counterpoints.

1) Bandwidth prices HAVE fallen 10x in the past N years. Many cloud providers (OVH, etc.) DO offer that price drop. Yet how many people really left AWS or GCE for OVH? I would guess not that many.

2) As I said with mainframes: there has been a 10x-cheaper alternative to a mainframe for, oh, 15 years. But people are still on them BECAUSE they are locked into them. That is my point. Don't get locked into a single anything. Sending 100 PB of data to a commercial entity to hold for you, with no guarantee of future pricing, is a bad move. Locking yourself in is one of the worst things you can do as a company.


@1: That doesn't directly address either of my points. You're just providing an orthogonal example of something that cloud providers didn't move on.

I agree that the bandwidth pricing is almost certainly designed to create lock-in. What I am contending is that that is unrelated to the storage pricing. You'll notice that my arguments didn't include bandwidth pricing at all, because it is irrelevant to those arguments.

I'll give a more concrete example. Let's say I want to transfer 10PB out of S3. On their pricing sheet they actually say to contact them for a quote, but before that, prices drop pretty fast as you move more data: e.g., the first 10TB is 9 cents per GB, but past 300TB you're paying 5 cents per GB.

Let's be pessimistic and assume 5 cents per GB, even though you could probably get it for much cheaper by contacting them.

So 10PB will cost me (10PB * (5c/GB)) = $500K to export

Standard storage is about 2 cents per GB. So your 10PB sitting in S3 is costing you $200K/month just to sit there and do nothing.

Do you see the problem here? Moving to the $20K/month provider (the hypothetical 10x-cheaper one) becomes cost-positive after about 3 months.

Even if they were to increase bandwidth costs 10x, it'd still become cost-positive in a relatively short amount of time (a couple of years).
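
Spelling out that break-even math as a sketch (the $20K/month figure is the hypothetical 10x-cheaper provider, not a real quote):

    # Break-even on migrating 10 PB out of S3 to a hypothetical 10x-cheaper provider.
    GB = 10e15 / 1e9                        # 10 PB in decimal GB
    egress = GB * 0.05                      # ~$0.05/GB transfer-out (pessimistic)
    current_monthly = GB * 0.02             # ~$0.02/GB-month where the data sits today
    cheaper_monthly = current_monthly / 10  # the 10x-cheaper provider

    months = egress / (current_monthly - cheaper_monthly)
    print(egress, current_monthly, round(months, 1))
    # $500,000 egress, $200,000/month today, break-even after ~2.8 months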

Furthermore, if you are paying S3 millions of dollars per year to store data, you're almost certainly in a position to get them to contractually agree to that cheaper-than-public bandwidth cost I mentioned earlier, so you don't even have to worry about the situation in which they hold you hostage by increasing bandwidth costs 1000x.

Yet furthermore, to reiterate my previous point: even if they were to increase bandwidth costs 1000x on normal customers who didn't have enough clout to get contractual guarantees, that would kill their business. Nobody would put any new data with them. Sure, they could hold the existing data hostage, but absolutely nobody is going to put any new data there.

Similarly, if they were to not decrease storage costs 10x compared to a virtually-identical competitor, nobody would put new data with them. This is the part that is totally unrelated to bandwidth costs. Everyone would start putting all of their new data into the competitor, regardless of their old data being locked in.

The fact that Amazon and Google both immediately reduce storage prices after the other one does is evidence that this is the case.

@2: I think you underestimate mainframes. People who use them aren't totally stupid. They do a specific thing very well. Right tool for the job and all that.


I am not sure. Has AWS bandwidth followed the general bandwidth drop that everyone else has seen?

If migrating out of AWS costs 100 million dollars in bandwidth, I'm not sure you would see a lot of people jump ship to a competitor... data lock-in and all.


If you are at that scale, it would probably be faster and cheaper to evacuate data on physical media rather than over the network. I believe one can use Snowball to do this:

https://aws.amazon.com/snowball/faqs/#export


Your standalone 4TB Western Digital hard drive isn't going to give you 99.999999999% durability and replicate your data across multiple datacenters. $0.03/GB is still insanely cheap IMO.


People with 100 PB are not using 4 TB Western Digital drives in solo mode.

They likely have multiple racks of high performance storage, likely in multiple data centers.


I suspect that Amazon will not only drop prices but also add new ways to store data with variable pricing (e.g., new tiers as well as reduced redundancy). Not to mention that if you have 100PB in AWS, you might get preferential pricing. Google recently increased their spread and now has 4 tiers. There is also an interesting article about Glacier storage and pricing here: https://storagemojo.com/2014/04/25/amazons-glacier-secret-bd....


You suspect, but you don't know.

Most people have suspected that AWS bandwidth costs would go down... but by and large, they have held steady for 5+ years.


It won't necessarily be S3; it could be stored in Glacier. Considering AWS just cut list prices for both of those, this makes sense.


Well, on Glacier it's more like $700,000.

Considering the cost of storing 100PB on site (redundant disks, geographic distribution, property leases, power, security, and staffing), $700K a month might be considerably cheaper.
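
A minimal sketch of where that figure comes from, assuming roughly the $0.007/GB-month Glacier list price in effect when this was written:

    # Rough monthly Glacier bill for 100 PB at an assumed ~$0.007/GB-month.
    GB_STORED = 100e15 / 1e9
    print(f"${GB_STORED * 0.007:,.0f} per month")   # ~$700,000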


Presumably you get some kind of bulk discount if you're storing 100PB.


A surprising number of comments are questions that are answered explicitly in the article.

PSA: read the full content before you comment on it.


>> We needed a solution that could move our 100 PB archive but could not find one until now with AWS Snowmobile.

Around this time last year, Backblaze had 200PB of customer backups. They described storing it on 54,675 hard drives across 1,215 Storage Pods.

So imagine 600 Storage Pods, or half of Backblaze's entire operation, for just one customer. Insane.
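
A quick check of that estimate against the fleet numbers quoted above:

    # Scale 100 PB against Backblaze's published fleet figures.
    BACKBLAZE_PB = 200
    BACKBLAZE_PODS = 1215
    BACKBLAZE_DRIVES = 54675

    pb_per_pod = BACKBLAZE_PB / BACKBLAZE_PODS            # ~0.165 PB per pod
    pods_for_100pb = 100 / pb_per_pod                     # ~608 pods
    drives_per_pod = BACKBLAZE_DRIVES / BACKBLAZE_PODS    # 45 drives per pod
    print(round(pods_for_100pb), round(drives_per_pod))   # ~608 pods, 45 drives each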


I love how this is a literal trailer pulled by a truck - see the pictures at https://aws.amazon.com/snowmobile/.

I wish they gave more details as to what hardware was in there - are there any pictures of what the trailer looks like on the inside?


Couldn't find any pictures of the inside, but TechCrunch did get some high-resolution photos of the outside, including the chiller connections: https://techcrunch.com/2016/11/30/amazon-will-truck-your-mas...


So sneakernet was once someone walking around with a burnt CD-ROM, then it was a box of drives, and now it's a truck.

Quite a dramatic illustration of the increase in data usage.

If one extrapolates, the next stop will be a train, and then a container ship full of hard drives.


> then it was a box of drives, and now its a truck.

In between, there was FedEx-ing a NAS. That's pretty much the standard data-exchange format in astronomy: you get a fast link and don't need to bother plugging in a bunch of drives; just connect the NAS to power and network and off you go.


Hmmm...

So when will the universe be too small for our data, considering data use grows exponentially?

I imagine there are lower upper limits, but that would be the upper upper limit.


I'm at re:Invent and they have a "making of" video next to a demo unit. It only shows the physical construction of the power distribution and the raised floor. (Nothing about the racks or what's in them.)

It also appears that you never have access to the inside where the racks are. You can only access the last ~4 feet for power and data connections.


This really helps me, personally, make the concept of 'data' more tangible. It's not about files or records; data has a 'volume'. A hundred PB fills up a shipping container.

Next time a client asks me how "much" that is, I can just say "about a shipping container's worth".


But bytes per volume isn't constant. You may be able to say that 100 PB is about a shipping container now, but in 10 years it will be much smaller (probably not keeping up with a Moore's-Law pace for storage, though).


Even then it depends on the medium: an LTO-7 cartridge or HDD stores ~26 GB/cm³, while an SDXC card tops out at around 1 TB/cm³.


It depends on how the drives are packaged. A 2.5" SSD enclosure is much bigger than the actual storage device inside, which is closer in size to an SD card. Most of the bulk of this Snowmobile is probably taken up by enclosures.
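
A rough sketch of why enclosures and infrastructure, not raw media, dominate the volume, using the ~26 GB/cm³ HDD density above and an assumed ~86 m³ interior for a 45-foot high-cube container:

    # How much of a 45-foot container would 100 PB of bare HDD media occupy?
    HDD_GB_PER_CM3 = 26             # density figure from the comment above
    CONTAINER_CM3 = 86 * 1e6        # ~86 m^3 interior volume (assumption)

    media_cm3 = (100e15 / 1e9) / HDD_GB_PER_CM3
    print(f"{media_cm3 / 1e6:.1f} m^3, {100 * media_cm3 / CONTAINER_CM3:.0f}% of the container")
    # ~3.8 m^3, i.e. under 5% -- the rest is enclosures, racks, power and cooling.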


This! So when someone thinks 1EB is a lot, we can say, "Nah, that's only about ten 45-foot containers of data."

I wonder what HDD capacity it uses, and when we might see 1EB per container.


Every blog post should have a Lego visualisation like this.


Agreed! But for the love of LEGO, after all that effort to make great custom scenes, they could have used a better camera!

Those photos look like they were taken with a poorly focusing cheap smartphone.


350 kW seems a bit high for something that should effectively be an append-only file system. I would have expected 95% of the trailer to be in standby at any point in time.


100PB of storage and a total network capacity of 1 Tb/s across multiple 40 Gb/s links means pretty serious hardware, before even considering the security and video surveillance systems.
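
A hedged sanity check on the 350 kW figure, assuming commodity ~10 TB drives at ~8 W each and a 1.5x cooling/power-distribution overhead (all three numbers are assumptions, not anything AWS has published):

    # Very rough power budget for 100 PB of spinning disk.
    DRIVE_TB = 10          # assumed drive capacity
    DRIVE_WATTS = 8        # assumed active power per drive
    PUE = 1.5              # assumed cooling/power-distribution overhead

    drives = 100e15 / (DRIVE_TB * 1e12)       # 10,000 drives before any redundancy
    disk_kw = drives * DRIVE_WATTS / 1e3      # ~80 kW for the disks alone
    print(disk_kw, disk_kw * PUE)             # ~80 kW raw, ~120 kW with overhead
    # Servers, 1 Tb/s of networking, encryption and surveillance gear
    # plausibly account for much of the remaining budget up to 350 kW.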


Yeah, I thought that number was a bit high. Maybe there's something else to it?



Encryption.


Air Conditioning.


I admit there may be customer demand for this, but even if there isn't, Amazon has certainly made a great marketing statement.


There is: DigitalGlobe is/was using it.


Now this is real persistent container storage. ;) And persistent it will be, because even at 1Tb/s it will take you over a week to load the data onto it. The bandwidth while in transit is phenomenal, but before that it will be sitting at your DC's loading dock for quite a while.


So, suppose we could move that same 100 PB in a standard 24-inch cube FedEx box, collecting the data in less than a week and using only two 110V power connections. Would that be interesting? Oh, and it takes less than a single rack of gear.


Any idea what type of link they're using to connect to the containers?

If they fill the thing in 30 days, that averages to about 40 GB/s, way faster than 10GbE or Fibre Channel.


> Each Snowmobile includes a network cable connected to a high-speed switch capable of supporting 1 Tb/second of data transfer spread across multiple 40 Gb/second connections. Assuming that your existing network can transfer data at that rate, you can fill a Snowmobile in about 10 days.
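
That 10-day figure checks out, assuming the link stays saturated for the whole transfer:

    # Fill time for 100 PB over a sustained 1 Tb/s link.
    days = 100e15 * 8 / 1e12 / 86400
    print(f"~{days:.1f} days")   # ~9.3 days, i.e. "about 10 days"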



Maybe someone at Amazon was reading xkcd[1]

1. https://xkcd.com/949/


Even before Snowball was launched, the import/export team was joking about UPS trucks full of Snowballs, both within the team and then with management. I guess management found out there was actual customer demand for it.


Because it's a shipping container, maybe eventually they'll do worldwide data transfer?


Probably by plane. Shipping would otherwise probably be slower than using the network.


Worst case shipping is ~30 days (e.g. China to Hamburg), which is 38.6GB/s for 100PB in a container.

I doubt you have a permanent dedicated 40+ GB/s link between you and the nearest AWS data center.

Shipping by boat would still be way faster than a network upload.


You are correct, I had my math wrong. But I don't know if they're certified for sea shipping; I'm not sure how the transport would affect the drives.


Well, if the container is completely and securely sealed, it should not affect the drives; after all, drives and other parts are shipped from factories.

But yeah, I'd expect air shipping as well, if only so the container can be kept secure. I doubt Amazon is going to send security personnel on week-long trips on cargo ships.


Whoever said "The Internet is not a truck"? Seems like this blurs the lines a bit :)


I'd love to see actual pictures of that truck :)



They at least have external pictures of the "rig" on the main landing page: https://aws.amazon.com/snowmobile/


Apparently the author would, too.

> PPS – I will build and personally deliver (in exchange for a photo op and a bloggable story) Snowmobile models to the first 5 customers.


So this is absolutely a honeypot then, right?

There's no way to verify that this truck full of my corp's valuable data isn't stopped somewhere along the way, cloned, and then driven to the NSA or something.


You're putting it in the truck with the intention of shipping it to Amazon to load it onto their infrastructure. If you don't trust Amazon with your data, the truck isn't unique in any way... they can clone your data at their data center too.


Well, that's what I'm saying. I guess that rabbit hole goes all the way down: even if Amazon brought their machines to me, there's still no guarantee the data won't be stolen somehow.


I wonder how long it takes to load all the data in the truck (over 10GbE? or are there better ways?)


Right there in the article.

> Each Snowmobile includes a network cable connected to a high-speed switch capable of supporting 1 Tb/second of data transfer spread across multiple 40 Gb/second connections. Assuming that your existing network can transfer data at that rate, you can fill a Snowmobile in about 10 days.



