In the same vein, a South African carrier pigeon was faster than a popular ADSL solution in 2009. It carried a 4GB memory stick 60 miles in about an hour, and it took them another hour to upload the data to their system. In the same amount of time, the ADSL link had only completed 4% of the transfer.
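For a rough sense of the effective bandwidths implied (the underlying link speed isn't stated here, so these are just the numbers the anecdote implies, assuming ~2 hours total for the pigeon):

    # Effective bandwidth implied by the pigeon-vs-ADSL anecdote.
    payload_bits = 4e9 * 8                        # 4 GB memory stick
    pigeon_bps = payload_bits / (2 * 3600)        # ~2 hours: flight plus upload
    adsl_bps = 0.04 * payload_bits / (2 * 3600)   # 4% transferred in the same time
    print(f"pigeon ~{pigeon_bps/1e6:.1f} Mbps, ADSL ~{adsl_bps/1e3:.0f} kbps")
    # -> pigeon ~4.4 Mbps, ADSL ~178 kbps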
Never underestimate the street value either. I wonder if they have an armed escort for this truck -- the hardware must cost on the order of $10-20 million, and the data itself could be worth many multiples of that. Could make a great heist movie.
"Snowmobile is protected by 24/7 video surveillance and alarm monitoring, GPS tracking and may optionally be escorted by a security vehicle while in transit."
Actually, 'in the movie' the twist comes at the end, when you find out that the government is behind the heist in order to get hold of a large amount of end-user data that they need for surveillance purposes.
As other commenters noted, it's fascinating that no matter how far networking technology advances, we'll always have some variation of "sneakernet"[1] to bypass the limitations of the network. The sneakernet just evolves from floppies to 45-foot shipping containers.
If humans later colonize Mars and want to have the full 50-terabyte copy of Wikipedia in the biosphere, it's faster to send some hard drives as a rocket payload on a 6-month journey than to try to transfer it via the 32kbps uplink[2], which would take ~500 years.
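Quick sanity check on that figure (decimal terabytes, no protocol overhead -- the ~500-year number presumably bakes in some of both):

    # Time to push a 50 TB Wikipedia copy over a 32 kbps Mars uplink.
    seconds = 50e12 * 8 / 32e3
    years = seconds / (365 * 24 * 3600)
    print(f"~{years:.0f} years")    # -> ~396 years, i.e. centuries either way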
A screenshot of Tanenbaum's "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway" argument with the context around it from his book.
One thing bothers me about that calculation: shouldn't the time it takes to write the data to all those tapes be taken into account? I.e., if you're using tapes to transport data, your real bottleneck is your tape drive's write speed.
Perhaps an artifact of the time period: you likely already did daily backups to tape (LTO/Ultrium and the like), so the write time was already a sunk cost.
Even with x-rays theoretically providing a future 1 Gbps link to Mars, the underlying issue is that the growth of data vastly outpaces the advances in link speed.
In other words, by the time x-ray communication to the Mars colony is a reality, we'd want to copy 50 petabytes (~15 years of transfer[1]) or 50 exabytes (~15,000 years). The 6-month rocket journey is still faster in those scenarios.
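Same arithmetic at an assumed flat-out 1 Gbps:

    # Transfer times over a hypothetical 1 Gbps Earth-Mars x-ray link.
    link_bps = 1e9
    for label, nbytes in [("50 PB", 50e15), ("50 EB", 50e18)]:
        years = nbytes * 8 / link_bps / (365 * 24 * 3600)
        print(f"{label}: ~{years:,.0f} years")
    # -> 50 PB: ~13 years, 50 EB: ~12,684 years (same order as the figures above)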
Yep. And as processors reach their ceiling, I imagine more effort being put into precomputed scenarios, so that they become more or less glorified lookup machines with instructions stringing them together.
Yes, but as long as the laser beam can carry absurd amounts of data, that linear coefficient is just fine. Nobody needs to sync trillions of cat videos (which are the source of the exponential growth) between Mars and Earth.
No, the nature of technology building upon itself is the source of exponential growth. As long as storage and processor technology continues to improve - and it will - we will find uses for that space, and network technology will need to keep up.
I just realized how much of a limited resource the Mars -> Earth internet connection is going to be in the future. It'll need to be limited to only absolutely necessary communications for a long time.
Assuming we get enough people on Mars, they'll have their own internet over there.
I feel a bit strange when I see articles like this. I deploy "workloads" that require instances, auto scaling, Multi-AZ etc. It makes my projects feel minuscule compared to the scale of companies that actually use something like this! I wonder how many companies will actually use this in any given year.
I imagine surprisingly many. I run operations that are not remotely on that scale: 8 employees with total data on the order of tens of terabytes. That turns out to be a surprisingly high density of data per employee. A 1,000-employee company with the same density is at petabyte scale.
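The scaling is just linear; a tiny sketch using the rough figures above (40 TB is my reading of "tens of terabytes", not an exact number):

    # Linear scaling of the per-employee data density described above.
    total_tb, employees = 40, 8              # "tens of terabytes" across 8 people
    tb_per_employee = total_tb / employees   # -> 5 TB per employee
    big_company = 1000
    total_pb = big_company * tb_per_employee / 1000   # TB -> PB
    print(f"~{total_pb:.0f} PB at {big_company} employees")   # -> ~5 PB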
I wonder why this makes sense. Isn't it more useful to get a few hundred Snowballs and ship them via FedEx? You can transfer in parallel and should hit the same speed as with a Snowmobile. They're at the DC the next day, and the data will be in S3 faster than by truck. Also, the economies of scale will never pay off for Snowmobile the way they do for Snowball.
At the same time, logistics (including insurance and security) is handled by companies that are very good at it. FedEx, DHL and the like offer physical security services for goods if you need them, in addition to encryption.
I think it's mostly a PR move. They'll probably find a few clients to somehow keep one truck utilized, but I don't think it's more efficient than Snowballs.
Installing, powering and cabling "a few hundred" of anything in a datacenter is a big deal. You probably don't have room. You may not have power. You have to deal with hundreds of boxes, cardboard isn't allowed on the datacenter floor (ideally), and just mucking around on the loading dock wrangling stupid stuff like shipping labels is going to suck up a ton of time.
[I'm a C++ dev who likes to help design and build datacenters. It's fun.]
But isn't that the same problem with the truck? You also need to keep it connected to the datacenter for a week or so, and since that connection has to leave the building, it could be even harder.
"One Snowmobile can transport up to one hundred petabytes of data in a single trip, the equivalent of using about 1,250 AWS Snowball devices." -- https://aws.amazon.com/snowmobile/
You'd have to find over a thousand 1GbE ports in your data center (unless Amazon ships an expensive switch along with the Snowballs) -- that's about two server cages' worth (10 racks x 42U). You'd also have to find a lot of power, while a Snowmobile can bring its own generator.
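Rough math behind that, assuming 80 TB per Snowball (per the AWS quote above) and one 1GbE port per device; the 48-port switch figure is my own assumption:

    # Back-of-the-envelope for "a few hundred Snowballs" vs one truck.
    snowballs = 100e15 / 80e12            # -> 1250 devices for 100 PB
    switches_1u = snowballs / 48          # assumed 48-port 1U switches -> ~26 of them
    fill_days = 80e12 * 8 / 1e9 / 86400   # days to fill one device at 1 Gbps -> ~7.4
    print(f"{snowballs:.0f} devices, ~{switches_1u:.0f} switches, ~{fill_days:.1f} days each")
    # the devices fill in parallel, but you still have to rack, cable and track all of them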
I doubt there will be a lot of demand for Snowmobiles, though.
If you don't have the ports for Snowballs, you won't have them for a Snowmobile either. And instead of copying for a week, shipping, then copying for another week at the other end, you can have the first data over after a day. That can be crucial if you want to start integrating data before everything has arrived.
I understand the scale of a Snowmobile, but unless you somehow utilize it at close to 100%, I don't think building your own solution is cheaper than renting logistics as a service from others -- basically the AWS of the logistics world.
The advantage with Snowmobile is you only need a dozen or so ports, not hundreds.
Also, think about the person having to do this: if I had to move 100 petabytes of data to AWS, I really wouldn't want to be messing around with hundreds of appliances -- tracking them, figuring out what we uploaded onto which devices, figuring out how to fragment the data to fit properly across all the devices, etc.
I'm not sure that's true -- thinking about our own (small) datacenter, we can find ten 10-gig ports much more easily than we could scrounge up 100 1-gig ports that are in the right place on the network to do a big data transfer like this (i.e. we're not going to open cabinets to plug snowballs into 10 different top-of-rack switches). Sure, we could buy more switches to fan out our spare 10-gig ports into 1-gig ports, but why bother when AWS has them built into the truck?
It seems that any datacenter big enough to have a PB of data is going to be able to find 25 40-Gbit ports more easily than 1,000 1-Gbit ports.
Even if you only utilize 25% of the Snowmobile, that's 250 individual Snowballs you don't have to handle.
(just bounced this off of one of our network engineers -- if we did have to handle 100 snowballs, we would buy switches, set them up in the big conference room next to the datacenter and run the 10 gig fiber drops over to that conference room where the snowballs would be. He said we're probably looking at $30K in hardware costs to set it up, so the Snowmobile might be cheaper even for 100 snowballs worth of data)
> However, customers with exabyte-scale on-premises storage look at the 80 TB, do the math, and realize that an all-out data migration would still require lots of devices and some headache-inducing logistics.
Lots of devices, certainly. But headache-inducing logistics? I think getting Snowballs via FedEx/DHL, copying to them and sending them back is easier than figuring out how to connect the truck to your in-house datacentre. Most won't have a spare 1 Tbit/s connection in the parking lot.
If you're looking to transfer 100s of petabytes, I imagine you'd find a way to get a 1Tbps connection to the parking lot, rather than connect and disconnect thousands of Snowballs. It's clearly intended for a different use case than transferring a few petabytes.
> Isn't it more useful to get a few hundred snowballs and ship them via Fedex?
Corporate types, spending loads of company money, are more interested in convenience and turnkey solutions to their problems than in rigs that cost less but leave them with more to think about and more that can go wrong.
If somehow a technology is developed to allow local storage cost to drop by a factor of 10, don't you think S3 would make use of the same technology to stay competitive?
Cloud storage is a commodity these days. The market saves you in this case -- if one cloud provider didn't use the 10x technology and pass along the 10x savings to the customer, another company would do it and steal all of their customers.
My point is that all cloud competitors would necessarily switch for economic reasons. If not to keep you from leaving, then to keep a competitor from capturing all new growth.
If a 10x cost reduction storage technology comes along, cloud providers will necessarily adopt it and will reduce their prices by approximately 10x.
Here's why:
- If they don't, it will become more cost effective for potential customers to run their own datacenters rather than put data into the cloud, so their growth will basically stop.
- Even if potential customers don't want to run their own datacenters, both potential and existing customers will put new data with a competitor who did pass along the 10x savings to the customer. So again, their growth will basically stop.
This is the nature of a commodity product in a free market. Basically all cloud providers use an S3-compatible API, and costs and performance are in the same ballpark. There are tons of open source compatibility layers that abstract which provider you are putting to. If one of them starts costing 10x less, you just flip the switch and all new data goes there.
The ability for customers to completely cut off growth of their service if prices don't fall is a supreme motivator. The only case this wouldn't be true is if all of the cloud storage services formed a cartel to fix prices. But given that every time Google or Amazon lowers prices, the other one follows to maintain parity, we have evidence that that isn't the case.
Addendum:
I don't understand your analogy at all. IBM mainframes are a specific type of hardware that excel at highly available batch and transactional processing. Linux is an operating system. Linux is free, so it's infinitely cheaper. Also, IBM mainframes run Linux.
> If a 10x cost reduction storage technology comes along, cloud providers will necessarily adopt it and will reduce their prices by approximately 10x.
Here are my two counterpoints.
1) Bandwidth prices HAVE fallen 10x in the past N years. Many cloud providers (OVH, etc.) DO offer this price drop. Yet how many people have really left AWS or GCE for OVH? I would guess not that many.
2) As I said with mainframes: there has been a 10x-cheaper option to a mainframe for about 15 years, but people are still on them BECAUSE they are locked in. That is my point: don't get locked into a single anything. Sending 100 PB of data to a commercial entity to hold for you, with no guarantees of future pricing, is a bad move. Locking yourself in is one of the worst things you can do as a company.
@1: That doesn't directly address either of my points. You're just providing an orthogonal example of something that cloud providers didn't move on.
I agree that the bandwidth pricing is almost certainly designed to create lock-in. What I am contending is that that is unrelated to the storage pricing. You'll notice that my arguments didn't include bandwidth pricing at all, because it is irrelevant to those arguments.
I'll give a more concrete example. Let's say I want to transfer 10PB out of S3. On their pricing sheet they actually say to contact them for a quote at that scale, but below that the prices drop pretty fast as you move more data, e.g. the first 10TB is 9 cents per GB, but past 300TB you're paying 5 cents per GB.
Let's be pessimistic and assume 5 cents per GB, even though you could probably get it for much cheaper by contacting them.
So 10PB will cost me (10PB * (5c/GB)) = $500K to export
Standard storage is about 2 cents per GB per month. So your 10PB sitting in S3 is costing you $200K/month just to sit there and do nothing.
Do you see the problem here? Moving to the $20K/month provider becomes cost-positive after about 3 months.
Even if they were to increase bandwidth costs 10x, it would still become cost-positive in a relatively short amount of time (a couple of years).
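A minimal sketch of that break-even math, using the list prices above (5c/GB egress, 2c/GB/month storage) and assuming the alternative provider really is 10x cheaper across the board:

    # Break-even time for migrating 10 PB off S3 to a 10x-cheaper provider.
    gb = 10e6                                         # 10 PB in (decimal) GB
    egress = gb * 0.05                                # -> $500,000 one-time export cost
    old_monthly, new_monthly = gb * 0.02, gb * 0.002  # $200K vs $20K per month
    months = egress / (old_monthly - new_monthly)
    print(f"break-even in ~{months:.1f} months")           # -> ~2.8 months
    print(f"with 10x egress: ~{months * 10 / 12:.1f} years")  # -> ~2.3 years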
Furthermore, if you are paying S3 millions of dollars per year to store data, you're almost certainly in a position to get them to contractually agree to that cheaper-than-public bandwidth cost I mentioned earlier, so you don't even have to worry about the situation in which they hold you hostage by increasing bandwidth costs 1000x.
Yet furthermore, to reiterate my previous point: even if they were to increase bandwidth costs 1000x on normal customers who didn't have enough clout to get contractual guarantees, that would kill their business. Nobody would put any new data in them. Sure, they could hold the existing data hostage, but absolutely nobody is going to put any new data there.
Similarly, if they were to not decrease storage costs 10x compared to a virtually-identical competitor, nobody would put new data with them. This is the part that is totally unrelated to bandwidth costs. Everyone would start putting all of their new data into the competitor, regardless of their old data being locked in.
The fact that Amazon and Google both immediately reduce storage prices after the other one does is evidence that this is the case.
@2 I think you underestimate mainframes. People who use them aren't totally stupid. They do a specific thing very well. Right tool for the job and all that.
If you are at that scale, it would probably be faster and cheaper to evacuate the data on physical media rather than over the network. I believe one can use Snowball to do this.
Your standalone 4TB Western Digital hard drive isn't going to give you 99.999999999% durability or replicate your data across multiple datacenters. $0.03/GB is still insanely cheap IMO.
I suspect that Amazon will not only drop prices but also add new ways to store data with variable pricing (e.g. new tiers as well as reduced redundancy). Not to mention that if you have 100PB in AWS you might get preferential pricing. Google recently widened their spread and now have 4 tiers. There's also an interesting article about Glacier storage and pricing here: https://storagemojo.com/2014/04/25/amazons-glacier-secret-bd....
Considering the cost to store 100PB on site (redundant disks, geographic distribution, property leases, power, security, and staffing), $700k might be considerably cheaper.
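Back-of-the-envelope in the same spirit; all the unit costs and the replication factor below are my own ballpark assumptions, not anyone's quote:

    # Very rough cost for just the raw disks to hold 100 PB yourself,
    # ignoring servers, networking, facilities, power and staff entirely.
    drive_tb, drive_cost = 10, 300        # assumed 10 TB drives at ~$300 each
    replication = 3                       # assumed 3x replication for durability
    drives = 100e15 / (drive_tb * 1e12) * replication
    print(f"~{drives:,.0f} drives, ~${drives * drive_cost / 1e6:.1f}M in disks alone")
    # -> ~30,000 drives, ~$9.0M before anything else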
> then it was a box of drives, and now its a truck.
In between, there was FedEx-ing a NAS. That's pretty much the standard data-exchange format in astronomy: you get a fast link and don't need to bother plugging a bunch of drives in, just plug the NAS into power and network and off you go.
I'm at re:invent and they have a "making of" video next to a demo unit. They only show the physical construction of the power distribution and the raised floor. (Nothing about the racks or what's in them.)
It also appears that you never have access to the inside where the racks are. You can only access the last ~4 feet for power and data connections.
This really helps me, personally, abstract the concept of 'data' a lot more. It's not about files or records; data has a 'volume'. A hundred PB fills up a shipping container.
Next time my client asks me how "much" 100 PB is, I can just say "about a shipping container's worth".
But bytes per unit of volume isn't a constant. You may be able to say 100 PB is about a shipping container now, but in 10 years it will be much smaller (though storage density probably won't keep pace with Moore's Law).
It depends on how the drives are packaged. A 2.5" SSD enclosure is much bigger than the actual storage device inside, which is closer in size to an SD card. Most of the bulk of this Snowmobile is probably taken up by enclosures.
350 kW seems a bit high for something that should effectively be an append-only file system. I would have expected 95% of the trailer to be on standby at any point in time.
100PB of storage and a total network capacity of 1 Tb/s across multiple 40 Gb/s links means pretty serious hardware, before even considering the security and video surveillance systems.
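For scale, a hedged back-of-the-envelope (drive size and wattage below are assumptions; AWS hasn't published the internals):

    # Why 350 kW isn't absurd for 100 PB of spinning, network-attached storage.
    drives = 100e15 / 10e12               # assume 10 TB drives -> 10,000 of them
    drive_kw = drives * 8 / 1000          # assume ~8 W per active drive -> ~80 kW
    links = 1e12 / 40e9                   # 1 Tb/s over 40 Gb/s links -> 25 links
    print(f"{drives:,.0f} drives ~{drive_kw:.0f} kW, {links:.0f} x 40G links")
    # servers, switches, cooling and climate control plausibly account for the rest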
Now this is real persistent container storage. ;) And persistent it will be, because even at 1Tb/s it will take you over a week to load the data onto it. The bandwidth while in transit is phenomenal, but before that it will be sitting at your DC's loading dock for quite a while.
So, suppose we could move that same 100 PB in a standard 24" cube FedEx box, collecting the data in less than a week and using only two 110V power connections. Would that be interesting? Oh, and it takes less than a single rack of gear.
> Each Snowmobile includes a network cable connected to a high-speed switch capable of supporting 1 Tb/second of data transfer spread across multiple 40 Gb/second connections. Assuming that your existing network can transfer data at that rate, you can fill a Snowmobile in about 10 days.
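That 10-day figure checks out against simple arithmetic, assuming the link runs flat out:

    # Time to fill 100 PB at the advertised 1 Tb/s.
    days = 100e15 * 8 / 1e12 / 86400
    print(f"~{days:.1f} days")            # -> ~9.3 days, i.e. "about 10 days"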
Even before Snowball was launched the import/export team was joking about UPS trucks full of snowballs, both within the team, and then with management. I guess management found out there was actual customer demand for it.
Well, if the container is completely and securely sealed, it shouldn't affect the drives; after all, drives and other parts are shipped from factories the same way.
But yeah, I'd expect air-shipping as well, if only so that the container can be kept secure. I doubt Amazon is going to send security personnel on week-long trips on cargo ships.
There's no way to verify that this truck full of my corp's valuable data isn't stopped somewhere along the way, cloned, and then driven on to the NSA or something.
You're putting it in the truck with the intention of shipping it to Amazon to load it onto their infrastructure. If you don't trust Amazon with your data, the truck isn't unique in any way...they can clone your data at their data center too.
Well, that's what I'm saying. I guess that rabbit hole goes all the way down: even if Amazon brought their machines to me, there's still no guarantee that the data won't be stolen somehow.