Moving A Petabyte of Data

(With apologies for the head fake of posting this entry and then taking it down: my fingers were working faster than my brain, and I accidentally posted the entry before I’d finished it. Or proofread it.)


I gave a speech last week in which I asserted that it was faster to send a petabyte of data from San Francisco to Hong Kong by sailboat than over the internet.



I got quite a few “how can that possibly be true?” kinds of questions, so here’s the math. (Full disclosure: I’m a mathematician by training, which guarantees me a lifetime of small “off by one” errors in all subsequent calculations, so if I get something wrong, be gentle.)


A petabyte is a thousand terabytes, which is a million gigabytes, or a billion megabytes. At eight bits to the byte, that’s 8 billion megabits. With me so far?


So if you had a half-megabit-per-second internet connection, which is relatively fast in the US (though slow compared to the residential bandwidth available in, say, Korea), it’d take you 16 billion seconds, or roughly 267 million minutes, or about 507 years to transmit the data. Can you sail to Hong Kong faster than that? At a full megabit, just halve the time. Even at a hundred megabits per second (about the highest generally available speed I’ve seen from any carrier), it’s about two and a half years.
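For anyone who wants to check the arithmetic, here is a small Python sketch of it. It assumes decimal (SI) units, so a petabyte is 10^15 bytes, and it assumes a link that runs flat out at its nominal rate with no protocol overhead or downtime, which real connections never quite manage. The names are illustrative only.

```python
# Back-of-the-envelope transfer times for one petabyte, in decimal (SI) units.
PETABYTE_BYTES = 10**15                            # 1 PB = 1,000 TB = 10**6 GB = 10**9 MB
PETABYTE_MEGABITS = PETABYTE_BYTES * 8 // 10**6    # 8 * 10**9 megabits

SECONDS_PER_MINUTE = 60
SECONDS_PER_YEAR = 365 * 24 * 60 * 60              # ignores leap years


def transfer_seconds(megabits, mbps):
    """Seconds to move `megabits` over a link running flat out at `mbps`
    megabits per second (assumes no protocol overhead or downtime)."""
    return megabits / mbps


for mbps in (0.5, 1, 100):
    seconds = transfer_seconds(PETABYTE_MEGABITS, mbps)
    print(f"{mbps:>5} Mbit/s: {seconds:,.0f} s, "
          f"{seconds / SECONDS_PER_MINUTE:,.0f} min, "
          f"{seconds / SECONDS_PER_YEAR:,.1f} years")
```

At half a megabit per second this comes out at roughly 507 years; at 100 Mbit/s it is still around two and a half years.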


As Hal Stern once said to me, “Never underestimate the bandwidth of a station wagon full of storage driving down the [New] Jersey Turnpike.” Now you understand why tape-based storage has such lasting appeal to so many enterprises recording, compiling, transporting, or just plain archiving very large quantities of data, from video surveillance to trading data. Standard tapes currently hold 500GB each, and fit nicely into cardboard boxes with overnight express labels.
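The “station wagon” point can be made concrete with a short Python sketch: count the 500GB tapes a petabyte needs, then divide the bits by the time in transit to get an effective bandwidth. The 30-day crossing time is a hypothetical figure chosen for illustration, not a sailing record, and the sketch ignores the time spent writing and reading the tapes.

```python
import math

PETABYTE_BYTES = 10**15
TAPE_BYTES = 500 * 10**9      # 500GB per standard tape, per the text above
SAIL_DAYS = 30                # hypothetical San Francisco to Hong Kong crossing time


def tapes_needed(total_bytes, tape_bytes):
    """Whole tapes required to hold `total_bytes` (a partly filled tape still counts)."""
    return math.ceil(total_bytes / tape_bytes)


def effective_mbps(total_bytes, transit_seconds):
    """Average megabits per second achieved by physically shipping the data.
    Counts only time in transit, not time spent writing or reading tapes."""
    return total_bytes * 8 / 10**6 / transit_seconds


tapes = tapes_needed(PETABYTE_BYTES, TAPE_BYTES)
mbps = effective_mbps(PETABYTE_BYTES, SAIL_DAYS * 24 * 60 * 60)
print(f"{tapes} tapes; about {mbps:,.0f} Mbit/s if the boat takes {SAIL_DAYS} days")
```

Under that assumed 30-day crossing, the boat full of 2,000 tapes averages around 3 gigabits per second, roughly thirty times the hundred-megabit line above.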


One other big benefit of tape as an archive format? When the data’s at rest, it consumes no electricity. Just imagine a petabyte of data spinning on even the most power-efficient disk storage. (For reference, a petabyte of active disk-based storage is the equivalent of more than 40 Thumpers, our Sun Fire X4500 storage servers, each drawing more than a kilowatt and tipping the scales at something north of 150 lbs, which makes them slightly tougher to put on a sailboat, or in an overnight envelope.) For data to be available, disks have to be kept spinning and cool; tape has no equivalent requirement.
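Taking the Thumper figures above as floors, a short Python sketch gives lower bounds for the power, yearly energy and weight of a petabyte on spinning disk. It assumes the array runs around the clock all year and leaves out cooling, which only adds to the bill; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24


def disk_array_floor(units, kw_each, lbs_each):
    """Lower bounds on total power draw (kW), yearly energy (kWh) and weight (lbs)
    for `units` identical storage servers running 24x7. Cooling is not included."""
    power_kw = units * kw_each
    energy_kwh_per_year = power_kw * HOURS_PER_YEAR
    weight_lbs = units * lbs_each
    return power_kw, energy_kwh_per_year, weight_lbs


# "more than 40 Thumpers, each drawing more than a kilowatt" and "north of 150 lbs"
power_kw, energy_kwh, weight_lbs = disk_array_floor(units=40, kw_each=1.0, lbs_each=150)
print(f"> {power_kw:.0f} kW, > {energy_kwh:,.0f} kWh per year, > {weight_lbs:,} lbs")
```

That is more than 350,000 kWh a year and over three tons of hardware, before a single air conditioner is switched on.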


Now, there’s no one hammer for all nails, and tape isn’t right for a lot of applications (near-line storage, for example). But it plays a prominent role in some remarkably cutting-edge high performance computing applications, as well as at social networking and content aggregation sites (which think nothing of gathering terabytes of data every day). Tape archive isn’t just for banks or telcos running mainframes (although we’re good there, too).


So yes, at least for now, it’s faster to send a petabyte of data by sailboat than over the internet (given the bandwidth most of us have access to).


Which, by the way, is another reason we’re refreshing our Solaris on DVD program: it’s more efficient for many folks to get a 4 gigabyte DVD in the mail (for FREE) than to nurse a download from our download centers, a megabit at a time. (And I apologize for how slow the DVD deliveries have been. We haven’t exactly executed perfectly here, but hopefully it’s getting better as I type.)
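“A megabit at a time” adds up even for a single DVD image. This Python sketch assumes a 4GB image in decimal units and a sustained download rate with no stalls or retries; the function name is illustrative.

```python
DVD_BYTES = 4 * 10**9   # "a 4 gigabyte DVD", in decimal units


def download_hours(size_bytes, mbps):
    """Hours to download `size_bytes` at a sustained `mbps` megabits per second."""
    return size_bytes * 8 / 10**6 / mbps / 3600


for mbps in (0.5, 1):
    print(f"{mbps} Mbit/s: {download_hours(DVD_BYTES, mbps):.1f} hours")
```

At one megabit per second that is close to nine hours of uninterrupted downloading, and about eighteen at half a megabit.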


And I don’t want to even think about moving a zettabyte.


Filed under General

36 responses to “Moving A Petabyte of Data”

  1. Hi Jonathan,
    I totally agree with you that downloading Solaris can be a pain, especially with the bandwidth we have here in Malaysia. Sun recently held the venerable Sun Tech Day in Kuala Lumpur and it was a blast! I was looking forward to getting some Solaris DVDs. Unfortunately, only the x86 versions were available. A quick suggestion: perhaps you guys could keep the SPARC version on a computer and write it to a DVD on demand (there isn’t that much demand for it, so it shouldn’t be too much of a problem). In fact, I don’t even mind bringing my own DVD. Anyway, Sun Tech Day rocks! Do bring it back to Kuala Lumpur next year,
    Cheers
    Azrul

  2. Vasu Vattipalli

    Jonathan,
    Just wanted to say that even before Hal Stern commented “Never underestimate the bandwidth of a station wagon full of storage driving down the [New] Jersey Turnpike”, I believe Mr. Tanenbaum (my favorite computer networks author) quoted the original:
    “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.”
    Book – Computer Networks, 4th Ed. p. 91
    Check out the Wikiquote at http://en.wikiquote.org/wiki/Andrew_S._Tanenbaum
    I guess, no matter who quoted it first, the point still remains a valid one!
    -Vasu

  3. Hanh Nguyen

    People don’t archive and transport a petabyte at a time. They usually produce a small amount of data by day and archive it by night. As long as the speed of transporting data is greater than the speed of producing it, we will be fine. On the other hand, the speed of conventional transportation is fairly constant, while bandwidth is improving dramatically. I believe that 10 years from now, tape technology will have disappeared from the computing world.

  4. Before you consider sending your petabyte of data by whatever means you think is fastest, ask yourself how old your petabyte is. Did you acquire or generate the whole petabyte in the last 24 hours? If not, you probably want to think about incremental transfers rather than sending the whole thing every time.
    BTW, how long does it take for a Solaris box (say a T2000) to generate media with a petabyte of data that can be put on the boat to Hong Kong?
    And during the time taken to generate that petabyte of media for the boat, how much additional data was acquired/generated?

  5. Peter

    Got my 11/06 DVD after one week (Australia), it was posted directly from the States too.
    Must be getting popular if distribution’s been difficult. I wonder how many petabytes you’ve posted in Solaris 10 DVDs 😉
    Peter.

  6. Moby Dick

    A few quick questions:
    How long will it take to put one petabyte on tape? Or are you planning to send the only copy on that boat? What if the boat sinks??!!
    Just to make my position clear: I agree that tape is currently the best long-term storage, at least until another feasible solution emerges. Perhaps if holographic storage delivers on its promises, it could be a good alternative.

  7. Evgeny Kibalko

    Hi Jonathan, it was great hearing about a Solaris on DVD program, but so far I haven’t found anything on your website mentioning it. Maybe it’s just being started; I really hope so. I’ve used Linux and Unix before and now I’d really like to try Solaris. Downloading is really not an option for me, but getting the DVD in the mail would be perfect. By the way, if the program is real, are you going to ship the disks free of charge everywhere in the world? I’m from Russia, by the way. Have a nice day.

  8. Jim H

    So, when will Moore’s Law in bandwidth terms catch up enough to move a petabyte in a reasonable period of time? And the same goes for the “system” that will catch it; I can’t imagine it would be a typical data catcher. Now, if you need anyone to sail a petabyte to China, I’m up for the vacation 🙂

  9. It is SO COOL to see the CEO of a Fortune 500 company blog about his company’s products… and to actually enjoy reading his blog. Keep it up!!
    PS: I live in India & ordered my DVD in December… still haven’t got it (won’t come, will it?)… so yeah, you’re SLOW…

  10. old stk guy

    Hanh Nguyen posted that tape would be gone within 10 years. Haven’t we been hearing that for the last 20 years?

  11. Gerald Wise

    Your reference to Thumper reminds me of my disappointment in the “Code Name” section being removed from the Sun System Handbook on SunSolve. It was nice having this trivia information all in one location!

  12. Serge

    Sun Microsystems, Inc. The Network Is The Computer[tm].
    Also, “The Internet Is The Computer!”. Right?
    So, the next step is “The ? Is The Computer!”.
    Solaris? Grid? Java? Web 2.0? SOA? Google? Maybe YES and maybe NOT at all…
    Sun should find the answer to this question mark. Otherwise, someone else will. After that, the problem of moving a zettabyte will be trivial…

  13. The link in the blog goes to the freeware download
    http://www.sun.com/software/solaris/freeware/
    “refreshing our Solaris on DVD program”
    I don’t think that’s where you wanted the link to go

  14. Steve

    I think that some people are not seeing the big picture here. Jonathan is addressing the notion that tape is dead. One person said that in 10 years, tape will be gone. I say that with the current environmental and physical constraints on spinning disk, tape will be around (as an archive format) for a LONG time. Just look at cost, power to operate, points of failure, future rebuild times, portability and sheer mass comparisons between tape and disk, and you will see that tape is much more practical for an archive. In closing, tape is not dead; it is simply migrating to the background and to the uses where it makes the most sense. In terms of IOPS, disk rules; in terms of portability, recovery and reliability, tape is king.

  15. Dakshina

    “Never underestimate the bandwidth of a station wagon full of storage driving down the [New] Jersey Turnpike”
    When I was in college, we had to solve a computer networks problem (chapter 1/2 ?? of Tanenbaum) based on this saying. Your blog reminded me of it. 🙂

  16. md

    We have been hearing about the ‘last mile’ bottleneck for broadband connections for a long while now. Jonathan, as is his wont, has provided a very simple and very illuminating picture of exactly what this amounts to. The link to the Jim Gray conversation given in the comments above by Anonymous Coward is well worth reading, as it provides an excellent addition to Jonathan’s posting. I remember a conversation I had with an exec of a large telco supplier over 10 years ago, when he poured cold water on the concept of movies on demand over broadband (radio at that stage): ‘the Hollywood studios only make 20 odd movies a year, the providers will soon run out of content!’. In the era of YouTube etc., and to paraphrase Parkinson’s Law: ‘content expands so as to swamp the bandwidth available for its distribution’
    http://en.wikipedia.org/wiki/Parkinson's_law
    PS: and I received my DVD by post today!

  17. apt

    Agree on tape, but you also touched on the 4GB Solaris DVD problem,
    where I want to emphasize John Birrell’s comment:
    Why should Sun distribute the whole DVD when every server is different?
    E.g. the Debian startup CD is ~300MB; after that you just download all the packages you need from the nearest mirror (which usually comes to less than 4 GB) & you’re done!
    It’s distributed computing now: the network is the computer, sharing the good (and the bad!), remember?
    I saw Nexenta ( http://www.gnusolaris.org ) taking some steps towards this. Can Sun follow/improve on it, or would it hurt too much to join hands with people/companies willing to mirror Solaris and/or Sun’s packages???
    (No matter what their servers run on, why not reuse them, if Sun doesn’t want to spend money on its own infrastructure?)

  18. Hey Jon,
    I’ve been waiting for my Solaris DVDs! 🙂
    I hope they’ll come soon.
    Great post, I learned something new.

  19. SMS

    I don’t think tapes will go away anytime soon. However, there are devices that allow you to turn the power off and transport them. How about implementing a bank of the following: http://media.freescale.com/phoenix.zhtml?c=196520&p=irol-newsArticle&ID=966767&highlight=&tid=rsspr Even if you were only to power on one bank at a time, it would be much quicker than tape and pretty well everything else; however, I am not sure about the size of such a device, although I suspect that a battery could run a massive amount of storage. I really like my Blu-ray disc (50 Gigs) on the PS3, and hint hint, it would be nice to see Solaris on the PS3 in addition to Yellow Dog Linux.

  20. Zsolt Horváth

    If only Einstein was wrong and we could make those electrons travel faster we could catch the other bottleneck, latency.
    Who said lightspeed will be fast enough for everybody? 🙂
    By the way, thanks for the Solaris DVD, it arrived in about three weeks to Budapest/Hungary.

  21. Stephen Rossi

    Has Sun thought about hosting a .torrent for the Solaris DVD image?

  22. Serge

    “if Sun doesn’t want to spend money on its own infrastructure?”
    How much? Who else wants to spend money on its own infrastructure?
    Spending money on innovation and invention! That is the answer…

  23. Neil Davis

    And of course, for those of us in the UK, a billion is 10^12 and not 10^9, but hey, we’re a little island. When did the US version of a billion take precedence?

  24. Kevin Hutchinson

    Have you considered competing with Amazon S3? It’s like tape, but over the internet. I wish your network.com service could compete with Amazon EC2, but you don’t offer a web service API or general internet access to apps. I thought maybe you could do something akin to Rails Machines to offer service hosting (but with a JEE flavor). Regarding hosting, have you noticed how few web hosting companies offer Sun/Solaris? Now there’s a big opportunity…

  25. Anonymous

    I’m curious why the Solaris 10 DVD isn’t provided in torrent form. Many Linux distributions have moved to torrents, and I’m wondering why the official Solaris isn’t in torrent format with a Sun-run tracker and a few Sun-operated seeds. For speed and data integrity reasons I usually choose torrents over mirror downloads for large files. Wouldn’t a torrent solution offload some traffic from Sun’s servers?

  26. Hey Jon, this is an old story.

    And why a sailboat when one of your BlackBoxes looks so much better? ;-D

  27. In response to what the question mark is in “The ? is the computer”.
    I would have to say, “You are the computer!”

  28. An interesting discussion on “packet size”. It reminded me of the classic RFC 1149, IP over Avian Carriers. It was also spooky to see a photo quite similar to one I have taken myself.

  29. Jonathan,
    You are the most contemporary CEO I have seen. I hope other CEOs will follow what you are doing. I noticed you have offices here in beautiful Austin. Keep up the good work!
    Kimberly King

  30. Serge

    Thank you, Dustin, for the offer, but I’ll retire soon, because my CPUs need maintenance and my memory needs replacing. I’m not sure if I’ll be back in service soon…

  31. The sailboats are the computer?
    Internet2 is currently faster than sailboats, and commercial Internet providers are upgrading their networks to handle today’s and tomorrow’s data as fast as customers and investors are willing to pay.
    The current verified record for transferring bulk data over a very long distance (Tokyo -> Pacific -> US -> Atlantic -> Amsterdam) is about 8.8 gigabits per second (search: Internet2 “land speed record”), which could transfer a marketing petabyte in 909090 seconds or about 10.5 days. A plane full of tapes is still faster, but sailboats can’t even come close.
    If someone builds a terabit per second network across the Pacific (search: “Trans-Pacific Express”), and fully dedicates the bandwidth to the transfer of a petabyte of data, it would take 8000 seconds, or a little over two hours, at an initial cost of about $500 million for a new cable and computing and storage equipment on each end.
    When I think of fast sailboats, I somehow think of Larry Ellison. While he may never have the technology to build a wind-blown sailboat this fast, he has the financial means to buy a company that has a boat that will install a fast cable for him, or at least to lease a 10Gbps wave on an installed cable for far less.

  32. Cheaper in bulk, is essentially what you’re saying.
    Which reminds me: if one aggregated all the Free/Libre and Open Source Software SourceForge-style sites, even including Microsoft’s CodePlex (as long as the software aggregated was licensed under terms definable as F/LOSS), how many gigabytes would one have?
    And yes, it would be nice to finally receive the Solaris DVDs I’ve ordered. 😉
    Thanks

  33. One-word answer for your DVD problems: NetFlix. Yes, if you truly believe in the power of outsourcing and economies of scale, wouldn’t NetFlix (or a similar expert willing to take on the job) be the best provider of DVD-as-a-Service?

  34. GoodMorning

    I received my Solaris 10 DVD kit yesterday morning. Download time has been a bit of an issue for me with operating systems, which is why I stuck with Solaris 9 for such a long time. I am not sure if I will upgrade my home web server to Solaris 10, as it runs on an Ultra 5 and I am not sure it will run efficiently, but I will give it a go on my i386 machine. Could you maybe recommend anywhere I could get some newer SPARC hardware within the UK at a lower price? I have tried several sites, including eBay and ITSupplies.net, but I question the credibility of eBay, and ITSupplies.net seems overpriced. Unfortunately I have very little money, as I am only in college and working a weekend job.

    Anyway, I tend to agree that we are somewhat limited by the bandwidth we can fit through our intercontinental connections. But I think in the UK and USA the main problem is the speed of client connections. Here in the UK most of us are stuck on 2Mbit unless we are lucky enough to live in London or Manchester. I guess in some parts of the US the situation is worse, as exchanges and cabling to more remote areas could be limited. The limited bandwidth of data lines restricts the technology that could evolve over the internet. For instance, if we could subscribe to 100Mbit lines cheaply, the possibilities would be almost endless, especially for the media industries delivering content. We wouldn’t be restricted to the same TV channels, and unrecognized publishers and artists could become better known, simply because downloading one of their movies would be so much faster, so you would be more likely to watch it rather than wait for a download to complete. So overall I think that the limitations of internet connections are holding us back, and that ISPs and telecommunications providers should be more concerned with technological advances rather than trying to upgrade existing and often out-of-date systems.

  35. Wesley Parish

    Finally got the Solaris DVDs! It feels good. Many thanks!
