Friday, September 14, 2007

Of Bits and Grit

Jeremy Kirk wrote in Infoweek yesterday,
"In just three years, the bytes of data generated by digital cameras, mobile phones, businesses IT systems, and devices will equal the number of grains of sand on the world's beaches."
His reference is the IDC White Paper (PDF): "The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010", published in March of this year. The report itself, though an absolutely fascinating read, makes no mention of sand or beaches. What it does state is that an estimated 988 exabytes of digital information will be "created, captured and replicated" in the year 2010. This is a sixfold increase over the amount of similar data calculated for 2006.

So how does Mr. Kirk make the leap from this abstract information measure to the real and concrete aggregate (puns intended) of grains of sand? His calculations for this number and therefore the comparison are not presented in his article. So I must intercede with my own:
  • The size of a grain of sand is 0.25 mm (range of 0.0125 to 0.25 mm; I choose the larger for a lower total count)
  • Each grain of sand is a cube (for efficient packing; this raises the count by perhaps a factor of 2 compared with a fractal packing that takes 1.2 to 1.4 times the space of the cube's 1.0)
  • There are 1.5 million kilometers of shoreline (not all sand, but okay, presume 90% are)
  • It is 50 meters from waterline avg for a "beach" at low tide (that's a WAG if I ever heard one. The "breadth" of a beach is also a fractal measure.)
  • The beach consists of the first 1 meter depth of sand (the deeper you go, generally the rockier the sand becomes, but there is still sand in most places at 1 meter, so this is Very conservative, perhaps by a factor of five or more.)

With the above suppositions, and counting the whole shoreline as sand, the calculation comes to 4.8 x 10^21 grains of sand. At one byte per grain, that's about 4.8 ZB ("zettabytes"). Remove the wiggle room I gave myself in the WAG for shoreline percentage of sand and for packing of grains and you're still in the 4.0 ZB range.
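
The arithmetic behind that figure can be sketched in a few lines of Python. This is only a restatement of the suppositions listed above; the function and parameter names are mine, and the `sandy_fraction` knob is there to show what the 90% shoreline guess does to the total.

```python
# Back-of-envelope count of sand grains on the world's beaches,
# using the suppositions from the list above (all of them rough guesses).

def grains_of_sand(shoreline_km=1.5e6, sandy_fraction=1.0,
                   width_m=50.0, depth_m=1.0, grain_side_mm=0.25):
    """Estimate the number of cubic grains that fill the beach volume."""
    shoreline_m = shoreline_km * 1_000
    beach_volume_m3 = shoreline_m * sandy_fraction * width_m * depth_m
    grain_side_m = grain_side_mm / 1_000
    grain_volume_m3 = grain_side_m ** 3  # cubic grain, packed with no gaps
    return beach_volume_m3 / grain_volume_m3

all_sand = grains_of_sand()                       # every kilometer is sand
mostly_sand = grains_of_sand(sandy_fraction=0.9)  # the 90% guess

# At one byte per grain, 1e21 grains correspond to one zettabyte.
print(f"100% sandy shoreline: {all_sand:.2e} grains ({all_sand / 1e21:.1f} ZB)")
print(f" 90% sandy shoreline: {mostly_sand:.2e} grains ({mostly_sand / 1e21:.1f} ZB)")
```

Changing any one input shifts the total in direct proportion, except the grain size, which enters as a cube: halving the grain side multiplies the count by eight.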

So based on my admittedly gross calculations of sand grains, Mr. Kirk is off by a factor of about 5 (4.8 ZB of sand against 988 EB, or roughly 1 ZB, of data). A small factor, one might consider, but when you're in the exabyte range, it adds up pretty quickly. The IDC white paper says that the growth is "sixfold between '06 and '10". With a linear growth rate (which is very likely wrong; it is probably exponential), the 4.8 ZB figure won't be reached for roughly 18 more years after 2010. With exponential growth at that same sixfold-in-four-years rate, it would take about three and a half more years, so around 2013 or 2014, before we'd have bit-to-grain veridical veraciousness.
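
For anyone who wants to check the extrapolation, here is a minimal sketch. It assumes the 2006 volume is simply one sixth of the 988 EB forecast for 2010 (the paper's "sixfold" figure), and it compares a straight-line model against a constant-ratio (exponential) one. The function name and its parameters are my own; treat the output as a rough check, since it depends heavily on the target and the growth model chosen.

```python
import math

def years_to_reach(target_eb, start_eb=988.0, growth_factor=6.0, span_years=4.0):
    """Years after the start year until annual data volume hits target_eb.

    Returns (linear_years, exponential_years).
    """
    base_eb = start_eb / growth_factor  # implied volume at the start of the span

    # Linear model: the same number of exabytes is added every year.
    eb_added_per_year = (start_eb - base_eb) / span_years
    linear_years = (target_eb - start_eb) / eb_added_per_year

    # Exponential model: the same multiplier is applied every year.
    annual_multiplier = growth_factor ** (1.0 / span_years)
    exponential_years = math.log(target_eb / start_eb) / math.log(annual_multiplier)

    return linear_years, exponential_years

# Target: 4.8 ZB, i.e. 4800 EB, one byte per grain of sand.
linear_years, exponential_years = years_to_reach(target_eb=4800.0)
print(f"linear growth:      {linear_years:.1f} years after 2010")
print(f"exponential growth: {exponential_years:.1f} years after 2010")
```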

So how could the Infoweek article be so far off the mark in this punchy but inaccurate lead? The difference is obviously in the calculation of the number of grains of sand. I've laid out my suppositions here. But what of Mr. Kirk? A quick Google on the question at hand and we can see that Mr. Kirk undoubtedly was "Feeling Lucky" and picked his "grains of sand" measure from the University of Hawaii (UofH) web page. The results given there coincide nicely with the exabyte figure from the IDC paper.

But I don't accept the UofH presumptions, and therefore reject the results. Compare the reasoning of UofH, which is obviously concerned with methods and not results for this specific scientific quandary, to the methods described above. Massively different estimates in all dimensions on the UofH page lead to a number of 7.5 x 10^18, a number "close enough" for Mr. Kirk to correlate to the information space data in the IDC paper.

I conclude that accepting the UofH values without criticism is sloppy research. Their example was meant to show methods, and so less than rigorous methods were used to determine the inputs to the calculation. I stand by my estimates, which were independently (cough) arrived at within a similar discussion which tries to map the number of stars in the universe to these same grains of sand (I'm beginning to think that Blake's meme, mapping sand to the size of big things, like the Universe, has truly escaped the farm).

Other calculations support my findings with an 18% similarity (3.2 x 10^21), so I feel I'm on solid ground. Well, as solid as shifting sands and fractal coastlines can be.
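
To put the competing grain counts side by side, here is a small sketch that divides each one by the 988 EB forecast. The three grain counts and the byte figure are the ones quoted in this post; the dictionary labels and the helper name are mine.

```python
IDC_2010_BYTES = 988e18  # 988 exabytes, the IDC forecast quoted above

# Grain counts mentioned in this post.
GRAIN_ESTIMATES = {
    "this post": 4.8e21,
    "other calculations": 3.2e21,
    "UofH page": 7.5e18,
}

def grains_per_byte(grains, data_bytes=IDC_2010_BYTES):
    """How many grains of sand there are for each forecast byte."""
    return grains / data_bytes

for source, grains in GRAIN_ESTIMATES.items():
    print(f"{source:<20} {grains:.1e} grains -> "
          f"{grains_per_byte(grains):.3g} grains per byte")
```

A ratio of 1 would mean bytes and grains are level; anything well above or below it shows how much the choice of sand estimate drives the comparison.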

Discussion


Of course, the real question (provided to me by my friend and colleague Michael Ellard) is this: When we get to the saddle point of sand and data, how much memory (computer chip) space will be necessary to create/store all that data?

I don't have the answer, but I can consider the problem.

  • The size of the store =>
  • size of the chip =>
  • size of the die =>
  • number of dies on a silicon wafer =>
  • size of the silicon wafer =>
  • number of grains of sand to make all those wafers.
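
The chain above can at least be written down as a calculation, even without trustworthy inputs. The sketch below is only that: the bytes-per-die and die-area values in the example call are hypothetical placeholders, not real chip specifications, and the wafer defaults (300 mm across, 0.775 mm thick) are assumed typical values. It also ignores edge loss on the wafer, purification losses, and the fact that sand is silicon dioxide rather than pure silicon.

```python
import math

def grains_for_storage(store_bytes, bytes_per_die, die_area_mm2,
                       wafer_diameter_mm=300.0, wafer_thickness_mm=0.775,
                       grain_side_mm=0.25):
    """Walk the chain: store -> dies -> wafers -> grains of sand.

    Returns (dies_per_wafer, wafers_needed, grains_needed).
    """
    dies_needed = math.ceil(store_bytes / bytes_per_die)

    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    # Crude count: wafer area over die area, ignoring partial dies at the edge.
    dies_per_wafer = math.floor(wafer_area_mm2 / die_area_mm2)
    wafers_needed = math.ceil(dies_needed / dies_per_wafer)

    # Treat each wafer as a solid disc made entirely from cubic sand grains.
    wafer_volume_mm3 = wafer_area_mm2 * wafer_thickness_mm
    grains_per_wafer = wafer_volume_mm3 / grain_side_mm ** 3
    return dies_per_wafer, wafers_needed, wafers_needed * grains_per_wafer

# Hypothetical inputs: 1 GB per die on a 100 mm^2 die, storing 4.8 ZB.
dies_per_wafer, wafers, grains = grains_for_storage(
    store_bytes=4.8e21, bytes_per_die=1e9, die_area_mm2=100.0)
print(f"{dies_per_wafer} dies per wafer, {wafers:.2e} wafers, {grains:.2e} grains")
```

Swap in better numbers for the placeholders and the same function gives the sand bill for any store size.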

And is there a saddle point there? At what point will all the grains of sand of all the beaches of the world intersect with the need for computer chips to store all the information we are creating?

At that point, the entire world will be the computer.
And the answer will be: 42.
Q.E.D.