This Obscure Factor Is The Single Biggest Obstacle To AI
The promise of AI seems to be right around the corner, but only if we deal with one critical challenge that could otherwise delay AI by decades.
The promise of AI is everywhere and in everything. From our homes to our cars and our refrigerators to our toothbrushes, it would seem that AI is finally ready to revolutionize our lives and our world.
Not so fast.
While computing power and advances in computer architectures have finally reached a point where machines can learn quickly enough to achieve human-level intelligence in narrow applications, there’s still one part of AI that virtually nobody is talking about. And it’s likely to be the single greatest limiting factor in the progress of AI.
First, some background for perspective.
The Bottomless Cloud
We often hear about how the amount of data being stored is increasing exponentially. For example, it’s projected that by 2035 the world will have access to over one yottabyte of data. That’s one billion petabytes, or more than the number of stars in the visible universe. In my new book, The Bottomless Cloud, we project that by 2200, at a rather conservative 40% compounded annual growth rate, storage capacity will exceed the storage available if we were to use every atom that makes up our planet!
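To get a feel for what a 40% compounded annual growth rate implies, here is a minimal Python sketch. The 40% rate is the figure quoted above; the function names and the sample time horizons are illustrative assumptions, not the model used in the book.

```python
import math


def doubling_time_years(cagr: float) -> float:
    """Years needed for capacity to double at a compounded annual growth rate."""
    return math.log(2) / math.log(1.0 + cagr)


def growth_multiple(cagr: float, years: int) -> float:
    """Factor by which capacity grows after `years` of annual compounding."""
    return (1.0 + cagr) ** years


if __name__ == "__main__":
    rate = 0.40  # 40% CAGR, the figure quoted in the text
    print(f"Doubling time: {doubling_time_years(rate):.2f} years")
    # Hypothetical horizons, chosen only to show how quickly compounding adds up.
    for horizon in (10, 50, 100):
        print(f"After {horizon} years: x{growth_multiple(rate, horizon):.3g}")
```

At 40% a year, capacity doubles roughly every two years, which is why long-horizon projections climb so steeply.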
That all sounds very cool and it makes for great graphics and PowerPoint slideware, but the math ignores the simple fact that while data is itself an infinite commodity, its storage has very real and severely limiting costs. Yes, that cost is going down as storage densities go up, in concert with Moore’s law, but not fast enough. A petabyte in the cloud today costs about $400,000 a year on a traditional cloud such as Amazon’s S3, or about $60,000 on a next-generation cloud such as Wasabi’s Hot Cloud. If the storage cost trends of the last 60 years continue, that petabyte will cost pennies by 2060. So, what’s the problem, right?
The problem is that when you start looking at just how much data is required for AI to achieve human-level intelligence, you soon realize that our current and near-term data storage alternatives will simply not work, pushing AI out by decades. For example, to fully encode the 40 trillion cells in a human body would require 60 zettabytes of digital storage. That is 60 million petabytes, which means that even at 1 penny per petabyte it would cost about $600,000 per person, or on the order of 4 quadrillion dollars to create a digital twin for every one of the seven billion humans on the planet. That’s roughly 200 times US GDP.
The Real Price of Autonomy
An even better example, and one closer to home, is that of autonomous vehicles (AVs). One of the least discussed implications of AVs is that their relationship with data is radically different from that of almost any earlier device. (By the way, what I’m about to describe applies equally to any device that relies on AI and even rudimentary machine learning.)
The decisions an AV makes have two critical requirements: first, they need to be made fast, typically in fractions of a second, and second, the AV needs to learn from its decisions as well as the decisions of other AVs. The implications of this are fascinating and unexpected.
Due to the speed with which decisions need to be made, the AV requires significant onboard computing power and data storage capability. The increase in onboard data storage is the result of all of the sensors, contextual data about the vehicle and its environment, and data gathered from communication with other AVs in its proximity. This onboard data is used for real-time decision-making, since the latency of communicating with the Cloud can be a severe impediment to the speed with which these decisions need to be made. It’s one thing to drop a cell call with your co-worker and another altogether for an AV to not have access to the data needed to make a split-second decision.
The volumes of data that go into this sort of real-time decision making, along with all the contextual data gathered around those decisions, then need to be uploaded to the Cloud to fuel the ongoing learning that is so critical to future decisions. This creates a cycle of decision making and learning that dramatically accelerates the rate of both data capture and storage.
The net effect is that while an AV today may generate somewhere in the neighborhood of 1-2 TB of data per hour, the increase in onboard sensors as AVs progress to full autonomy will result in a dramatic increase in data storage requirements, with the potential for a single AV to generate dozens of terabytes hourly. Storing this all onboard is well outside the scope of any technology available today or in the foreseeable future. Yet it is also well outside the cost-effective scope of the big three Cloud storage solutions from Amazon, Google, and Microsoft.
For example, if an AV generates 20 TB a day (which is an incredibly conservative estimate), the storage requirements would amount to 7.3 petabytes a year. At the current costs of Cloud data storage that would amount to approximately $3,000,000/yr. That’s 60 times the cost of the automobile!
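The arithmetic behind these figures can be sketched in a few lines of Python. The 20 TB per day and the roughly $400,000 and $60,000 per-petabyte prices come from the article; the $50,000 vehicle price is an assumption inferred from the "60 times" comparison, and the function names are hypothetical. The sketch also assumes, as the estimate above appears to, that a full year of data is billed at a flat yearly price per petabyte, ignoring transfer fees and the gradual build-up of data over the year.

```python
TB_PER_PB = 1_000  # decimal units: 1 petabyte = 1,000 terabytes


def yearly_storage_pb(tb_per_day: float, days_per_year: int = 365) -> float:
    """Petabytes generated in a year at a constant daily rate."""
    return tb_per_day * days_per_year / TB_PER_PB


def yearly_storage_cost(petabytes: float, usd_per_pb_year: float) -> float:
    """Flat yearly cost in US dollars of keeping `petabytes` in the cloud."""
    return petabytes * usd_per_pb_year


if __name__ == "__main__":
    pb = yearly_storage_pb(20)                       # 20 TB/day, from the article
    traditional = yearly_storage_cost(pb, 400_000)   # traditional cloud price, from the article
    next_gen = yearly_storage_cost(pb, 60_000)       # next-generation cloud price, from the article
    assumed_car_price = 50_000                       # assumption, not stated in the article

    print(f"Data per year: {pb:.1f} PB")
    print(f"Traditional cloud: ${traditional:,.0f} per year")
    print(f"Next-generation cloud: ${next_gen:,.0f} per year")
    print(f"Traditional cost vs. car price: {traditional / assumed_car_price:.1f}x")
```

With these inputs the traditional-cloud figure comes to about $2.9 million a year, which rounds to the $3,000,000 quoted above, or just under 60 times the assumed vehicle price.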
Fueling the Revolution
The bottom line is that AI is simply not affordable at these costs. Which brings up a fascinating analogy that’s just as rarely talked about.
In the very early part of the 20th century, as gasoline-powered cars were just beginning to make their appearance on roadways, the infrastructure of gas stations didn’t exist. Early car owners would buy their gas at the general store or from modified heating oil trucks. A gallon of gas cost between $5 and $8 in today’s dollars. It wasn’t until the 1920s and ’30s that prices dropped to an affordable rate of about $2/gallon. That fueled the automotive industry. Without that affordability, personal transportation would not have taken off the way it did as quickly as it did.
The same applies to the evolution of AI. And, although I’m using AVs as an example, the logic applies to any fully autonomous device.
The challenge isn’t proving that AI works. It’s easy to do that as long as you don’t have to worry about the cost of data storage at scale. For example, Google’s DeepMind proved that its AlphaGo program could win at the 3,000-year-old game of Go against Lee Sedol, one of the world’s top-ranked players. Scaling AI so that it can be used broadly and affordably is where the challenge sets in.
This isn’t a small problem. Many of the areas where AI promises to have revolutionary impact, such as healthcare, transportation, manufacturing, agriculture, and education, are desperately in need of quantum advances in order to scale to meet the needs of the seven, soon to be ten, billion inhabitants of the planet.
So, is this a hard stop for AI? It may very well be if a few things don’t happen.
- First, we will need some monumental improvements in storage technology. Don’t discount this. In 1960 the world’s state of the art, an IBM 350 disk drive, held about 3.5 megabytes and weighed in at two tons. Today we can store 300,000 times as much data on a device that is one millionth of the weight.
- Second, we need to challenge the antiquated and ridiculously complex cloud data storage pricing models of the big three, which use the industrial-era notion of tiered storage. That notion is basically a carryover from file cabinets and banker’s boxes filled with paper. Locking digital data up in cold storage eliminates its value.
- Third, the Cloud itself is evolving. Having only three options for cloud storage is unlikely to provide the sort of competitive pressure and innovation needed to drive costs down quickly enough to meet the demand created by AI and machine learning applications.
- Fourth, at some point this will become an issue of national importance. Whichever nation is first to fully develop AI is very likely to have an enormous advantage over other nations. (Check out my recent podcast in which I talk about Putin’s quote on this topic.) In some ways this is no different than the nuclear arms race, with the exception that you cannot police who owns AI and how they use it. If data is indeed the new oil, then we need to think about its value from the standpoint of the value it has to a national competitive agenda.
The bottom line is that we need to challenge everything from the business models to the technologies used for data storage and make investment in data a national priority. In much the same way that the infrastructure for electric utilities was the foundation for 20th Century industry, the data utility will be the infrastructure for the 21st Century.
AI may well hold the answers to many of the largest problems humanity will face as we move towards the inevitability of 10 billion global inhabitants. But it’s only going to give those answers up if we are able to affordably capture and store the data needed to realize its promise.