Life is about to become much more interesting in Big Data land. While data have always been with us, it’s only recently that Hadoop and other Big Data technologies have dramatically altered the economics of collecting, managing, and utilizing data to drive businesses. Now it appears that even these open technologies, from Hadoop to Cassandra, are about to be transformed themselves, leaving the world of batch-oriented data processing behind in favor of real-time analytics.
Hold on to your seats.
As powerful as Hadoop is, it has one significant shortcoming: it’s batch-oriented. Even a few years ago, this was fine, as just being able to gather and crunch the data after the fact, at a much lower cost than traditional data mining, was a huge win. But, as Todd Papaioannou, formerly vice president of cloud architecture at Yahoo! and now founder of Continuuity, argues, “people are expecting much more of a real-time experience” on the web, something that Hadoop hasn’t historically delivered. He further notes,
“It’s not clear to me, as an industry, that we have nailed that [real-time analysis] problem. It is clear to me that we need to solve that problem, and that the next big wave of applications is going to be real-time and to get to real-time, you have to take the human out of the loop.”
This isn’t to suggest anyone should stay on the sidelines and wait for Hadoop (and NoSQL databases) to achieve real-time status. Far from it. Many an industry is already transforming itself through the Big Data intelligence that Hadoop and other technologies enable, batch orientation and all.
After all, just starting to work with data is a big deal. The stakes are huge, as a wide variety of industries are sprinting to take advantage of the treasure troves of data available to them. As the Wall Street Journal reports, it used to be enough to mine receipts and other consumer data to find nuggets of information that could affect your business. But now the Holy Grail is “getting and making effective use of information as it happens.”
We’re not far off. Just as the financial services industry used to operate on a 20-minute time lag with stock information, but now streams real-time stock information to traders and others, so, too, will industries as varied as retail and agriculture increasingly base decisions on up-to-the-second information about purchasing trends, weather patterns, and more.
Of course, we still need better Big Data-savvy applications to make sense of the data. But these are coming.
What is needed now, perhaps more than anything else, is to enable Hadoop, the clear front-runner in the Big Data sweepstakes, as a real-time data storage and processing tool. Continuuity doesn’t indicate how it plans to accomplish this, but there are others working on this same problem, including Nodeable. (Not surprisingly, we believe we have cracked the code. 🙂) Regardless of who crosses that real-time finish line first, many industries are benefiting from the Big Data gold rush today, and will benefit even more when we can make Hadoop real-time and enable a host of data-intensive applications to tame some of Hadoop’s complexity.
Again, the stakes are huge, which is why so much investment is going into this, both in terms of venture capital and enterprise IT. As these converge, expect to see a transformation of how businesses operate. In real time. At high levels of efficiency.