By Bill McColl
January 14, 2010 09:00 AM EST
Back in 1985, the world was pre-web, data volumes were small, and no one was grappling with information overload. Relational databases and the shiny new SQL query language were just about perfect for this era. At work, 100% of the data employees needed was internal business data, and it was highly structured and organized in simple tables. Users would pull data from the database when they realized they needed it.
Fast forward to 2010. Today, everyone is grappling constantly with information overload, both in their work and in their social lives. Most data today is unstructured, and most of it is in files, streams or feeds, rather than in structured tables. Many of the data streams are realtime, and constantly changing. At work, most of the data required by employees is now external data, from the web, from analytics tools, and from monitoring systems of all kinds: data about customers, partners, employees, competitors, marketing, advertising, pricing, infrastructure, and operations. Today what's needed is smart IT systems that can automatically analyze, filter and push exactly the right data to users in realtime, just when they need it. Oh, and since no one wants to own data processing hardware and software any more, those IT systems should be in the cloud.
So how has the IT industry responded to the dramatic changes brought about first by the web, then more recently by the realtime social web and the cloud? What tools are now available to users in this new era of Big Data, where data volumes are growing exponentially?
From 1985 to 2004, SQL was essentially the only game in town. Around 2004, a number of companies, led by Google, and including eBay, Yahoo and later Facebook, realized that they required levels of scalability, parallelism, performance and data flexibility that went way beyond what relational databases and SQL could provide. Their solution was to adopt a simple parallel programming framework, MapReduce, in place of SQL. MapReduce and its open source implementation, Hadoop, are now widely used to analyze very large data sets.
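For readers who have not seen the MapReduce model, here is a minimal single-process sketch of its three steps (map, group by key, reduce) applied to a word count. It is an illustration only: the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are made up for this example and are not the Hadoop API, and a real framework would distribute each step across many servers.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the whole batch: map every record, group by key, reduce each group."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input records, chosen only for illustration.
word_counts = run_job(["big data", "big streams of data"])
print(word_counts)
```

Each call to `map_phase` and `reduce_phase` is independent of the others, which is what lets a framework run them in parallel over very large inputs.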
So what's next? If SQL was the first generation Big Data tool, and MapReduce/Hadoop was the second generation tool, what might a third generation tool look like? To answer this, we need to look at the areas in which MapReduce/Hadoop are weak: (a) realtime, and (b) ease-of-use. The MapReduce model is optimized for large-scale batch processing. As such, it is not a good fit for the growing number of applications requiring realtime stream processing. The model is also designed for use by experienced programmers and, in the case of Hadoop, by experienced Java programmers. Unfortunately, the vast majority of those grappling with Big Data challenges today are "non-programmers". They are individuals or business users who rely on tools like Excel spreadsheets for processing their data. And there are a lot of them! Several hundred million Excel users alone.
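The batch versus realtime distinction can be made concrete with a small sketch. A batch job needs the complete data set before it produces any answer, while a stream processor keeps a running result that is updated, and available, after every event. The names `batch_count` and `RunningCount` and the sample events below are hypothetical; this does not represent how Hadoop or Cloudcel is implemented.

```python
from collections import Counter


def batch_count(events):
    """Batch style: scan the full, finite data set, then return one final answer."""
    counts = Counter()
    for event in events:
        counts.update(event.lower().split())
    return dict(counts)


class RunningCount:
    """Stream style: fold each event into running state as soon as it arrives."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        """Process one event and return a snapshot of the current counts."""
        self.counts.update(event.lower().split())
        return dict(self.counts)


# Hypothetical event stream, chosen only for illustration.
stream = ["price up", "price down", "price up"]

running = RunningCount()
snapshots = [running.update(event) for event in stream]

print(snapshots[0])         # result available after the first event
print(snapshots[-1])        # result after the latest event
print(batch_count(stream))  # same totals, but only once all data is in hand
```

Both approaches arrive at the same totals. The difference is that the streaming version has a usable answer after every event, whereas the batch version has to be rerun over the whole data set each time new data arrives.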
The third generation of tools for Big Data will therefore need to offer the scalability, parallelism, performance and data flexibility of tools like Hadoop, but also be able to continuously process realtime data streams, and be as easy to use as a spreadsheet. At Cloudscale we've been tackling this challenge. Our Cloudcel service provides the first example of such a third generation Big Data tool.
SQL remains a great tool for handling structured, tabular data, and for transactional applications. MapReduce and Hadoop are great tools if you are a programmer and your task is to process two petabytes of historical data across three thousand servers in less than 24 hours. We now also have a third type of Big Data tool aimed at the much larger number of people who need a simple and easy-to-use, but powerful and scalable cloud-based service for analyzing the huge volumes of data that are now continuously bombarding them in their life and their work.
Copyright © 2010 Ulitzer, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
Bill McColl is Founder & CEO, Cloudscale Inc. In order to found Cloudscale he left Oxford University, where for over twenty years he was Professor of Computer Science, Head of the Parallel Computing Research Center, and Chairman of the Computer Science Faculty. He has led research, product and business teams in a number of areas: massively parallel algorithms and architectures, parallel programming languages and tools, datacenter virtualization and resource management, realtime stream processing, and cloud computing. Cloudscale is his second Silicon Valley software company. He was also founder and CEO of Sychron Inc., a Silicon Valley VC-backed software company developing scalable software systems for datacenter and desktop virtualization. McColl lives in Palo Alto, CA.
Ulitzer content is offered under Creative Commons "Attribution Non-Commercial No Derivatives" License.
For any reuse or distribution, you must make clear to others the license terms of this work.
The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get written permission from Ulitzer, Inc., the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.