Banishing the Confusion of Eight Big Data Myths

December 9, 2014
Chris Preimesberger
http://www.eweek.com/database/slideshows/banishing-the-confusion-of-eight-big-data-myths.html

Enterprises of all types and sizes are realizing that data sets stored or archived in silos or in clouds (information they might have considered too old or irrelevant, or kept only for regulatory purposes) may have great potential value. It’s all about looking at a business’s history, making cogent queries, discovering insights and projecting what is likely to happen in the future, in order to become more customer-centric and inventory-effective. These companies are going into the internal business of analyzing data. As a result, organizations are in search of the necessary tools and information to take full advantage of the potential this movement offers. However, big data brings big hype, and big hype brings only big confusion about what’s what in the data market. In this slide show, eWEEK and Gary Nakamura, CEO of data application infrastructure provider Concurrent, discuss and dismiss the biggest myths that are disrupting the big data industry. Some of what turns out to be a myth may surprise you.

Myth 1: We Must Hire a Hadoop Expert

Hadoop is built on intricate concepts such as MapReduce, YARN, Spark and the Hadoop Distributed File System (HDFS), and the constant change and announcements of subsystem-level technology further complicate the picture. Plenty of products and tools reduce this complexity and shield users from it entirely. There are open-source application frameworks and commercial products that significantly improve productivity and accessibility when working with Hadoop, up to the point where companies can use internal resources to execute on their big data strategy: enterprise Java developers, data warehouse developers and data analysts can quickly and easily leverage Hadoop.

Myth 2: Buying a Big Data Solution Means I’m Using Big Data

You’ve just convinced your organization to adopt a big data strategy, and you’ve purchased a solution. What’s next? Enterprises often get stuck at a point where they have the hardware and Hadoop software in place but don’t have the skill set to take advantage of it. Using big data means that you are using your data, executing a data strategy and helping your business with cost savings, revenue opportunities or additional insights. The key is lowering the bar for your organization to execute and deliver data products as quickly as possible. Delivering and running these production applications reliably and on time is the next set of challenges. When you achieve this level, you will know because your users will want more.

Myth 3: Big Data Is a Fad That Will Go Away in a Few Years

Ninety percent of the world’s data was created in the last three years. Sticking your head in the sand and hoping that it will go away is a career-ending move. We may drop the “big” in big data in a few years, but whether you like it or not, your company will be in the business of data.

Myth 4: Businesses Need One Data Scientist for All Big Data Needs

For too long, businesses have been upholding the myth of the data science hero—the virtuoso who slays dragons and emerges with a treasure of an amazing app based on insights from big data. The truth is they can’t afford to rely on a single data scientist or developer because employees can leave an organization at any time. By building a “big data app factory” of processes and teams, companies can ensure that great work can be done over and over again—regardless of personnel changes.

Myth 5: Traditional Enterprise Data Warehouses Will Go Away

It’s unlikely that the technology of the past will completely go away. Enterprises will continue to rely on traditional enterprise data warehouses (EDWs). However, with the rapid evolution of Hadoop and accompanying products and technologies, the role of the EDW in the enterprise will significantly diminish. The flow of data will change, and it’s likely that Hadoop will be its first stop.

Myth 6: Apache Spark Is the Future of Hadoop

As usual, the new, sexy young object is the most alluring. Apache Spark is currently one of those: It is a fast and general engine for large-scale, clustered data processing. However, rest assured, another will come along and take its place as the hottest thing on the market. What people often forget is that old reliable is old and reliable for a reason: it usually has the breadth and depth needed to move your big data project forward. Resist the urge to move to the latest; if it ain’t broke, don’t fix it. Stick with what you know.

Myth 7: Big Data Is Only for the Largest of Enterprises

The “big” in big data is misleading. Everyone—including organizations large and small—is in the business of data. Sure, large enterprises collect massive amounts of data, but the abundance of data that small enterprises can collect and leverage for competitive advantage also can be immense. Just because your data may be small in volume does not mean you shouldn’t have a data strategy in place.

Myth 8: Big Data Is for Hadoop Experts

Enterprises today are rapidly adopting Hadoop to process, manage and make sense of growing volumes of data, and they are now leveraging existing internal resources to drive their data strategies forward. There are now mature, reliable tools readily available for all software engineers to use to unlock the full potential of big data and Hadoop. As a result, no Hadoop expertise is required.