Organizations just starting their Big Data initiatives can already take advantage of many lessons learned by others. For example, it is important to set a goal and define an achievable project, with clear success criteria, right from the beginning.
Starting with a realistic plan is the first step toward success, but other factors matter as well. In our conversations with enterprises moving their projects into production, three key factors consistently emerged as drivers of success:
- The initiative is deemed business critical by senior leadership
- With cost-savings initiatives, ensure that the first set of processes moved is both critical to the business (for analytics, reporting, etc.) and resource intensive. With revenue-generating initiatives, securing senior leadership sponsorship is a little easier, but ROI needs to be proven faster. Most organizations try a mix of cost-savings and revenue-generating initiatives to sustain senior leadership sponsorship, and set an expectation of 1-2 years, on average, to achieve ROI.
- The development framework(s) used leverage the skills of existing resources
- Most industry reports still list the lack of skilled resources as a barrier to adoption. A lot of Pig, Hive, and MapReduce jobs are still being developed directly in Hadoop, and this does require new skills. Organizations that achieved ROI quickly have largely minimized the number of applications they develop directly in Hadoop and have moved to some sort of abstraction layer beyond Hive and Pig.
- You can use GUI-based tools like Informatica or SnapLogic for ETL processes, but they are limited beyond that use case. Cascading is a Java-based open source API framework. Scalding, also open source and contributed by Twitter, supports Scala-based development. More recently, of course, there is Apache Spark. These are examples… there are more.
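To make the abstraction-layer point concrete, here is an illustrative sketch in plain Python (not actual Hadoop, Cascading, or Spark code): the same word count written as explicit map/shuffle/reduce phases, the way a hand-coded MapReduce job spells them out, versus a single higher-level primitive.

```python
# Illustrative only: contrasting low-level map/shuffle/reduce style with a
# higher-level abstraction, using plain Python stand-ins for both.
from collections import Counter, defaultdict

lines = ["to be or not to be", "to do is to be"]

# Low-level style: explicit map, shuffle (group by key), and reduce phases.
mapped = [(word, 1) for line in lines for word in line.split()]
shuffled = defaultdict(list)
for word, count in mapped:
    shuffled[word].append(count)
reduced = {word: sum(counts) for word, counts in shuffled.items()}

# Higher-level style: one primitive expresses the same pipeline.
counted = Counter(word for line in lines for word in line.split())

assert reduced == dict(counted)  # same result, far less plumbing
```

The point of frameworks like Cascading, Scalding, and Spark is the same trade: developers express the pipeline in a few high-level operations and let the framework handle the plumbing, so existing Java or Scala skills transfer directly.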
- There is a high level of operational transparency
- Many Big Data projects start as science experiments in one or more business teams. As you take your Hadoop infrastructure from experiment to production, be sure it has the required operational transparency and is integrated into existing operational support systems. This gives business teams confidence that the environment is production-ready and that their data will be delivered within service levels.
- To do this, provide operations and business teams with visibility into application performance, not just cluster performance. This lets you see which users and which applications are consuming resources. Without application-level performance visibility, it is difficult to maintain reliable service levels at scale.
HomeAway is a great example of an organization that has found value from their Big Data investment because of these three factors. For example, one initiative gathers customer preference data from dozens of websites and uses it to refine their marketing and, in turn, increase bookings.
To hear more about HomeAway’s big data projects, join us on Nov. 10th, 2015 at 11:00 AM PT for a webinar with Rene, Austin, Michael and Francois from the HomeAway team. Learn how they successfully implemented a shared services Hadoop environment, are rapidly on-boarding new developers with a short learning curve, and have achieved operational excellence on Hadoop. Register here: http://info.cascading.io/webinar-homeaway-bigdata-increases-bookings-registration-0