2. Hadoop adoption rates
No plans 38%
Considering 32%
Experimenting 20%
Implementing 5%
In production 4%
Based on 158 respondents, BI Leadership Forum, April, 2012
www.bileadership.com
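The adoption percentages above can be turned into approximate head counts. The following is a minimal sketch, assuming the 158 respondents and the rounded percentages reported on the slide; the names `RESPONDENTS`, `ADOPTION_PCT`, and `approx_counts` are illustrative and not part of the survey. Because the published percentages are rounded (they sum to 99%), the counts are estimates only.

```python
# Approximate respondent counts from the rounded percentages on the slide.
# RESPONDENTS and ADOPTION_PCT copy the slide; everything else is illustrative.
RESPONDENTS = 158
ADOPTION_PCT = {
    "No plans": 38,
    "Considering": 32,
    "Experimenting": 20,
    "Implementing": 5,
    "In production": 4,
}


def approx_counts(pcts, total):
    """Convert whole-number percentages into rounded head counts."""
    return {stage: round(total * pct / 100) for stage, pct in pcts.items()}


counts = approx_counts(ADOPTION_PCT, RESPONDENTS)
for stage, n in counts.items():
    print(f"{stage:<14} ~{n} respondents")

# The rounded percentages do not sum to 100, so the counts do not sum to 158.
print("percent total:", sum(ADOPTION_PCT.values()))
print("count total:  ", sum(counts.values()))
```

By this estimate, roughly 14 of the 158 respondents were implementing Hadoop or running it in production.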
3. Hadoop workloads today
Staging area 92%
Online archive 92%
Transformation Engine 83%
Ad hoc queries 58%
Scheduled reports 42%
Visual exploration 25%
Data mining 58%
Based on respondents that have implemented
Hadoop. BI Leadership Forum, April, 2012
4. Hadoop workloads in 18 months
Workload                Today   In 18 months
Staging area            92%     92%
Online archive          92%     92%
Transformation Engine   83%     92%
Ad hoc queries          58%     67%
Scheduled reports       42%     67%
Visual exploration      25%     67%
Data mining             58%     83%
Based on respondents that have implemented
Hadoop. BI Leadership Forum, April, 2012
5. Hadoop’s impact on the data warehouse
Replaces it 0%
Offloads existing workloads 50%
Handles new workloads 67%
Shares existing workloads 33%
Shares new workloads 25%
Don't know 8%
Based on respondents that have implemented
Hadoop. BI Leadership Forum, April, 2012
6. What data does Hadoop support?
Data type               Today   In 18 months
Web logs                67%     75%
System logs             67%     67%
Social media            58%     75%
Transaction data        92%     100%
Semi-structured data    58%     67%
Sensor data             17%     42%
Audio or video          0%      42%
Email                   25%     42%
Documents               33%     50%
Based on respondents that have implemented
Hadoop. BI Leadership Forum, April, 2012
8. Adoption Rate of Hadoop by Non-Implementers
Within 12 months 40%
Within 24 months 22%
Within 36 months 5%
In 3+ years 3%
Not sure 30%
Never 0%
Based on 76 respondents that have not yet implemented
Hadoop. BI Leadership Forum, April, 2012
9. Expected Use of Hadoop by Non-Implementers
Staging area 37%
Online archive 23%
Transformation Engine 39%
Ad hoc queries 45%
Scheduled reports 5%
Visual exploration 27%
Data mining 57%
Not sure 23%
Other 5%
Based on respondents that have not yet implemented
Hadoop. BI Leadership Forum, April, 2012
www.bileader.com
10. Data that Non-Implementers Will Store in Hadoop
Web logs 53%
System logs 33%
Social media 47%
Transaction data 44%
Semi-structured data 50%
Sensor data 24%
Audio or video 8%
Email 18%
Documents 18%
Not sure 11%
Based on respondents that have not yet implemented
Hadoop. BI Leadership Forum, April, 2012
Editor's notes
How do we do this without having companies hire specialists who know how to query Hadoop using Java or how to overcome latency?
Latency: via HCatalog. Query: better interfaces. These won't fix things like user concurrency.
This is the aspiration, but lots of obstacles stand in the way: latency because processing is batch, and limited user concurrency because there is no workload management, prioritization, or query optimizer. Users also need to know coding.
Offload log data, images, audio/video, data mining, and transformations.
Teradata appliances offload certain analytical workloads; Aster offloads unstructured data. This allows Teradata to do more with what it has or to add more structured data.
Figure 9
Hive converts queries into MapReduce (MR) jobs; Aster issues standard queries without creating MR jobs.
Connectivity
Pros: Easy to build and use; bring data down to analyze in an RDBMS or in-memory cube
Cons: Requires moving data from one system to the other
Hybrid Systems
Pros: One environment for all data and processing
Cons: Redundant if you already have Hadoop or an RDBMS
Native Hadoop
Pros: Seamless access without translation
Cons: MapReduce latency and external calls
Interoperability
Pros: SQL access via Hadoop API; federated queries
Cons: Lack of Hadoop metadata; not bidirectional
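To make the note about Hive converting queries into MR more concrete, here is a conceptual sketch in plain Python of how a grouped count (roughly `SELECT page, COUNT(*) FROM weblogs GROUP BY page`) breaks down into map, shuffle, and reduce steps. It is not Hive's actual execution plan or API; the `LOGS` records, the `weblogs` table name, and all function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical web-log rows; the field names are assumptions for illustration.
LOGS = [
    {"page": "/home", "status": 200},
    {"page": "/cart", "status": 200},
    {"page": "/home", "status": 404},
    {"page": "/home", "status": 200},
]


def map_phase(record):
    """Emit one (key, 1) pair per row, like the map side of a grouped count."""
    yield record["page"], 1


def shuffle(pairs):
    """Group emitted values by key, as happens between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the values for one key, equivalent to COUNT(*) for that group."""
    return key, sum(values)


def run_group_by_count(records):
    """Run map, shuffle, and reduce in sequence over in-memory records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


result = run_group_by_count(LOGS)
print(result)  # {'/home': 3, '/cart': 1}
```

In a real cluster each phase runs as a batch job across many nodes, which is one source of the MapReduce latency listed under the Native Hadoop cons.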