10 Common Hadoop-able Problems
August 5, 2010
Topics

•   Introduction
•   10 Common Hadoop-able Problems
•   Summary
•   Questions




                Copyright 2010 Cloudera Inc. All rights reserved   2
Today’s speaker - Jeff Hammerbacher

 • hammer@cloudera.com
 • Studied Mathematics at Harvard
 • Worked as a Quant on Wall Street
 • Conceived, built, and led Data team at Facebook
    • Nearly 30 amazing engineers and data scientists
    • Several open source projects and research papers
 • Founder of Cloudera
    • Chief Scientist
    • Also, check out the book “Beautiful Data”


What is Hadoop?

• A scalable fault-tolerant distributed system for data storage
  and processing (open source under the Apache license)

• Scalable data processing engine
   • Hadoop Distributed File System (HDFS): self-healing high-bandwidth
     clustered storage
   • MapReduce: fault-tolerant distributed processing

• Key value propositions
   •   Flexible -> store data without a schema and add one later as needed
   •   Affordable -> cost per TB at a fraction of traditional options
   •   Broadly adopted -> a large and active ecosystem
   •   Proven at scale -> dozens of petabyte-scale implementations in
       production today
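To make the MapReduce model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases for a word count. It is an illustration only: production Hadoop jobs are typically written against the Java MapReduce API and run across a cluster, and the function names here (map_words, shuffle, reduce_counts, run_job) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases in one process (a real cluster runs them on many nodes)."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    counts = run_job(["the quick brown fox", "the lazy dog", "The end"])
    print(counts["the"])  # 3
```

Because each mapper sees only its own lines and each reducer sees only one key's values, both phases can be split across machines; that independence is what lets the framework rerun a failed task elsewhere.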
Cloudera’s Distribution for Hadoop, version 3
The industry’s leading Hadoop distribution



[Diagram: CDH3 component stack including Hue, Hue SDK, Oozie, Pig, Hive, Flume, Sqoop, HBase, and ZooKeeper]



•   Open source – 100% Apache licensed
•   Simplified – Component versions & dependencies managed for you
•   Integrated – All components & functions interoperate through standard APIs
•   Reliable – Patched with fixes from future releases to improve stability
•   Supported – Employs project founders and committers for >70% of components
How does Cloudera know which problems are
Hadoop-able?

 • Talking to 1000s of users
 • Supporting 100s of implementations
 • Experience putting Hadoop into production with
   customers across a range of industries




Summary – 10 Common Hadoop-able Problems


 1. Modeling true risk
 2. Customer churn analysis
 3. Recommendation engine
 4. Ad targeting
 5. PoS transaction analysis
 6. Analyzing network data to predict failure
 7. Threat analysis
 8. Trade surveillance
 9. Search quality
 10. Data “sandbox”


What is common across Hadoop-able problems?

 Nature of the data
 • Complex data
 • Multiple data sources
 • Lots of it

 Nature of the analysis
 • Batch processing
 • Parallel execution
 • Spread data over a cluster of servers
   and take the computation to the data
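“Take the computation to the data” means the scheduler tries to run each map task on a node that already stores a replica of that task's HDFS block. The sketch below is a simplified, hypothetical greedy assignment under assumed inputs (a block-to-replica map and a fixed number of task slots per node); Hadoop's real scheduler is more elaborate and also considers rack locality.

```python
from typing import Dict, List, Set


def assign_tasks(
    block_replicas: Dict[str, Set[str]], nodes: List[str], slots_per_node: int
) -> Dict[str, str]:
    """Greedily assign one map task per block, preferring nodes that hold a replica."""
    free = {node: slots_per_node for node in nodes}
    assignment: Dict[str, str] = {}
    for block in sorted(block_replicas):
        # Data-local candidates: replica holders that still have a free slot.
        local = [n for n in sorted(block_replicas[block]) if free.get(n, 0) > 0]
        if local:
            chosen = local[0]
        else:
            # No local slot left: run anywhere with capacity and read the block over the network.
            remote = [n for n in nodes if free[n] > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block}")
            chosen = remote[0]
        free[chosen] -= 1
        assignment[block] = chosen
    return assignment


if __name__ == "__main__":
    replicas = {"blk_1": {"node-a", "node-b"}, "blk_2": {"node-a"}, "blk_3": {"node-a"}}
    print(assign_tasks(replicas, ["node-a", "node-b", "node-c"], slots_per_node=2))
    # {'blk_1': 'node-a', 'blk_2': 'node-a', 'blk_3': 'node-b'}
```

In this run blk_3 ends up non-local because earlier greedy choices used up node-a's slots, which is one reason production schedulers weigh locality more carefully than this sketch does.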

What Analysis is Possible With Hadoop?


 • Text mining
 • Index building
 • Graph creation and analysis
 • Pattern recognition
 • Collaborative filtering
 • Prediction models
 • Sentiment analysis
 • Risk assessment




Benefits of Analyzing With Hadoop

 • Analysis that was previously impossible or impractical

 • Analysis conducted at lower cost

 • Analysis conducted in less time

 • Greater flexibility




1. Modeling True Risk
 Solution with Hadoop
 • Source, parse, and aggregate disparate data
   sources to build a comprehensive view of the data
    • e.g. credit card records, call recordings, chat
      sessions, emails, banking activity
 • Structure and analyze
    • Sentiment analysis, graph creation, pattern
      recognition

 Typical Industry
 • Financial Services (Banks, Insurance)
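Aggregating disparate sources is commonly done as a reduce-side join: each mapper keys its records by a shared identifier and tags them with their source, and the reducer assembles everything for one key. A minimal in-memory sketch, assuming every source can already be parsed into (customer_id, payload) pairs; the source names and records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_tagged(source: str, rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, Tuple[str, str]]]:
    """Mapper: key each record by customer id and tag it with the source it came from."""
    for customer_id, payload in rows:
        yield customer_id, (source, payload)


def reduce_profile(tagged: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Reducer: gather all tagged records for one customer into a per-source profile."""
    profile: Dict[str, List[str]] = defaultdict(list)
    for source, payload in tagged:
        profile[source].append(payload)
    return dict(profile)


def join_sources(sources: Dict[str, Iterable[Tuple[str, str]]]) -> Dict[str, Dict[str, List[str]]]:
    """Run map, an in-memory shuffle keyed by customer id, then reduce."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, rows in sources.items():
        for customer_id, tagged in map_tagged(source, rows):
            grouped[customer_id].append(tagged)
    return {cid: reduce_profile(vals) for cid, vals in grouped.items()}


if __name__ == "__main__":
    combined = join_sources({
        "card_txns": [("c1", "$120 grocery"), ("c2", "$9 cafe")],
        "emails": [("c1", "complaint about fees")],
    })
    print(combined["c1"])  # {'card_txns': ['$120 grocery'], 'emails': ['complaint about fees']}
```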
2. Customer Churn Analysis
 Solution with Hadoop
 • Rapidly build and test behavioral models of customers
   from disparate data sources
 • Structure and analyze with Hadoop
    • Graph traversal
    • Graph creation
    • Pattern recognition


 Typical Industry
 • Telecommunications, Financial Services
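One way graph creation and traversal can feed a churn model is to build a call graph from call records and compute, for each active subscriber, the share of their contacts who have already churned. This is a hypothetical feature chosen for illustration, assuming call records arrive as (caller, callee) pairs; it is one input to a churn model, not a complete one.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_call_graph(calls: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Graph creation: undirected adjacency sets from (caller, callee) records."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:
            graph[caller].add(callee)
            graph[callee].add(caller)
    return dict(graph)


def churned_contact_share(graph: Dict[str, Set[str]], churned: Set[str]) -> Dict[str, float]:
    """Traverse each active subscriber's neighbors; return the fraction who already churned."""
    shares: Dict[str, float] = {}
    for subscriber, contacts in graph.items():
        if subscriber in churned or not contacts:
            continue  # skip subscribers already lost, and isolated nodes
        shares[subscriber] = len(contacts & churned) / len(contacts)
    return shares


if __name__ == "__main__":
    g = build_call_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
    print(churned_contact_share(g, churned={"b"}))
```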
3. Recommendation Engine
 Solution with Hadoop

 • Batch processing framework
    • Allows execution in parallel over large datasets
 • Collaborative filtering
    • Collecting ‘taste’ information from many users
    • Utilizing information to predict what similar
      users like

 Typical Industry
 • Ecommerce, Manufacturing, Retail
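Item co-occurrence counting is one simple form of collaborative filtering that maps naturally onto MapReduce: for each user, emit every pair of items they liked (map) and sum the pair counts (reduce). The sketch below runs both steps in memory, assuming ‘taste’ information is available as a set of liked items per user; the data and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence(user_items: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items was liked by the same user."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for items in user_items.values():
        # "Map": every unordered pair of items liked by this user.
        for a, b in combinations(sorted(items), 2):
            # "Reduce": accumulate the pair count in both directions.
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user: str, user_items: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Score items the user lacks by how often they co-occur with the user's items."""
    co = cooccurrence(user_items)
    owned = user_items[user]
    scores: Counter = Counter()
    for item in owned:
        for other, count in co.get(item, Counter()).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    likes = {
        "u1": {"book", "lamp"},
        "u2": {"book", "lamp", "pen"},
        "u3": {"book", "pen"},
        "u4": {"book"},
        "u5": {"book", "pen", "mug"},
    }
    print(recommend("u4", likes))  # ['pen', 'lamp', 'mug']
```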
4. Ad Targeting
 Solution with Hadoop

 • Data analysis can be conducted in parallel, reducing
   processing times from days to hours
 • With Hadoop, as data volumes grow the only
   expansion cost is hardware
 • Add more nodes without a degradation in
   performance

 Typical Industry
 • Advertising
5. Point of Sale Transaction Analysis
 Solution with Hadoop
 • Batch processing framework
    • Allows execution in parallel over large datasets
 • Pattern recognition
    • Optimizing over multiple data sources
    • Utilizing information to predict demand


 Typical Industry
 • Retail
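As a stand-in for demand prediction, the sketch below aggregates PoS records into weekly totals per store and product, then forecasts next week with a simple moving average. The record layout (store, sku, week, quantity) and the moving-average model are assumptions chosen for illustration; a real demand model would draw on the multiple data sources mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sale = Tuple[str, str, int, int]  # (store, sku, week number, quantity); assumed layout


def weekly_demand(sales: Iterable[Sale]) -> Dict[Tuple[str, str], Dict[int, int]]:
    """Aggregate raw PoS lines into total units per (store, sku) per week."""
    totals: Dict[Tuple[str, str], Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for store, sku, week, qty in sales:
        totals[(store, sku)][week] += qty
    return {key: dict(weeks) for key, weeks in totals.items()}


def forecast_next_week(history: Dict[int, int], window: int = 3) -> float:
    """Moving average of the last `window` weeks; weeks with no sales count as zero."""
    last = max(history)
    recent = range(last - window + 1, last + 1)
    return sum(history.get(w, 0) for w in recent) / window


if __name__ == "__main__":
    demand = weekly_demand([
        ("s1", "milk", 1, 10), ("s1", "milk", 2, 12), ("s1", "milk", 2, 3),
        ("s1", "milk", 3, 14), ("s2", "milk", 3, 5),
    ])
    print(forecast_next_week(demand[("s1", "milk")]))  # 13.0
```

The aggregation step is the part that benefits from Hadoop's parallelism: grouping billions of PoS lines by (store, sku, week) is a natural map-and-reduce job, while the per-series forecast is small.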

6. Analyzing Network Data to Predict Failure
 Solution with Hadoop
 • Take the computation to the data
    • Expand the range of indexing techniques from simple
       scans to more complex data mining
 • Better understand how the network reacts to
   fluctuations
    • How anomalies previously thought to be discrete may,
       in fact, be interconnected
 • Identify leading indicators of component failure
 Typical Industry
 • Utilities, Telecommunications,
   Data Centers
7. Threat Analysis

 Solution with Hadoop

 • Parallel processing over huge datasets
 • Pattern recognition to identify anomalies, i.e., threats

 Typical Industry
 • Security
 • Financial Services
 • General: spam fighting,
   click fraud
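For click fraud, one basic pattern-recognition step is to count clicks per source and flag sources far above the typical level. This sketch uses the median and the median absolute deviation (MAD), which a single extreme source does not inflate the way it inflates a mean and standard deviation. The threshold k = 5 and the use of IP addresses as the source key are assumptions.

```python
from collections import Counter
from statistics import median
from typing import Iterable, List


def flag_suspicious_sources(clicks: Iterable[str], k: float = 5.0) -> List[str]:
    """Flag sources whose click count sits more than k MADs above the median count."""
    counts = Counter(clicks)  # clicks per source, e.g. per IP address (assumed key)
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero when most counts are identical
    return sorted(src for src, c in counts.items() if (c - med) / scale > k)


if __name__ == "__main__":
    log = ["ip1"] * 3 + ["ip2"] * 4 + ["ip3"] * 2 + ["ip4"] * 3 + ["ip5"] * 40
    print(flag_suspicious_sources(log))  # ['ip5']
```

Counting clicks per source is itself a classic MapReduce aggregation; only the small table of per-source counts needs to reach the outlier test.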
8. Trade Surveillance

 Solution with Hadoop

 • Batch processing framework
    • Allows execution in parallel over large datasets
 • Pattern recognition
    • Detect trading anomalies and harmful behavior

 Typical Industry
 • Financial Services
 • Regulatory bodies
9. Search Quality
 Solution with Hadoop

 • Analyzing search attempts in conjunction with
   structured data
 • Pattern recognition
    • Browsing pattern of users performing searches in
      different categories

 Typical Industry
 • Web
 • Ecommerce
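Analyzing search attempts alongside structured data can be as simple as joining a search log with a product catalog and computing click-through rate (CTR) per category; categories with low CTR point to where results serve users poorly. The log layout, the choice to attribute each search to its top result's category, and the catalog below are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (query text, product id of the top result, product id clicked or None); assumed layout
Search = Tuple[str, str, Optional[str]]


def ctr_by_category(searches: List[Search], catalog: Dict[str, str]) -> Dict[str, float]:
    """Join each search with its top result's catalog category, then compute CTR per category."""
    shown: Dict[str, int] = defaultdict(int)
    clicked: Dict[str, int] = defaultdict(int)
    for _query, top_product, click in searches:
        category = catalog.get(top_product, "unknown")  # structured-data lookup
        shown[category] += 1
        if click is not None:
            clicked[category] += 1
    return {cat: clicked[cat] / shown[cat] for cat in shown}


if __name__ == "__main__":
    catalog = {"p1": "shoes", "p2": "shoes", "p3": "books"}
    log = [("running shoes", "p1", "p1"), ("trail shoes", "p2", None),
           ("sci-fi novel", "p3", "p3"), ("cookbook", "p3", "p3"), ("sandals", "p9", None)]
    print(ctr_by_category(log, catalog))  # {'shoes': 0.5, 'books': 1.0, 'unknown': 0.0}
```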

10. Data “Sandbox”
 Solution with Hadoop

 • With Hadoop, an organization can “dump” all of its
   data into an HDFS cluster
 • Then use Hadoop to try out different analyses
   on the data
 • See patterns or relationships that allow the
   organization to derive additional value from data

 Typical Industry
 • Common across all industries
Summary – 10 Common Hadoop-able Problems

 1. Modeling true risk
 2. Customer churn analysis
 3. Recommendation engine
 4. Ad targeting
 5. PoS transaction analysis
 6. Analyzing network data to predict failure
 7. Threat analysis
 8. Trade surveillance
 9. Search quality
 10. Data “sandbox”

Who is Cloudera?

• Enterprise software & services company providing the industry’s
  leading Hadoop-based data management platform
   • Founding team came from large Web companies



• Products: Cloudera Enterprise & Cloudera’s Distribution for Hadoop
   • All necessary packages, matched, tested and supported
   • Tools to support production use of Hadoop
   • The leading distribution for the enterprise


• Contributors and committers
   • Fixing, patching and adding features

Hear More Examples @ Hadoop World 2010
http://www.cloudera.com/company/press-center/hadoop-world-nyc/


 •   2nd annual event focused on practical
     applications of Hadoop

 •   Date: October 12, 2010

 •   Location: Hilton New York

 •   Keynote from Tim O’Reilly, founder of
     O’Reilly Media

 •   Pre- and post-conference training
     available for Hadoop and related projects

 •   36 sessions with a business or technical focus


Questions?





2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

10 Common Hadoop-able Problems Webinar

  • 2. Topics
      • Introduction
      • 10 Common Hadoop-able Problems
      • Summary
      • Questions
  • 3. Today’s speaker - Jeff Hammerbacher
      • hammer@cloudera.com
      • Studied Mathematics at Harvard
      • Worked as a Quant on Wall Street
      • Conceived, built, and led the Data team at Facebook
          • Nearly 30 amazing engineers and data scientists
          • Several open source projects and research papers
      • Founder of Cloudera
          • Chief Scientist
      • Also, check out the book “Beautiful Data”
  • 4. What is Hadoop?
      • A scalable, fault-tolerant distributed system for data storage and processing (open source under the Apache license)
      • Scalable data processing engine
          • Hadoop Distributed File System (HDFS): self-healing, high-bandwidth clustered storage
          • MapReduce: fault-tolerant distributed processing
      • Key value
          • Flexible -> store data without a schema and add one later as needed
          • Affordable -> cost per TB at a fraction of traditional options
          • Broadly adopted -> a large and active ecosystem
          • Proven at scale -> dozens of petabyte-plus implementations in production today
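The MapReduce model on slide 4 runs a map step, then a shuffle that groups values by key, then a reduce step. This is a single-process sketch of those three stages, using word count, the canonical MapReduce example. It is not Hadoop's Java API. `map_phase`, `shuffle`, and `reduce_phase` are hypothetical helper names; on a real cluster each stage runs in parallel across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(reduce_phase(shuffle(map_phase(docs))))
```

Because the mapper and reducer only ever see one record or one key at a time, the framework is free to spread them over a cluster and rerun any failed piece. That is where the "fault-tolerant distributed processing" comes from.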
  • 5. Cloudera’s Distribution for Hadoop, version 3
      The industry’s leading Hadoop distribution
      [Diagram: CDH3 component stack: Hue, Hue SDK, Oozie, Pig/Hive, Flume, Sqoop, HBase, ZooKeeper]
      • Open source – 100% Apache licensed
      • Simplified – component versions & dependencies managed for you
      • Integrated – all components & functions interoperate through standard APIs
      • Reliable – patched with fixes from future releases to improve stability
      • Supported – employs project founders and committers for >70% of components
  • 6. How does Cloudera know which problems are Hadoop-able?
      • Talking to 1000s of users
      • Supporting 100s of implementations
      • Experience putting Hadoop into production with customers across a range of industries
  • 7. Summary – 10 Common Hadoop-able Problems
      1. Modeling true risk
      2. Customer churn analysis
      3. Recommendation engine
      4. Ad targeting
      5. PoS transaction analysis
      6. Analyzing network data to predict failure
      7. Threat analysis
      8. Trade surveillance
      9. Search quality
      10. Data “sandbox”
  • 8. What is common across Hadoop-able problems?
      Nature of the data
      • Complex data
      • Multiple data sources
      • Lots of it
      Nature of the analysis
      • Batch processing
      • Parallel execution
      • Spread data over a cluster of servers and take the computation to the data
  • 9. What Analysis is Possible With Hadoop?
      • Text mining
      • Collaborative filtering
      • Index building
      • Prediction models
      • Graph creation and analysis
      • Sentiment analysis
      • Risk assessment
      • Pattern recognition
  • 10. Benefits of Analyzing With Hadoop
      • Analysis that was previously impossible or impractical becomes feasible
      • Analysis conducted at lower cost
      • Analysis conducted in less time
      • Greater flexibility
  • 11. Topics
      • Introduction
      • 10 Common Hadoop-able Problems
      • Summary
      • Questions
  • 12. 1. Modeling True Risk
  • 13. 1. Modeling True Risk
      Solution with Hadoop
      • Source, parse, and aggregate disparate data sources to build a comprehensive data picture
          • e.g., credit card records, call recordings, chat sessions, emails, banking activity
      • Structure and analyze
          • Sentiment analysis, graph creation, pattern recognition
      Typical Industry
      • Financial Services (Banks, Insurance)
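The "source, parse and aggregate" step on slide 13 amounts to joining records from several feeds on a shared key. The sketch below is a minimal reduce-side join under some assumptions. It assumes each source has already been parsed into (customer_id, record) pairs. The source names ("cards", "chats"), the record fields, and both helper functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, object]  # one parsed record; fields are illustrative only


def tag_source(source: str,
               rows: Iterable[Tuple[str, Record]]) -> Iterator[Tuple[str, Tuple[str, Record]]]:
    """Map step: tag each (customer_id, record) pair with the name of its source."""
    for customer_id, record in rows:
        yield customer_id, (source, record)


def join_by_customer(*tagged_streams) -> Dict[str, Dict[str, List[Record]]]:
    """Reduce step: gather every source's records under one customer key."""
    profile: Dict[str, Dict[str, List[Record]]] = defaultdict(lambda: defaultdict(list))
    for stream in tagged_streams:
        for customer_id, (source, record) in stream:
            profile[customer_id][source].append(record)
    # Convert the nested defaultdicts to plain dicts for readability.
    return {cid: dict(sources) for cid, sources in profile.items()}


if __name__ == "__main__":
    cards = [("c1", {"amount": 120.0}), ("c2", {"amount": 15.5})]
    chats = [("c1", {"sentiment": "negative"})]
    print(join_by_customer(tag_source("cards", cards), tag_source("chats", chats)))
```

On a cluster, the customer ID would be the shuffle key. All records for one customer, whatever their origin, then arrive at the same reducer. Sentiment scoring or pattern recognition can run on that reducer's per-customer view.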
  • 14. 2. Customer Churn Analysis
  • 15. 2. Customer Churn Analysis
      Solution with Hadoop
      • Rapidly test and build a behavioral model of the customer from disparate data sources
      • Structure and analyze with Hadoop
          • Traversing
          • Graph creation
          • Pattern recognition
      Typical Industry
      • Telecommunications, Financial Services
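A behavioral churn model usually starts from simple per-customer signals. The sketch below shows one such signal: it flags customers whose interaction count dropped sharply from one month to the next. This is not the model described in the deck. The (customer_id, month_index) input format and the 50% drop threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Activity = Tuple[str, int]  # (customer_id, month_index), one row per interaction; hypothetical


def activity_drop(events: Iterable[Activity], current_month: int,
                  drop_ratio: float = 0.5) -> List[str]:
    """Flag customers whose activity this month fell below drop_ratio of last month's."""
    per_month: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for customer, month in events:
        per_month[customer][month] += 1
    at_risk = []
    for customer, months in per_month.items():
        previous = months.get(current_month - 1, 0)
        current = months.get(current_month, 0)
        # Only customers who were active last month can show a drop.
        if previous > 0 and current < drop_ratio * previous:
            at_risk.append(customer)
    return sorted(at_risk)
```

In practice, a signal like this would be one feature among many (billing, support calls, network quality) fed into the behavioral model the slide describes.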
  • 16. 3. Recommendation Engine
  • 17. 3. Recommendation Engine
      Solution with Hadoop
      • Batch processing framework
          • Allows execution in parallel over large datasets
      • Collaborative filtering
          • Collecting ‘taste’ information from many users
          • Utilizing that information to predict what similar users like
      Typical Industry
      • Ecommerce, Manufacturing, Retail
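Slide 17 names collaborative filtering. One common batch-friendly form is item co-occurrence: items that many users liked together get recommended to users who have only one of them. The sketch below computes co-occurrence counts the way a mapper/reducer pair would, then scores unseen items for one user. The user histories and the function names are hypothetical, and real systems typically normalize these raw counts.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence(histories: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Count how often each pair of items appears in the same user's history.

    In MapReduce terms, the mapper emits ((a, b), 1) for every item pair in a
    user's history, and the reducer sums the counts per pair.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user: str, histories: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Score items the user has not seen by their co-occurrence with items they have."""
    counts = cooccurrence(histories)
    seen = histories[user]
    scores: Counter = Counter()
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


if __name__ == "__main__":
    histories = {
        "alice": {"hadoop", "hive"},
        "bob": {"hadoop", "hive", "pig"},
        "carol": {"hadoop", "hbase"},
        "dave": {"hive", "pig"},
    }
    print(recommend("alice", histories))
```

With these histories, "pig" outranks "hbase" for alice. Pig co-occurs with both of her items (once with hadoop, twice with hive), while hbase co-occurs only once with hadoop.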
  • 18. 4. Ad Targeting
  • 19. 4. Ad Targeting
      Solution with Hadoop
      • Data analysis can be conducted in parallel, reducing processing times from days to hours
      • With Hadoop, as data volumes grow, the only expansion cost is hardware
          • Add more nodes without degrading performance
      Typical Industry
      • Advertising
  • 20. 5. Point of Sale Transaction Analysis
  • 21. 5. Point of Sale Transaction Analysis
      Solution with Hadoop
      • Batch processing framework
          • Allows execution in parallel over large datasets
      • Pattern recognition
          • Optimizing over multiple data sources
          • Utilizing information to predict demand
      Typical Industry
      • Retail
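Slide 21 mentions using transaction data to predict demand. The sketch below illustrates the idea with a trailing moving average, a deliberately simple forecaster that the deck does not name. It aggregates point-of-sale rows per (store, product) and day, then forecasts the next day. The (store, sku, day_index, units) schema and the three-day window are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Txn = Tuple[str, str, int, int]  # (store, sku, day_index, units); hypothetical schema
Key = Tuple[str, str]            # (store, sku)


def daily_units(txns: Iterable[Txn]) -> Dict[Key, Dict[int, int]]:
    """Aggregate units sold per (store, sku) per day, like a reducer keyed on (store, sku)."""
    totals: Dict[Key, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for store, sku, day, units in txns:
        totals[(store, sku)][day] += units
    return totals


def forecast_next_day(txns: Iterable[Txn], window: int = 3) -> Dict[Key, float]:
    """Forecast next-day demand as the mean of the last `window` days.

    The window ends at the latest day seen in the whole dataset, and days with
    no sales for a given product count as zero.
    """
    totals = daily_units(txns)
    if not totals:
        return {}
    last_day = max(day for by_day in totals.values() for day in by_day)
    days = range(last_day - window + 1, last_day + 1)
    return {key: sum(by_day.get(d, 0) for d in days) / window
            for key, by_day in totals.items()}
```

Anchoring the window to the dataset's last day, rather than each product's last sale, keeps slow-moving products from being forecast as if they were still selling.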
  • 22. 6. Analyzing Network Data to Predict Failure
  • 23. 6. Analyzing Network Data to Predict Failure
      Solution with Hadoop
      • Take the computation to the data
      • Expand the range of indexing techniques from simple scans to more complex data mining
      • Better understand how the network reacts to fluctuations
          • How anomalies previously thought to be discrete may, in fact, be interconnected
      • Identify leading indicators of component failure
      Typical Industry
      • Utilities, Telecommunications, Data Centers
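One simple way to surface the "leading indicators of component failure" from slide 23 is to count which event codes appear on a component shortly before it fails. The sketch makes several assumptions. It assumes a hypothetical (timestamp_seconds, component_id, event_code) log format, a failure code of "FAIL", and a fixed one-hour lookback. In a Hadoop job, grouping by component would be the shuffle key.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, str]  # (timestamp_seconds, component_id, event_code); hypothetical


def leading_indicators(events: Iterable[Event], failure_code: str = "FAIL",
                       lookback: int = 3600) -> Counter:
    """Count event codes seen on a component within `lookback` seconds before a failure.

    Each code is counted at most once per failure, so the result reads as
    "number of failures preceded by this code".
    """
    by_component: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for ts, component, code in events:
        by_component[component].append((ts, code))
    counts: Counter = Counter()
    for history in by_component.values():
        failure_times = [ts for ts, code in history if code == failure_code]
        for fail_ts in failure_times:
            preceding = {code for ts, code in history
                         if fail_ts - lookback <= ts < fail_ts and code != failure_code}
            counts.update(preceding)
    return counts
```

A fuller analysis would compare these counts against how often each code appears when no failure follows. This sketch omits that step, but without it, a code that is simply common everywhere could look like a false indicator.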
  • 24. 7. Threat Analysis
  • 25. 7. Threat Analysis
      Solution with Hadoop
      • Parallel processing over huge datasets
      • Pattern recognition to identify anomalies, i.e., threats
      Typical Industry
      • Security
      • Financial Services
      • General: spam fighting, click fraud
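Pattern recognition for threats can start as simply as flagging outliers in per-entity activity counts, for example clicks per account when hunting click fraud. The z-score rule below is a deliberately basic stand-in, not a technique named in the deck. The `counts` dictionary is assumed to be the output of an earlier MapReduce aggregation, and the threshold of 3 is illustrative.

```python
from statistics import mean, pstdev
from typing import Dict, List


def flag_anomalies(counts: Dict[str, int], threshold: float = 3.0) -> List[str]:
    """Return keys whose count is more than `threshold` standard deviations above the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all counts identical: nothing stands out
    return sorted(key for key, value in counts.items() if (value - mu) / sigma > threshold)


if __name__ == "__main__":
    clicks = {f"user{i}": 10 for i in range(20)}
    clicks["bot"] = 500
    print(flag_anomalies(clicks))
```

The rule needs a reasonably large population. By Samuelson's inequality, no point can sit more than sqrt(n - 1) population standard deviations from the mean. With a threshold of 3, at least 11 entities are needed before anything can be flagged.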
  • 26. 8. Trade Surveillance
  • 27. 8. Trade Surveillance
      Solution with Hadoop
      • Batch processing framework
          • Allows execution in parallel over large datasets
      • Pattern recognition
          • Detect trading anomalies and harmful behavior
      Typical Industry
      • Financial services
      • Regulatory bodies
  • 28. 9. Search Quality
  • 29. 9. Search Quality
      Solution with Hadoop
      • Analyzing search attempts in conjunction with structured data
      • Pattern recognition
          • Browsing patterns of users performing searches in different categories
      Typical Industry
      • Web
      • Ecommerce
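Analyzing search attempts, as slide 29 describes, can be made concrete with click-through rate per query. Queries that are searched often but rarely clicked point to weak results. The sketch assumes a hypothetical log already reduced to (normalized_query, clicked_a_result) pairs. The minimum-search and CTR thresholds are illustrative, not recommended values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SearchEvent = Tuple[str, bool]  # (normalized_query, clicked_a_result); hypothetical


def click_through_rates(events: Iterable[SearchEvent]) -> Dict[str, float]:
    """Fraction of searches for each query that led to at least one click."""
    searches: Dict[str, int] = defaultdict(int)
    clicks: Dict[str, int] = defaultdict(int)
    for query, clicked in events:
        searches[query] += 1
        clicks[query] += int(clicked)
    return {q: clicks[q] / searches[q] for q in searches}


def weak_queries(events: Iterable[SearchEvent], min_searches: int = 2,
                 max_ctr: float = 0.2) -> List[str]:
    """Queries searched at least `min_searches` times whose CTR is at most `max_ctr`."""
    events = list(events)  # the events are read twice below
    counts: Dict[str, int] = defaultdict(int)
    for query, _ in events:
        counts[query] += 1
    ctr = click_through_rates(events)
    return sorted(q for q, rate in ctr.items()
                  if counts[q] >= min_searches and rate <= max_ctr)
```

Joining these per-query rates with structured data, such as product category, is what lets the browsing patterns by category on the slide be compared.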
  • 30. 10. Data “Sandbox”
  • 31. 10. Data “Sandbox”
      Solution with Hadoop
      • With Hadoop, an organization can “dump” all this data into an HDFS cluster
      • Then use Hadoop to start trying out different analyses on the data
      • See patterns or relationships that allow the organization to derive additional value from the data
      Typical Industry
      • Common across all industries
  • 32. Topics
      • Introduction
      • 10 Common Hadoop-able Problems
      • Summary
      • Questions
  • 33. Summary – 10 Common Hadoop-able Problems
      1. Modeling true risk
      2. Customer churn analysis
      3. Recommendation engine
      4. Ad targeting
      5. PoS transaction analysis
      6. Analyzing network data to predict failure
      7. Threat analysis
      8. Trade surveillance
      9. Search quality
      10. Data “sandbox”
  • 34. Who is Cloudera?
      • Enterprise software & services company providing the industry’s leading Hadoop-based data management platform
      • Founding team came from large Web companies
      • Products: Cloudera Enterprise & Cloudera’s Distribution for Hadoop
          • All necessary packages, matched, tested and supported
          • Tools to support production use of Hadoop
          • The leading distribution for the enterprise
      • Contributors and committers
          • Fixing, patching and adding features
  • 35. Hear More Examples @ Hadoop World 2010
      http://www.cloudera.com/company/press-center/hadoop-world-nyc/
      • 2nd annual event focused on practical applications of Hadoop
      • Date: October 12, 2010
      • Location: Hilton New York
      • Keynote from Tim O’Reilly, founder of O’Reilly Media
      • Pre- and post-conference training available for Hadoop and related projects
      • 36 business- and technical-focused sessions
      Confirmed speakers from: [company logos]
  • 36. Questions?