PROFIT FROM ALL OF YOUR DATA



February 2012

Hadoop in the Enterprise
Adam Smieszny | Systems Engineer
Agenda

    • Hadoop Overview
      • History of Hadoop
      • What is Hadoop
      • Hadoop in the Enterprise




                      ©2011 Cloudera, Inc. All Rights Reserved.
Existing Data Management

[Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000]

Current database solutions are designed for structured data.

    • Optimized to answer known questions quickly
    • Schemas dictate form/context
    • Difficult to adapt to new data types and new questions
    • Expensive at petabyte scale

Why the Need for Hadoop?

[Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data; y-axis 0 to 10,000]

1.8 trillion gigabytes of data was created in 2011…

    • More than 90% is unstructured data
    • Approx. 500 quadrillion files
    • Quantity doubles every 2 years

Drivers: more content, more devices, new sources, new & better info

Source: IDC 2011
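
As a rough worked example of the "doubles every 2 years" claim, the sketch below projects the 2011 figure forward. It assumes steady exponential growth from the 1.8 trillion gigabyte baseline; the function name and the chosen years are illustrative and are not part of the IDC figures.

```python
def projected_volume(base_volume, base_year, target_year, doubling_period_years=2.0):
    """Project a data volume forward, assuming it doubles every doubling_period_years."""
    elapsed_years = target_year - base_year
    return base_volume * 2 ** (elapsed_years / doubling_period_years)


BASE_TRILLION_GB = 1.8  # 2011 figure quoted on the slide

# Illustrative projection only: assumes the doubling rate stays constant.
for year in (2011, 2013, 2015):
    volume = projected_volume(BASE_TRILLION_GB, 2011, year)
    print(f"{year}: about {volume:.1f} trillion GB")
```

Under that assumption the volume reaches roughly four times the 2011 figure by 2015.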




The Origins of Hadoop




Timeline, 2002 to 2012:

    • Open source web crawler project created by Doug Cutting
    • Google publishes the MapReduce and GFS papers
    • Open source MapReduce and HDFS project created by Doug Cutting
    • Yahoo! runs a 4,000-node Hadoop cluster
    • Hadoop wins the Terabyte sort benchmark
    • Facebook launches SQL support for Hadoop
    • Cloudera releases CDH and Cloudera Enterprise




What is Apache Hadoop?

Hadoop is a platform for data storage and processing that is…

    • Scalable
    • Fault tolerant
    • Open source

CORE HADOOP COMPONENTS

    • Hadoop Distributed File System (HDFS): file sharing & data protection across physical servers
    • MapReduce: distributed computing across physical servers

Flexibility
    • A single repository for storing, processing & analyzing any type of data
    • Not bound by a single schema

Scalability
    • Scale-out architecture divides workloads across multiple nodes
    • Flexible file system eliminates ETL bottlenecks

Low Cost
    • Can be deployed on commodity hardware
    • Open source platform guards against vendor lock-in
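
To make the MapReduce component more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the job. It only illustrates the programming model: it is not Hadoop's Java API, nothing here is distributed, and the function names (map_phase, shuffle, reduce_phase, run_job) are made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence inside a single process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


records = ["hadoop stores data", "hadoop processes data"]
counts = run_job(records)
print(counts)
```

On a real cluster the map and reduce steps would run in parallel on many servers, with the framework handling the shuffle between them.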




What is CDH?

Cloudera’s Distribution Including Apache Hadoop (CDH) is an enterprise-ready distribution of Hadoop that is…

    • 100% Apache open source
    • Contains all components needed for deployment
    • Fully documented and supported
    • Released on a reliable schedule

Fastest Path to Success
    • No need to write your own scripts or do integration testing on different components
    • Works with a wide range of operating systems, hardware, databases and data warehouses

Stable and Reliable
    • Extensive Cloudera QA systems, software & processes
    • Tested & run in production at scale
    • Proven at scale in dozens of enterprise environments

Community Driven
    • Incorporates only main-line components from the Apache Hadoop ecosystem, with no forks or proprietary underpinnings
    • FREE




CDH & Enterprise Ecosystem



    • Drivers, language enhancements, testing
    • File System Mount: FUSE-DFS
    • UI Framework: HUE
    • SDK: HUE SDK
    • Workflow: APACHE OOZIE
    • Scheduling: APACHE OOZIE
    • Metadata: APACHE HIVE
    • Languages / Compilers: APACHE PIG, APACHE HIVE
    • Data Integration: APACHE FLUME, APACHE SQOOP
    • Fast Read/Write Access: APACHE HBASE
    • Coordination: APACHE ZOOKEEPER
    • Sqoop framework, adapters
    • More coming…
    • Packaging, testing



Hadoop / RDBMS Use Cases


unstructured data       Create context (classification, text mining)     Analyze
semi-structured data    Parse, aggregate                                 Analyze, report
structured data         Analyze, report                                  Active archival, long running queries

Slide borrowed from Krishnan Parasuraman presentation at Enzee’11




Hadoop in Production
How Apache Hadoop fits into your existing infrastructure.

Users and their tools:
    • Operators: management tools
    • Engineers: IDEs
    • Analysts: BI / analytics
    • Business users: enterprise reporting
    • Customers: web application

Connected systems:
    • Enterprise data warehouse
    • Low-latency serving systems

Data sources:
    • Logs
    • Files
    • Web data
    • Relational data




Hadoop Use Cases
Industry          Advanced Analytics (application)    Data Processing (application)
Web               Social Network Analysis             Clickstream Sessionization
Media             Content Optimization                Clickstream Sessionization
Telco             Network Analytics                   Mediation
Retail            Loyalty & Promotions Analysis       Data Factory
Financial         Fraud Analysis                      Trade Reconciliation
Federal           Entity Analysis                     SIGINT
Bioinformatics    Sequencing Analysis                 Genome Mapping



Use Case: Customer Risk

    • Build a comprehensive data picture of customer-side risk
        • Publish a consolidated set of attributes for analysis
        • Map ratings across products
    • Parse and aggregate data from different sources
        • Credit and debit cards, product payments, deposits and savings
        • Banking activity, browsing behavior, call logs, e-mails and chats
    • Merge data into a single view
        • A “fuzzy join” among data sources
        • Structure and normalize attributes
        • Sentiment analysis, pattern recognition
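
One way to picture the "fuzzy join" and the attribute normalization is the small sketch below, which matches customer records from two sources on approximately equal names. It is an illustration under stated assumptions, not the method used in the deck: the records, the field names, the 0.85 threshold, and the choice of Python's difflib string similarity are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Normalize a name: lowercase, drop periods and commas, collapse whitespace."""
    cleaned = name.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity score between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_join(left, right, field, threshold=0.85):
    """Merge each left row with its best-matching right row, if similar enough."""
    joined = []
    for left_row in left:
        scored = [(similarity(left_row[field], row[field]), row) for row in right]
        if not scored:
            continue
        best_score, best_row = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            # Values from the left row win when both rows share a field name.
            joined.append({**best_row, **left_row})
    return joined


# Hypothetical records from two sources that spell the same customer differently.
cards = [
    {"name": "Jon A. Smith", "card_balance": 1200},
    {"name": "Wei Chen", "card_balance": 300},
]
calls = [
    {"name": "jon a smith", "call_count": 3},
    {"name": "Maria Garcia", "call_count": 1},
]

merged = fuzzy_join(cards, calls, "name")
print(merged)
```

At the data volumes the slide has in mind, the pairwise comparison would need to be partitioned across the cluster rather than run as a nested loop.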


Use Case: Sentiment Analysis

    • The Internet generates a lot of chatter about brands
        • Understanding what’s being said is crucial to protecting brand value
        • Facebook and Twitter generate a lot of data for a global top brand
    • Capturing and processing direct feedback
        • Better engagement and alerting via sentiment analysis
        • Not yet ready for fully automated customer service
    • Hadoop handles the diverse data types and processing
        • Sources of data keep changing and semantics are continuously evolving
        • Sophistication of algorithms is improving daily




Journey of CDH Users

Discover the Benefits of Apache Hadoop
    • Gain the flexibility to store and mine all types of data
    • Leverage the scale-out architecture for complex data analysis
    • Easily scale to meet growing data requirements
    • Avoid vendor lock-in with an open source technology

Deploy CDH
    • The fastest, surest path to success with Apache Hadoop
    • Stable, reliable version of Apache Hadoop without the vendor lock-in imposed by proprietary vendors
    • Integrates with your other technology platforms ensuring investment protection

Subscribe to Cloudera Enterprise
    • Simplify and accelerate Apache Hadoop deployment
    • Reduce adoption costs and risks
    • More effectively manage cluster resources
    • Leverage the experience of our experts




Get Hadoop
    http://www.cloudera.com/hadoop/

    cloudera.com
    twitter.com/cloudera
    facebook.com/cloudera





 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Boston HUG - Cloudera presentation

  • 1. PROFIT FROM ALL OF YOUR DATA February 2012 Hadoop in the Enterprise Adam Smieszny | Systems Engineer
  • 2. Agenda
      • Hadoop Overview
          • History of Hadoop
          • What is Hadoop
          • Hadoop in the Enterprise
      ©2011 Cloudera, Inc. All Rights Reserved.
  • 3. Existing Data Management
      • Current database solutions are designed for structured data:
          • Optimized to answer known questions quickly
          • Schemas dictate form/context
          • Difficult to adapt to new data types and new questions
          • Expensive at petabyte scale
      • [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data]
  • 4. Why the Need for Hadoop?
      • 1.8 trillion gigabytes of data was created in 2011:
          • More than 90% is unstructured data
          • Approx. 500 quadrillion files
          • Quantity doubles every 2 years
      • Drivers: more content, more devices, new sources, new and better info
      • [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data. Source: IDC 2011]
  • 5. The Origins of Hadoop
      • Timeline milestones, 2002 to 2012 (the organization behind each milestone appeared as a logo on the slide and is not in the transcript):
          • Open source web crawler project created by Doug Cutting
          • Publishes MapReduce and GFS paper
          • Open source MapReduce and HDFS project created by Doug Cutting
          • Runs 4,000-node Hadoop cluster
          • Hadoop wins Terabyte sort benchmark
          • Launches SQL support for Hadoop
          • Releases CDH and Cloudera Enterprise
  • 6. What is Apache Hadoop?
      • Hadoop is a platform for data storage and processing that is scalable, fault tolerant, and open source.
      • Core Hadoop components:
          • Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers
          • MapReduce: distributed computing across physical servers
      • Flexibility:
          • A single repository for storing, processing & analyzing any type of data
          • Not bound by a single schema
      • Scalability:
          • Scale-out architecture divides workloads across multiple nodes
          • Flexible file system eliminates ETL bottlenecks
      • Low Cost:
          • Can be deployed on commodity hardware
          • Open source platform guards against vendor lock
  • 7. What is CDH?
      • Cloudera’s Distribution Including Apache Hadoop (CDH) is an enterprise-ready distribution of Hadoop that is:
          • 100% Apache open source
          • Contains all components needed for deployment
          • Fully documented and supported
          • Released on a reliable schedule
      • Fastest Path to Success:
          • No need to write your own scripts or do integration testing on different components
          • Works with a wide range of operating systems, hardware, databases and data warehouses
      • Stable and Reliable:
          • Extensive Cloudera QA systems, software & processes
          • Tested & run in production at scale
          • Proven at scale in dozens of enterprise environments
      • Community Driven:
          • Incorporates only main-line components from the Apache Hadoop ecosystem (no forks or proprietary underpinnings)
          • FREE
  • 8. CDH & Enterprise Ecosystem
      • File System Mount: FUSE-DFS
      • UI Framework: HUE
      • SDK: HUE SDK
      • Workflow: Apache Oozie
      • Scheduling: Apache Oozie
      • Metadata: Apache Hive
      • Languages / Compilers: Apache Pig, Apache Hive
      • Data Integration: Apache Flume, Apache Sqoop
      • Fast Read/Write Access: Apache HBase
      • Coordination: Apache ZooKeeper
      • Diagram annotations: drivers, language enhancements, testing; Sqoop framework, more adapters coming…; packaging, testing
  • 9. Hadoop / RDBMS Use Cases
      • Create context
      • Analyze unstructured data (classification, text mining)
      • Parse, aggregate
      • Analyze, report semi-structured data
      • Active archival
      • Long running queries
      • Analyze, report structured data
      • Slide borrowed from Krishnan Parasuraman presentation at Enzee’11
      Copyright 2011 Cloudera Inc. All rights reserved
  • 10. Hadoop in Production
      • How Apache Hadoop fits into your existing infrastructure.
      • Users: operators, engineers, analysts, business users, customers
      • Tools: management tools, IDEs, BI / analytics, enterprise reporting, web application
      • Systems: enterprise data warehouse, low-latency serving systems
      • Data sources: relational data, logs, files, web data
  • 11. Hadoop Use Cases (by industry: advanced analytics use case / data processing use case)
      • Web: Social Network Analysis / Clickstream Sessionization
      • Media: Content Optimization / Clickstream Sessionization
      • Telco: Network Analytics / Mediation
      • Retail: Loyalty & Promotions Analysis / Data Factory
      • Financial: Fraud Analysis / Trade Reconciliation
      • Federal: Entity Analysis / SIGINT
      • Bioinformatics: Sequencing Analysis / Genome Mapping
  • 12. Use Case: Customer Risk
      • Build comprehensive data picture of customer side risk
          • Publish a consolidated set of attributes for analysis
          • Map ratings across products
      • Parse and aggregate data from different sources
          • Credit and debit cards, product payments, deposits and savings
          • Banking activity, browsing behavior, call logs, e-mails and chats
      • Merge data into a single view
          • A “fuzzy join” among data sources
          • Structure and normalize attributes
          • Sentiment analysis, pattern recognition
      Copyright 2010 Cloudera Inc. All rights reserved
  • 13. Use Case: Sentiment Analysis
      • Internet generates a lot of chatter about brands
          • Understanding what’s being said is crucial to protecting brand value
          • Facebook, Twitter generate a lot of data for a global top brand
      • Capturing and processing direct feedback
          • Better engagement and alerting via sentiment analysis
          • Not yet ready for fully automated customer service
      • Hadoop handles the diverse data types and processing
          • Sources of data changing and semantics continuously evolving
          • Sophistication of algorithms is improving daily
  • 14. Journey of CDH Users
      • Discover the Benefits of Apache Hadoop
          • Gain the flexibility to store and mine all types of data
          • Leverage the scale-out architecture for complex data analysis
          • Easily scale to meet growing data requirements
          • Avoid vendor lock-in with an open source technology
      • Deploy CDH
          • The fastest, surest path to success with Apache Hadoop
          • Stable, reliable version of Apache Hadoop without the vendor lock-in imposed by proprietary vendors
          • Integrates with your other technology platforms, ensuring investment protection
      • Subscribe to Cloudera Enterprise
          • Simplify and accelerate Apache Hadoop deployment
          • Reduce adoption costs and risks
          • More effectively manage cluster resources
          • Leverage the experience of our experts
  • 15. Get Hadoop
      • http://www.cloudera.com/hadoop/
      • cloudera.com
      • twitter.com/cloudera
      • facebook.com/cloudera
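
Slide 6 names MapReduce as Hadoop's distributed computing component without showing what a job looks like. The sketch below is a minimal, single-process illustration of the map, shuffle, and reduce steps using a word count. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map and reduce steps in parallel across nodes over data stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

Because each call to `map_phase` depends only on its own input line and each call to `reduce_phase` depends only on one key, both steps can be split across machines, which is the property behind the scale-out claim on slide 6.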

Editor's notes

  1. Financial services companies are realizing that they need to understand the fundamental risk in their customer base. All of a bank’s working capital originates with customers. Being able to better predict fluctuations can help them optimize how to put that capital to work.
  2. Much of the discussion about brands today happens on social media. This not only affects the company's perception but can have a direct influence on relationships with customers and the ability to sell. Hadoop is a natural solution for gathering and contextualizing discussions about company brands and products.