SlideShare une entreprise Scribd logo
1  sur  28
Télécharger pour lire hors ligne
Trends and usage of
Apache Hadoop
Eric Baldeschwieler
CEO Hortonworks
Twitter: @jeric14, @hortonworks



January 2012




© Hortonworks Inc. 2011           Page 1
Agenda
• Define terms
  – What is Hadoop? Why does Hadoop matter?


• What drives Hadoop adoption?

• Observed Trends




     Architecting the Future of Big Data
                                              Page 2
     © Hortonworks Inc. 2011
Hortonworks Vision


 We believe that by 2015, more than
    half the world's data will be
   processed by Apache Hadoop


                         How to achieve that vision???
                                 Enable ecosystem around
                                 enterprise-viable platform.




                                                               Page 3
   © Hortonworks Inc. 2011
What is Apache Hadoop?
•  Solution for big data
    –  Deals with complexities of high
       volume, velocity & variety of data

•  Set of open source projects

•  Transforms commodity hardware
   into a service that:
    –  Stores petabytes of data reliably
    –  Allows huge distributed computations

•  Key attributes:
    –  Redundant and reliable (no data loss)
                                                One of the best examples of
    –  Extremely powerful                      open source driving innovation
    –  Batch processing centric                   and creating a market
    –  Easy to program distributed apps
    –  Runs on commodity hardware



                                                                          Page 4
         © Hortonworks Inc. 2011
Hortonworks Data Platform (HDP)
Key Components of “Standard Hadoop” Open Source Stack


     Core Apache Hadoop                                                      Related Hadoop Projects             Open APIs for:
                                                                                                                  •  Data Integration
                                                                                                                  •  Data Movement
                                                                                                                  •  App Job Management
                                                                                                                  •  System Management
                                                                            Pig                      Hive
                                                                         (Data Flow)                     (SQL)
                                             (Columnar NoSQL Store)
                                     HBase



                                                                                  MapReduce
        Zookeeper
                    (Coordination)




                                                                          (Distributed Programing Framework)



                                                                                       HCatalog
                                                                             (Table & Schema Management)



                                                                                 HDFS
                                                                      (Hadoop Distributed File System)




                                                                                                                                 Page 5
      © Hortonworks Inc. 2011
Big Data Trailblazers and Use Cases


                                                                data
                                analyzing web logs            analytics
                   advertising optimization        machine learning
                                                             mail anti-spam
                  text mining web search
                                                        content optimization
                   customer trend analysis
                                                 ad selection
             video & audio processing
                                                         data mining
                             user interest prediction
                                        social media




                                                                               Page 6
   © Hortonworks Inc. 2011
Yahoo!, Apache Hadoop & Hortonworks
http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop

      Yahoo! embraced Apache Hadoop, an open source platform, to
   crunch epic amounts of data using an army of dirt-cheap servers

                                         2006




                                  Hadoop at Yahoo!
                                    40K+ Servers
                                    170PB Storage
                                  5M+ Monthly Jobs
                                  1000+ Active Users



                                         2011




  Yahoo! spun off 22+ engineers into Hortonworks, a company focused on
    advancing open source Apache Hadoop for the broader market

                                                                         Page 7
        © Hortonworks Inc. 2011
What drives Hadoop adoption?




  Architecting the Future of Big Data
                                        Page 8
  © Hortonworks Inc. 2011
Market Drivers for Apache Hadoop
• Business drivers
  – High-value projects that require use of more data        Gartner predicts
                                                            800% data growth
  – Belief that there is great ROI in mastering big data    over next 5 years



• Financial drivers
  – Growing cost of data systems as percentage of IT spend
  – Cost advantage of commodity hardware + open source
  – Enables departmental-level big data strategies        80-90% of data
                                                            produced today
                                                            is unstructured

• Technical drivers
  – Existing solutions failing under growing requirements
       – 3Vs - Volume, velocity, variety
  – Proliferation of unstructured data

      © Hortonworks Inc. 2011                                           9
      © Hortonworks Inc. 2011
Every Market has Big Data
       Digital data is personal, everywhere, increasingly
      accessible, and will continue to grow exponentially




Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.


                                                                                                                          Page 10
           © Hortonworks Inc. 2011
Broader Use Case Opportunities
Financial Services                            Healthcare
•  Detect/prevent fraud                       •  Patient monitoring
•  Model and manage risk                      •  Predictive modeling
•  Personalize banking/insurance products     •  Compliance, Archival, text search
•  Compliance, Archival, …                    •  Data driven research
Retail                                        Web / Social / Mobile
•  Behavior analysis                          •  Sentiment analysis
•  Cross selling, recommendation engines      •  Web log, image, and video analysis
•  Optimize pricing, placement, design        •  Personalization
•  Optimize inventory and distribution        •  Billing, Reporting, Network Analysis

Manufacturing                                 Government
•  Simulation, Analysis, Design               •  Detect/prevent fraud
•  Improve service via product sensor data    •  Security & Intelligence
•  “Digital factory” for lean manufacturing   •  Support open data initiatives



                                                                                     Page 11
           © Hortonworks Inc. 2011
Observed Trends




  Architecting the Future of Big Data
                                        Page 12
  © Hortonworks Inc. 2011
Trend: Agile Data
• The old way
  – Operational systems keep only current records, short history
  – Analytics systems keep only conformed / cleaned / digested data
  – Unstructured data locked away in operational silos
  – Archives offline
       – Inflexible, new questions require system redesigns

• The new trend
  – Keep raw data in Hadoop for a long time
  – Able to produce a new analytics view on-demand
  – Keep a new copy of data that was previously on in silos
  – Can directly do new reports, experiments at low incremental cost
  – New products / services can be added very quickly
  – Agile outcome justifies new infrastructure

      Architecting the Future of Big Data
                                                                  Page 13
      © Hortonworks Inc. 2011
Traditional Enterprise Data Architecture
  Data Silos
                                                                 Traditional Data Warehouses,
  Serving Applications                                                   BI & Analytics

Web       NoSQL                              Traditional ETL &
                                                                             Data      BI /
Serving   RDMS
                                …            Message buses             EDW
                                                                             Marts   Analytics




                          Serving   Social     Sensor          Text
                           Logs     Media       Data         Systems    …


                                    Unstructured Systems
                                                                                                 Page 14
          © Hortonworks Inc. 2011
Agile Data Architecture w/Hadoop
  Connecting All of Your Big Data
                                                                 Traditional Data Warehouses,
  Serving Applications                                                   BI & Analytics

Web       NoSQL                              Traditional ETL &
                                                                             Data      BI /
Serving   RDMS
                                …            Message buses             EDW
                                                                             Marts   Analytics




                                         EsTsL (s = Store)
                                         Custom Analytics




                          Serving   Social     Sensor          Text
                           Logs     Media       Data         Systems    …


                                    Unstructured Systems
                                                                                                 Page 15
          © Hortonworks Inc. 2011
Trend: Data driven development
• Limited runtime logic driven by huge lookup tables

• Data computed offline on Hadoop
  – Machine learning, other expensive computation offline
  – Personalization, classification, fraud, value analysis…


• Application development requires data science
  – Huge amounts of actually observed data key to modern services
  – Hadoop used as the science platform




      Architecting the Future of Big Data
                                                               Page 16
      © Hortonworks Inc. 2011
CASE STUDY
     YAHOO! HOMEPAGE

  •  Serving Maps	
                                        SCIENCE      »	
  Machine learning to build ever
            •  Users	
  -­‐	
  Interests	
                  HADOOP         better categorization models
  	
                                                        CLUSTER
  •  Five	
  Minute	
                        USER	
                         CATEGORIZATION	
  
       Produc7on	
                       BEHAVIOR	
                         MODELS	
  (weekly)	
  
  	
  
  •  Weekly	
                                              PRODUCTION
       Categoriza7on	
                                        HADOOP
                                                                        »	
  Identify user interests using
       models	
                          SERVING
                                                              CLUSTER
                                                                           Categorization models
                                            MAPS
                                 (every 5 minutes)
                                                              USER
                                                            BEHAVIOR



                                      SERVING	
  SYSTEMS                   ENGAGED	
  USERS


    Build	
  customized	
  home	
  pages	
  with	
  latest	
  data	
  (thousands	
  /	
  second)	
  
Copyright	
  Yahoo	
  2011	
                                                                                 17	
  
CASE STUDY
     YAHOO! HOMEPAGE


      Personalized
      for each visitor


      Result:
      twice the engagement

                                                       Recommended	
  links	
       News	
  Interests	
       Top	
  Searches	
  

                                                      +79% clicks                 +160% clicks +43% clicks
                                                      vs. randomly selected       vs. one size fits all     vs. editor selected




Copyright	
  Yahoo	
  2011	
  Hortonworks Inc. 2011
                         ©
                                                                                                                                    18	
  
Trend: Specialization of Data Systems
• Hadoop does not replace existing systems
  – It adds new capabilities to the enterprise
  – It can offload things that are not done efficiently in current systems
       – Especially in scale out situations


• Specialization of traditional data components
  – Use OLTP systems just for transactions
  – Use OLAP systems for interactive analysis


• Hadoop has LOTS of bandwidth to storage and CPU
  – Pull reporting out OLTP systems
  – Pull ELT out of OLAP systems


      Architecting the Future of Big Data
                                                                      Page 19
      © Hortonworks Inc. 2011
Hadoop and OLTP Systems
      MPP Processing of Online Transactions              Hadoop used to Process Reports
•    Mission critical                              •     Free up 50+% processing power for
•    Manages transactions & serves reports               transaction processing system
                                                   •     Significant cost savings due to commodity
                                                         nature of Hadoop


      Web
      Site
                               Transaction     Reports
                               Processing
      Web                       Systems
      Site
                                       $$$    Transaction
                                                 Logs
      Web
      Site




                                                                                             Page 20
             © Hortonworks Inc. 2011
Hadoop and OLAP Systems
 Fast loading, raw data staging, ELT &
           long-term archival                  Allow analysts to use tools they know
         (The Agile Data Zone)                (Take advantage of huge ecosystem of
                                                     BI and Analytics tooling)


Web


                       Hadoop                                       EDW
Mobile



Social
                                         Online
                                         Archival
Other
logs


                                                                               Page 21
         © Hortonworks Inc. 2011
TRENDS: Instrument Clouds of Things
 Clouds of things logging to Hadoop         HDFS + Map-Reduce
              Websites                          Or HBase
 Mobile phones, Enterprise devices…                 +
                                                 Analysis



                                Things
                                   Things




                                Things
                                   Things




                                Things
                                   Things




                                                                Page 22
      © Hortonworks Inc. 2011
Trend: Many POCs, Few Production Systems

• The problem
  – Hadoop is still a young technology
  – Hard to find knowledgeable staff
  – Integration with existing systems


• Hadoop market is maturing at speed
  – Emerging ecosystem of Hadoop platform solutions providers
  – Apache Hadoop continues to get better
  – Hadoop training and support available form several vendors




      Architecting the Future of Big Data
                                                                 Page 23
      © Hortonworks Inc. 2011
Growth in Hadoop Ecosystem
• Hardware vendors, Public Cloud (IAAS, PAAS)
  – Storage, Appliances, Preloaded commodity boxes, cloud

• Data Systems
  – All the major vendors announced Hadoop plans / products in 2011

• BI, Analytics and ETL
  – Hadoop integrations emerging

• Dedicated Hadoop Applications
  – Datamere, Karmashere, Platfora, …

• Systems Integrators
  – Regional and Global providers available

     Architecting the Future of Big Data
                                                                Page 24
     © Hortonworks Inc. 2011
Hadoop Continues to Improve
Apache community, including Hortonworks investing to improve Hadoop:
•  Make Hadoop an Open, Extensible, and Enterprise Viable Platform
•  Enable More Applications to Run on Apache Hadoop
                                                         “Hadoop.Beyond”
                                                      Platform actively evolving

                                       “Hadoop.Next”
                                        (Hadoop 0.23)
                                     HA, Next-gen HDFS & MapReduce
   “Hadoop.Now”                      Extension & Integration APIs
    (Hadoop 1.0)
Most stable version ever
HBase, security, WebHDFS




                                                                            Page 25
           © Hortonworks Inc. 2011
Hortonworks – Approachable Hadoop
•  Apache Hadoop Leadership
   –  Delivered every major release since 0.1
   –  Driving innovation across entire stack
   –  Experience managing world’s largest
      deployment
   –  Access to Yahoo’s 1,000+ Hadoop users
      and 40k+ nodes for testing, QA, etc.


•  Business Focus
   –  Provide 100% open source product
        –  Hortonworks Data Platform                Expert Role-based Training

   –  Help customers and partners overcome
      Hadoop knowledge gaps

                                                Full Lifecycle Support and Services
   –  Help organizations successfully develop
      and deploy solutions based on Hadoop
                                                 Evaluate       Pilot      Production


          Architecting the Future of Big Data
                                                                                 Page 26
          © Hortonworks Inc. 2011
Trend: Finding More Value Over Time
• Hadoop is usually brought in to solve a specific
  problem
  – Build seach indexes for Yahoo
  – Manage web site logs for Facebook
  – Users using EC2 to do data processing at Amazon
  – Simple reporting when existing tools don’t scale


• Once your data is in Hadoop more users find value

• Once you have Hadoop, folks add more data




     Architecting the Future of Big Data
                                                       Page 27
     © Hortonworks Inc. 2011
Thank You! Questions?
Eric Baldeschwieler
@jeric14 @hortonworks




                               Page 28
     © Hortonworks Inc. 2011

Contenu connexe

Tendances

SplunkLive! Splunk App for VMware
SplunkLive! Splunk App for VMwareSplunkLive! Splunk App for VMware
SplunkLive! Splunk App for VMwareSplunk
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - RangerIsheeta Sanghi
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3DataWorks Summit
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Timothy Spann
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Introduction to HADOOP.pdf
Introduction to HADOOP.pdfIntroduction to HADOOP.pdf
Introduction to HADOOP.pdf8840VinayShelke
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsTimothy Spann
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 
Session10-PHP Misconfiguration
Session10-PHP MisconfigurationSession10-PHP Misconfiguration
Session10-PHP Misconfigurationzakieh alizadeh
 
初學者都該了解的 HTTP 通訊協定基礎
初學者都該了解的 HTTP 通訊協定基礎初學者都該了解的 HTTP 通訊協定基礎
初學者都該了解的 HTTP 通訊協定基礎Will Huang
 

Tendances (20)

File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
SplunkLive! Splunk App for VMware
SplunkLive! Splunk App for VMwareSplunkLive! Splunk App for VMware
SplunkLive! Splunk App for VMware
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Apache Hadoop Security - Ranger
Apache Hadoop Security - RangerApache Hadoop Security - Ranger
Apache Hadoop Security - Ranger
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
Splunk overview
Splunk overviewSplunk overview
Splunk overview
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Introduction to HADOOP.pdf
Introduction to HADOOP.pdfIntroduction to HADOOP.pdf
Introduction to HADOOP.pdf
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
Introduction to sqoop
Introduction to sqoopIntroduction to sqoop
Introduction to sqoop
 
Google Dorks
Google DorksGoogle Dorks
Google Dorks
 
Session10-PHP Misconfiguration
Session10-PHP MisconfigurationSession10-PHP Misconfiguration
Session10-PHP Misconfiguration
 
初學者都該了解的 HTTP 通訊協定基礎
初學者都該了解的 HTTP 通訊協定基礎初學者都該了解的 HTTP 通訊協定基礎
初學者都該了解的 HTTP 通訊協定基礎
 

Similaire à Hadoop Trends

Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingm_hepburn
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Hortonworks
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondTeradata Aster
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopPOSSCON
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsHortonworks
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopHortonworks
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionHortonworks
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetupRoby Chen
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopSlim Baltagi
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataHortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks
 

Similaire à Hadoop Trends (20)

Apache hadoop bigdata-in-banking
Apache hadoop bigdata-in-bankingApache hadoop bigdata-in-banking
Apache hadoop bigdata-in-banking
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
 
Hadoop - Now, Next and Beyond
Hadoop - Now, Next and BeyondHadoop - Now, Next and Beyond
Hadoop - Now, Next and Beyond
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Introduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for WindowsIntroduction to Hortonworks Data Platform for Windows
Introduction to Hortonworks Data Platform for Windows
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Create a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache HadoopCreate a Smarter Data Lake with HP Haven and Apache Hadoop
Create a Smarter Data Lake with HP Haven and Apache Hadoop
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Enterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the UnionEnterprise Apache Hadoop: State of the Union
Enterprise Apache Hadoop: State of the Union
 
Hadoop for shanghai dev meetup
Hadoop for shanghai dev meetupHadoop for shanghai dev meetup
Hadoop for shanghai dev meetup
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
Building a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise HadoopBuilding a Modern Data Architecture with Enterprise Hadoop
Building a Modern Data Architecture with Enterprise Hadoop
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
201305 hadoop jpl-v3
201305 hadoop jpl-v3201305 hadoop jpl-v3
201305 hadoop jpl-v3
 
Hortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - WebinarHortonworks and Platfora in Financial Services - Webinar
Hortonworks and Platfora in Financial Services - Webinar
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 

Plus de Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

Plus de Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Dernier

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Dernier (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Hadoop Trends

  • 1. Trends and usage of Apache Hadoop Eric Baldeschwieler CEO Hortonworks Twitter: @jeric14, @hortonworks January 2012 © Hortonworks Inc. 2011 Page 1
  • 2. Agenda • Define terms – What is Hadoop? Why does Hadoop matter? • What drives Hadoop adoption? • Observed Trends Architecting the Future of Big Data Page 2 © Hortonworks Inc. 2011
  • 3. Hortonworks Vision We believe that by 2015, more than half the world's data will be processed by Apache Hadoop How to achieve that vision??? Enable ecosystem around enterprise-viable platform. Page 3 © Hortonworks Inc. 2011
  • 4. What is Apache Hadoop? •  Solution for big data –  Deals with complexities of high volume, velocity & variety of data •  Set of open source projects •  Transforms commodity hardware into a service that: –  Stores petabytes of data reliably –  Allows huge distributed computations •  Key attributes: –  Redundant and reliable (no data loss) One of the best examples of –  Extremely powerful open source driving innovation –  Batch processing centric and creating a market –  Easy to program distributed apps –  Runs on commodity hardware Page 4 © Hortonworks Inc. 2011
  • 5. Hortonworks Data Platform (HDP) Key Components of “Standard Hadoop” Open Source Stack Core Apache Hadoop Related Hadoop Projects Open APIs for: •  Data Integration •  Data Movement •  App Job Management •  System Management Pig Hive (Data Flow) (SQL) (Columnar NoSQL Store) HBase MapReduce Zookeeper (Coordination) (Distributed Programing Framework) HCatalog (Table & Schema Management) HDFS (Hadoop Distributed File System) Page 5 © Hortonworks Inc. 2011
  • 6. Big Data Trailblazers and Use Cases data analyzing web logs analytics advertising optimization machine learning mail anti-spam text mining web search content optimization customer trend analysis ad selection video & audio processing data mining user interest prediction social media Page 6 © Hortonworks Inc. 2011
  • 7. Yahoo!, Apache Hadoop & Hortonworks http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop Yahoo! embraced Apache Hadoop, an open source platform, to crunch epic amounts of data using an army of dirt-cheap servers 2006 Hadoop at Yahoo! 40K+ Servers 170PB Storage 5M+ Monthly Jobs 1000+ Active Users 2011 Yahoo! spun off 22+ engineers into Hortonworks, a company focused on advancing open source Apache Hadoop for the broader market Page 7 © Hortonworks Inc. 2011
  • 8. What drives Hadoop adoption? Architecting the Future of Big Data Page 8 © Hortonworks Inc. 2011
  • 9. Market Drivers for Apache Hadoop • Business drivers – High-value projects that require use of more data Gartner predicts 800% data growth – Belief that there is great ROI in mastering big data over next 5 years • Financial drivers – Growing cost of data systems as percentage of IT spend – Cost advantage of commodity hardware + open source – Enables departmental-level big data strategies 80-90% of data produced today is unstructured • Technical drivers – Existing solutions failing under growing requirements – 3Vs - Volume, velocity, variety – Proliferation of unstructured data © Hortonworks Inc. 2011 9 © Hortonworks Inc. 2011
  • 10. Every Market has Big Data Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011. Page 10 © Hortonworks Inc. 2011
  • 11. Broader Use Case Opportunities Financial Services Healthcare •  Detect/prevent fraud •  Patient monitoring •  Model and manage risk •  Predictive modeling •  Personalize banking/insurance products •  Compliance, Archival, text search •  Compliance, Archival, … •  Data driven research Retail Web / Social / Mobile •  Behavior analysis •  Sentiment analysis •  Cross selling, recommendation engines •  Web log, image, and video analysis •  Optimize pricing, placement, design •  Personalization •  Optimize inventory and distribution •  Billing, Reporting, Network Analysis Manufacturing Government •  Simulation, Analysis, Design •  Detect/prevent fraud •  Improve service via product sensor data •  Security & Intelligence •  “Digital factory” for lean manufacturing •  Support open data initiatives Page 11 © Hortonworks Inc. 2011
  • 12. Observed Trends Architecting the Future of Big Data Page 12 © Hortonworks Inc. 2011
  • 13. Trend: Agile Data • The old way – Operational systems keep only current records, short history – Analytics systems keep only conformed / cleaned / digested data – Unstructured data locked away in operational silos – Archives offline – Inflexible, new questions require system redesigns • The new trend – Keep raw data in Hadoop for a long time – Able to produce a new analytics view on-demand – Keep a new copy of data that was previously on in silos – Can directly do new reports, experiments at low incremental cost – New products / services can be added very quickly – Agile outcome justifies new infrastructure Architecting the Future of Big Data Page 13 © Hortonworks Inc. 2011
  • 14. Traditional Enterprise Data Architecture Data Silos Traditional Data Warehouses, Serving Applications BI & Analytics Web NoSQL Traditional ETL & Data BI / Serving RDMS … Message buses EDW Marts Analytics Serving Social Sensor Text Logs Media Data Systems … Unstructured Systems Page 14 © Hortonworks Inc. 2011
  • 15. Agile Data Architecture w/Hadoop Connecting All of Your Big Data Traditional Data Warehouses, Serving Applications BI & Analytics Web NoSQL Traditional ETL & Data BI / Serving RDMS … Message buses EDW Marts Analytics EsTsL (s = Store) Custom Analytics Serving Social Sensor Text Logs Media Data Systems … Unstructured Systems Page 15 © Hortonworks Inc. 2011
  • 16. Trend: Data driven development • Limited runtime logic driven by huge lookup tables • Data computed offline on Hadoop – Machine learning, other expensive computation offline – Personalization, classification, fraud, value analysis… • Application development requires data science – Huge amounts of actually observed data key to modern services – Hadoop used as the science platform Architecting the Future of Big Data Page 16 © Hortonworks Inc. 2011
  • 17. CASE STUDY YAHOO! HOMEPAGE •  Serving Maps   SCIENCE »  Machine learning to build ever •  Users  -­‐  Interests   HADOOP better categorization models   CLUSTER •  Five  Minute   USER   CATEGORIZATION   Produc7on   BEHAVIOR   MODELS  (weekly)     •  Weekly   PRODUCTION Categoriza7on   HADOOP »  Identify user interests using models   SERVING CLUSTER Categorization models MAPS (every 5 minutes) USER BEHAVIOR SERVING  SYSTEMS ENGAGED  USERS Build  customized  home  pages  with  latest  data  (thousands  /  second)   Copyright  Yahoo  2011   17  
  • 18. CASE STUDY YAHOO! HOMEPAGE Personalized for each visitor Result: twice the engagement Recommended  links   News  Interests   Top  Searches   +79% clicks +160% clicks +43% clicks vs. randomly selected vs. one size fits all vs. editor selected Copyright  Yahoo  2011  Hortonworks Inc. 2011 © 18  
  • 19. Trend: Specialization of Data Systems • Hadoop does not replace existing systems – It adds new capabilities to the enterprise – It can offload things that are not done efficiently in current systems – Especially in scale out situations • Specialization of traditional data components – Use OLTP systems just for transactions – Use OLAP systems for interactive analysis • Hadoop has LOTS of bandwidth to storage and CPU – Pull reporting out OLTP systems – Pull ELT out of OLAP systems Architecting the Future of Big Data Page 19 © Hortonworks Inc. 2011
  • 20. Hadoop and OLTP Systems MPP Processing of Online Transactions Hadoop used to Process Reports •  Mission critical •  Free up 50+% processing power for •  Manages transactions & serves reports transaction processing system •  Significant cost savings due to commodity nature of Hadoop Web Site Transaction Reports Processing Web Systems Site $$$ Transaction Logs Web Site Page 20 © Hortonworks Inc. 2011
  • 21. Hadoop and OLAP Systems Fast loading, raw data staging, ELT & long-term archival Allow analysts to use tools they know (The Agile Data Zone) (Take advantage of huge ecosystem of BI and Analytics tooling) Web Hadoop EDW Mobile Social Online Archival Other logs Page 21 © Hortonworks Inc. 2011
  • 22. TRENDS: Instrument Clouds of Things Clouds of things logging to Hadoop HDFS + Map-Reduce Websites Or HBase Mobile phones, Enterprise devices… + Analysis Things Things Things Things Things Things Page 22 © Hortonworks Inc. 2011
  • 23. Trend: Many POCs, Few Production Systems • The problem – Hadoop is still a young technology – Hard to find knowledgeable staff – Integration with existing systems • Hadoop market is maturing at speed – Emerging ecosystem of Hadoop platform solutions providers – Apache Hadoop continues to get better – Hadoop training and support available form several vendors Architecting the Future of Big Data Page 23 © Hortonworks Inc. 2011
  • 24. Growth in Hadoop Ecosystem • Hardware vendors, Public Cloud (IAAS, PAAS) – Storage, Appliances, Preloaded commodity boxes, cloud • Data Systems – All the major vendors announced Hadoop plans / products in 2011 • BI, Analytics and ETL – Hadoop integrations emerging • Dedicated Hadoop Applications – Datamere, Karmashere, Platfora, … • Systems Integrators – Regional and Global providers available Architecting the Future of Big Data Page 24 © Hortonworks Inc. 2011
  • 25. Hadoop Continues to Improve Apache community, including Hortonworks investing to improve Hadoop: •  Make Hadoop an Open, Extensible, and Enterprise Viable Platform •  Enable More Applications to Run on Apache Hadoop “Hadoop.Beyond” Platform actively evolving “Hadoop.Next” (Hadoop 0.23) HA, Next-gen HDFS & MapReduce “Hadoop.Now” Extension & Integration APIs (Hadoop 1.0) Most stable version ever HBase, security, WebHDFS Page 25 © Hortonworks Inc. 2011
  • 26. Hortonworks – Approachable Hadoop •  Apache Hadoop Leadership –  Delivered every major release since 0.1 –  Driving innovation across entire stack –  Experience managing world’s largest deployment –  Access to Yahoo’s 1,000+ Hadoop users and 40k+ nodes for testing, QA, etc. •  Business Focus –  Provide 100% open source product –  Hortonworks Data Platform Expert Role-based Training –  Help customers and partners overcome Hadoop knowledge gaps Full Lifecycle Support and Services –  Help organizations successfully develop and deploy solutions based on Hadoop Evaluate Pilot Production Architecting the Future of Big Data Page 26 © Hortonworks Inc. 2011
  • 27. Trend: Finding More Value Over Time • Hadoop is usually brought in to solve a specific problem – Build seach indexes for Yahoo – Manage web site logs for Facebook – Users using EC2 to do data processing at Amazon – Simple reporting when existing tools don’t scale • Once your data is in Hadoop more users find value • Once you have Hadoop, folks add more data Architecting the Future of Big Data Page 27 © Hortonworks Inc. 2011
  • 28. Thank You! Questions? Eric Baldeschwieler @jeric14 @hortonworks Page 28 © Hortonworks Inc. 2011