Extending the Enterprise Data Warehouse with Hadoop

                 Robert Lancaster and Jonathan Seidman
                                  Chicago Data Summit
                                         April 26 | 2011
Who We Are


•  Robert Lancaster
  –  Solutions Architect, Hotel Supply Team
  –  rlancaster@orbitz.com
  –  @rob1lancaster
•  Jonathan Seidman
  –  Lead Engineer, Business Intelligence/Big Data Team
  –  Co-founder/organizer of Chicago Hadoop User Group
     (http://www.meetup.com/Chicago-area-Hadoop-User-
     Group-CHUG)
  –  jseidman@orbitz.com
  –  @jseidman



                                                          page 2
Launched: 2001, Chicago, IL




                              page 3
Why are we using Hadoop?

 Stop me if you’ve heard this before…




                                        page 4
On Orbitz alone we do millions of searches and transactions
 daily, which leads to hundreds of gigabytes of log data
 every day.




                                                        page 5
Hadoop provides us with efficient, economical,
 scalable, and reliable storage and processing of
 these large amounts of data.


 $ per TB




                                                    page 6
And…


Hadoop places no constraints on how data is
 processed.




                                              page 7
Before Hadoop




                page 8
With Hadoop




              page 9
Access to this non-transactional data enables a number of
applications…




                                                            page 10
Optimizing Hotel Search




                          page 11
Recommendations




                  page 12
Page Performance Tracking




                            page 13
Cache Analysis
[Chart. Series: Queries; Searches; Reverse Running Total (Searches); Reverse Running Total (Queries). X-axis: 1 to 20. Y-axis: 0% to 100%. Labeled data points: 71.67%, 31.87%, 34.30%, 2.78%.]

72% of queries are singletons and make up nearly a third of total search volume.

A small number of queries (3%) make up more than a third of search volume.
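As a rough illustration of how figures like these could be derived, here is a minimal sketch that counts how often each query appears in a search log and reports the share of singleton queries. It assumes "singleton" means a query seen exactly once; the function name `cache_stats` and the sample log keys are hypothetical and not taken from the slides.

```python
from collections import Counter


def cache_stats(queries):
    """Return (share of distinct queries seen exactly once,
    share of total searches those singleton queries account for)."""
    counts = Counter(queries)
    if not counts:
        return 0.0, 0.0
    total_searches = sum(counts.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    # Each singleton query contributes exactly one search.
    return singletons / len(counts), singletons / total_searches


# Hypothetical search log: one normalized query key per search.
log = [
    "CHI|2011-05-01",
    "NYC|2011-05-01",
    "CHI|2011-05-01",
    "LAX|2011-06-10",
    "CHI|2011-05-01",
]
query_share, search_share = cache_stats(log)
print(f"{query_share:.2%} of queries are singletons, "
      f"making up {search_share:.2%} of searches")
```

A high singleton share suggests that many cached results would never be reused, which is the kind of question a cache analysis like this is meant to answer.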




                                                                                                                           page 14
User Segmentation




                    page 15
All of this is great, but…


Most of these efforts are driven by development
 teams.


The challenge now is to unlock the value in this data
 by making it more available to the rest of the
 organization.




                                                        page 16
“Given the ubiquity of data in modern organizations, a data
warehouse can keep pace today only by being ‘magnetic’:
attracting all the data sources that crop up within an
organization regardless of data quality niceties.”*




             *MAD Skills: New Analysis Practices for Big Data


                                                              page 17
In a better world…	





                        page 18
Integrating Hadoop with the Enterprise Data Warehouse

The goal is a unified view of the data, allowing us to use
the power of our existing tools for reporting and analysis.




                                                              page 20
BI vendors are working on integration with Hadoop…




                                                     page 21
And one more reporting tool…




                               page 22
Example Processing Pipeline for Web Analytics Data




                                                     page 23
Aggregating data for import into Data Warehouse
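The slide itself carries only a diagram, so the following is a minimal sketch of what aggregating before a warehouse import might look like, not the pipeline actually used. It rolls page-view records up to one row per day and URL, so that only the much smaller summary needs to be loaded. The record layout (ISO 8601 timestamp, URL) and the name `aggregate_daily` are assumptions made for illustration.

```python
from collections import Counter


def aggregate_daily(records):
    """Roll (timestamp, url) records up to sorted (day, url, views) rows."""
    counts = Counter()
    for timestamp, url in records:
        day = timestamp[:10]  # assumes ISO 8601, e.g. 2011-04-26T10:00:00
        counts[(day, url)] += 1
    return sorted((day, url, n) for (day, url), n in counts.items())


# Hypothetical page-view records.
records = [
    ("2011-04-26T10:00:00", "/hotel/search"),
    ("2011-04-26T10:05:00", "/hotel/search"),
    ("2011-04-26T11:00:00", "/flight/search"),
    ("2011-04-27T09:00:00", "/hotel/search"),
]
for day, url, views in aggregate_daily(records):
    print(f"{day}\t{url}\t{views}")
```

On a cluster the same grouping would typically run as a MapReduce or Hive job; the in-memory version here only shows the shape of the output rows.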




                                                  page 24
Example Use Case: Beta Data Processing




                                     page 25
Example Use Case – Beta Data Processing




                                          page 26
Example Use Case – Beta Data Processing Output




                                                 page 27
Example Use Case: RCDC Processing




                                    page 28
Example Use Case – RCDC Processing




                                     page 29
Example Use Case: Click Data Processing




                                      page 30
Click Data Processing – Current DW Processing




Web Server Logs (multiple web servers) -> ETL -> DW (3 hours) -> Data Cleansing (stored procedure, 2 hours) -> DW (~20% of original data size)



                                                                   page 31
Click Data Processing – New Hadoop Processing




Web Server Logs (multiple web servers) -> HDFS -> Data Cleansing (MapReduce) -> DW
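To make the cleansing step more concrete, here is a minimal sketch of a map-only cleansing job written in the style of a Hadoop Streaming mapper, which reads raw lines on standard input and writes cleaned lines to standard output. The slides do not say how the job was implemented, so the use of Streaming, the four-field tab-separated log layout, the bot filter, and the names `clean_record` and `run_mapper` are all assumptions.

```python
import io
import sys

# Hypothetical markers used to drop automated traffic.
BOT_MARKERS = ("bot", "crawler", "spider")


def clean_record(line):
    """Return a cleaned 'timestamp<TAB>session<TAB>url' line, or None to drop it.

    Assumes raw lines hold four tab-separated fields:
    timestamp, session id, URL, user agent.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        return None  # malformed line
    timestamp, session_id, url, user_agent = (f.strip() for f in fields)
    if not session_id or not url:
        return None  # missing required field
    if any(marker in user_agent.lower() for marker in BOT_MARKERS):
        return None  # automated traffic
    return "\t".join((timestamp, session_id, url))


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    """Streaming-style loop: one raw line in, zero or one cleaned line out."""
    for line in stdin:
        cleaned = clean_record(line)
        if cleaned is not None:
            stdout.write(cleaned + "\n")


# Local demonstration with in-memory streams instead of real log files.
raw = io.StringIO(
    "2011-04-26T10:00:00\ts1\t/hotel/search\tMozilla/5.0\n"
    "2011-04-26T10:00:01\ts2\t/hotel/search\tExampleBot/1.0\n"
    "truncated line\n"
)
out = io.StringIO()
run_mapper(raw, out)
print(out.getvalue(), end="")
```

Because each line is handled independently, a job of this shape parallelizes across the cluster without a reduce phase, and only the cleaned output needs to be loaded into the DW.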




                                                 page 32
Conclusions


•  Market is still immature, but Hadoop has already become a
   valuable business intelligence tool, and will become an
   increasingly important part of a BI infrastructure.
•  Hadoop won’t replace your EDW, but any organization with a
   large EDW should at least be exploring Hadoop as a
   complement to their BI infrastructure.
•  Use Hadoop to offload the time- and resource-intensive
   processing of large data sets so you can free up your data
   warehouse to serve user needs.
•  The challenge now is making Hadoop more accessible to non-
   developers. Vendors are addressing this, so expect rapid
   advancements in Hadoop accessibility.



                                                                page 33
Oh, and also…


•  Orbitz is looking for a Lead Engineer for the BI/Big Data team.
•  Go to http://careers.orbitz.com/ and search for IRC19035.




                                                                     page 34
References


•  MAD Skills: New Analysis Practices for Big Data, Jeffrey
   Cohen, Brian Dolan, Mark Dunlap, Joseph Hellerstein, and
   Caleb Welton, 2009




                                                              page 35

More Related Content

What's hot

Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IEdureka!
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1Abbas Maazallahi
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry PerspectiveCloudera, Inc.
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesDataWorks Summit
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Jonathan Seidman
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?Hortonworks
 
Flexible In-Situ Indexing for Hadoop via Elephant Twin
Flexible In-Situ Indexing for Hadoop via Elephant TwinFlexible In-Situ Indexing for Hadoop via Elephant Twin
Flexible In-Situ Indexing for Hadoop via Elephant TwinDmitriy Ryaboy
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010Cloudera, Inc.
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and HadoopEdureka!
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Data infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInData infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInHari Shankar Sreekumar
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUG IT
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyonddatasalt
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop BasicsSonal Tiwari
 

What's hot (20)

Introduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -IIntroduction to Big data & Hadoop -I
Introduction to Big data & Hadoop -I
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Hadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data ArchitecturesHadoop Powers Modern Enterprise Data Architectures
Hadoop Powers Modern Enterprise Data Architectures
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Why hadoop for data science?
Why hadoop for data science?Why hadoop for data science?
Why hadoop for data science?
 
Flexible In-Situ Indexing for Hadoop via Elephant Twin
Flexible In-Situ Indexing for Hadoop via Elephant TwinFlexible In-Situ Indexing for Hadoop via Elephant Twin
Flexible In-Situ Indexing for Hadoop via Elephant Twin
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010NTT Data - Shinichi Yamada - Hadoop World 2010
NTT Data - Shinichi Yamada - Hadoop World 2010
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Data infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedInData infrastructure and Hadoop at LinkedIn
Data infrastructure and Hadoop at LinkedIn
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Big Data Concepts
Big Data ConceptsBig Data Concepts
Big Data Concepts
 
VMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware HadoopVMUGIT UC 2013 - 08a VMware Hadoop
VMUGIT UC 2013 - 08a VMware Hadoop
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 

Viewers also liked

Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatternsAnurag S
 
Your Path to Big Data Sucess
Your Path to Big Data SucessYour Path to Big Data Sucess
Your Path to Big Data SucessCloudera, Inc.
 
SITNL 2015 - Big Data Small Pockets
SITNL 2015 - Big Data Small PocketsSITNL 2015 - Big Data Small Pockets
SITNL 2015 - Big Data Small PocketsJan van Ansem
 
BotPrize 2014 Results. Human-Like Bots Competition at IEEE CIG
BotPrize 2014 Results. Human-Like Bots Competition at IEEE CIGBotPrize 2014 Results. Human-Like Bots Competition at IEEE CIG
BotPrize 2014 Results. Human-Like Bots Competition at IEEE CIGAccenture Analytics
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...Trivadis
 
Integrating BI - Data Warehouse and Big Data
Integrating BI - Data Warehouse and Big DataIntegrating BI - Data Warehouse and Big Data
Integrating BI - Data Warehouse and Big DataAccenture Analytics
 
Edw Data Arc
Edw Data ArcEdw Data Arc
Edw Data ArcAlex CK
 
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel UptonEDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel UptonDaniel Upton
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data WarehousingThomas Kejser
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformCaserta
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataApache Apex
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHumza Naseer
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...Kai Wähner
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsCloudera, Inc.
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution Hortonworks
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseDataWorks Summit
 

Viewers also liked (20)

Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
 
Your Path to Big Data Sucess
Your Path to Big Data SucessYour Path to Big Data Sucess
Your Path to Big Data Sucess
 
SITNL 2015 - Big Data Small Pockets
SITNL 2015 - Big Data Small PocketsSITNL 2015 - Big Data Small Pockets
SITNL 2015 - Big Data Small Pockets
 
BotPrize 2014 Results. Human-Like Bots Competition at IEEE CIG
BotPrize 2014 Results. Human-Like Bots Competition at IEEE CIGBotPrize 2014 Results. Human-Like Bots Competition at IEEE CIG
BotPrize 2014 Results. Human-Like Bots Competition at IEEE CIG
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
Trivadis TechEvent 2016 DWH Modernization – in the Age of Big Data by Gregor ...
 
Integrating BI - Data Warehouse and Big Data
Integrating BI - Data Warehouse and Big DataIntegrating BI - Data Warehouse and Big Data
Integrating BI - Data Warehouse and Big Data
 
Edw Data Arc
Edw Data ArcEdw Data Arc
Edw Data Arc
 
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel UptonEDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
EDW Data Model Storming for Integration of NoSQL and RDBMS by Daniel Upton
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
The EDW Ecosystem
The EDW EcosystemThe EDW Ecosystem
The EDW Ecosystem
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Hadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing ArchitecturesHadoop Integration into Data Warehousing Architectures
Hadoop Integration into Data Warehousing Architectures
 
Modernise your EDW - Data Lake
Modernise your EDW - Data LakeModernise your EDW - Data Lake
Modernise your EDW - Data Lake
 
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about..."Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...
 
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop ProfessionalsBest Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution
 
Hadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data WarehouseHadoop and Enterprise Data Warehouse
Hadoop and Enterprise Data Warehouse
 

Similar to Extending Enterprise Data Warehouse Hadoop

Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopCloudera, Inc.
 
Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Jonathan Seidman
 
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...Cloudera, Inc.
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitzRaghu Kashyap
 
Mass tlc presentation menninger
Mass tlc presentation    menningerMass tlc presentation    menninger
Mass tlc presentation menningerMassTLC
 
Mass tlc presentation menninger
Mass tlc presentation    menningerMass tlc presentation    menninger
Mass tlc presentation menningerMassTLC
 
Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012
Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012
Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012Seattle Interactive Conference
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big DecisionsInnoTech
 
Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Jonathan Seidman
 
Tech4Africa - Opportunities around Big Data
Tech4Africa - Opportunities around Big DataTech4Africa - Opportunities around Big Data
Tech4Africa - Opportunities around Big DataSteve Watt
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewAbhishek Roy
 
Sample Paper.doc.doc
Sample Paper.doc.docSample Paper.doc.doc
Sample Paper.doc.docbutest
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop siliconsudipt
 

Similar to Extending Enterprise Data Warehouse Hadoop (20)

Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
 
Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011
 
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
Hadoop World 2011: Extending Enterprise Data Warehouse with Hadoop - Jonathan...
 
Gartner peer forum sept 2011 orbitz
Gartner peer forum sept 2011   orbitzGartner peer forum sept 2011   orbitz
Gartner peer forum sept 2011 orbitz
 
Mass tlc presentation menninger
Mass tlc presentation    menningerMass tlc presentation    menninger
Mass tlc presentation menninger
 
Mass tlc presentation menninger
Mass tlc presentation    menningerMass tlc presentation    menninger
Mass tlc presentation menninger
 
Big Data
Big DataBig Data
Big Data
 
Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012
Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012
Konrad Feldman - Big Data and The Future of Advertising and Marketing - SIC2012
 
Big Data = Big Decisions
Big Data = Big DecisionsBig Data = Big Decisions
Big Data = Big Decisions
 
Addressing dm-cloud
Addressing dm-cloudAddressing dm-cloud
Addressing dm-cloud
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010Hadoop and Hive at Orbitz, Hadoop World 2010
Hadoop and Hive at Orbitz, Hadoop World 2010
 
Tech4Africa - Opportunities around Big Data
Tech4Africa - Opportunities around Big DataTech4Africa - Opportunities around Big Data
Tech4Africa - Opportunities around Big Data
 
Steve Watt Presentation
Steve Watt PresentationSteve Watt Presentation
Steve Watt Presentation
 
Introduction Big data
Introduction Big data  Introduction Big data
Introduction Big data
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Sample Paper.doc.doc
Sample Paper.doc.docSample Paper.doc.doc
Sample Paper.doc.doc
 
Présentation on radoop
Présentation on radoop   Présentation on radoop
Présentation on radoop
 
Big Data & Data Mining
Big Data & Data MiningBig Data & Data Mining
Big Data & Data Mining
 

More from Jonathan Seidman

Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019Jonathan Seidman
 
Foundations strata sf-2019_final
Foundations strata sf-2019_finalFoundations strata sf-2019_final
Foundations strata sf-2019_finalJonathan Seidman
 
Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Jonathan Seidman
 
Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Jonathan Seidman
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Jonathan Seidman
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Jonathan Seidman
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Jonathan Seidman
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 

More from Jonathan Seidman (9)

Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019Foundations for Successful Data Projects – Strata London 2019
Foundations for Successful Data Projects – Strata London 2019
 
Foundations strata sf-2019_final
Foundations strata sf-2019_finalFoundations strata sf-2019_final
Foundations strata sf-2019_final
 
Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018Architecting a Next Gen Data Platform – Strata New York 2018
Architecting a Next Gen Data Platform – Strata New York 2018
 
Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018Architecting a Next Gen Data Platform – Strata London 2018
Architecting a Next Gen Data Platform – Strata London 2018
 
Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017Architecting a Next Generation Data Platform – Strata Singapore 2017
Architecting a Next Generation Data Platform – Strata Singapore 2017
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014
 
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011Distributed Data Analysis with Hadoop and R - Strangeloop 2011
Distributed Data Analysis with Hadoop and R - Strangeloop 2011
 
Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011Distributed Data Analysis with Hadoop and R - OSCON 2011
Distributed Data Analysis with Hadoop and R - OSCON 2011
 
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 

Extending the Enterprise Data Warehouse with Hadoop

  • 1. Extending the Enterprise Data Warehouse with Hadoop. Robert Lancaster and Jonathan Seidman. Chicago Data Summit, April 26, 2011.
  • 2. Who We Are
       – Robert Lancaster: Solutions Architect, Hotel Supply Team; rlancaster@orbitz.com; @rob1lancaster
       – Jonathan Seidman: Lead Engineer, Business Intelligence/Big Data Team; co-founder/organizer of Chicago Hadoop User Group (http://www.meetup.com/Chicago-area-Hadoop-User-Group-CHUG); jseidman@orbitz.com; @jseidman
  • 4. Why are we using Hadoop? Stop me if you’ve heard this before…
  • 5. On Orbitz alone we do millions of searches and transactions daily, which leads to hundreds of gigabytes of log data every day.
  • 6. Hadoop provides us with efficient, economical, scalable, and reliable storage and processing of these large amounts of data. [Chart: $ per TB]
  • 7. And… Hadoop places no constraints on how data is processed.
  • 8. Before Hadoop
  • 9. With Hadoop
  • 10. Access to this non-transactional data enables a number of applications…
  • 12. Recommendations
  • 14. Cache Analysis. 72% of queries are singletons and make up nearly a third of total search volume. A small number of queries (3%) make up more than a third of search volume. [Chart: Queries, Searches, Reverse Running Total (Searches), and Reverse Running Total (Queries), plotted from 0% to 100% over values 1 to 20; labeled points: 71.67%, 34.30%, 31.87%, 2.78%]
  • 15. User Segmentation
  • 16. All of this is great, but… Most of these efforts are driven by development teams. The challenge now is to unlock the value in this data by making it more available to the rest of the organization.
  • 17. “Given the ubiquity of data in modern organizations, a data warehouse can keep pace today only by being ‘magnetic’: attracting all the data sources that crop up within an organization regardless of data quality niceties.” (MAD Skills: New Analysis Practices for Big Data)
  • 18. In a better world…
  • 19. Integrating Hadoop with the Enterprise Data Warehouse. Robert Lancaster and Jonathan Seidman. Chicago Data Summit, April 26, 2011.
  • 20. The goal is a unified view of the data, allowing us to use the power of our existing tools for reporting and analysis.
  • 21. BI vendors are working on integration with Hadoop…
  • 22. And one more reporting tool…
  • 23. Example Processing Pipeline for Web Analytics Data
  • 24. Aggregating data for import into Data Warehouse
  • 25. Example Use Case: Beta Data Processing
  • 26. Example Use Case – Beta Data Processing
  • 27. Example Use Case – Beta Data Processing Output
  • 28. Example Use Case: RCDC Processing
  • 29. Example Use Case – RCDC Processing
  • 30. Example Use Case: Click Data Processing
  • 31. Click Data Processing – Current DW Processing. [Diagram elements: Web Servers, Web Server Logs, ETL, DW, Data Cleansing (Stored procedure), DW; labels: 3 hours, 2 hours, ~20% original data size]
  • 32. Click Data Processing – New Hadoop Processing. [Diagram elements: Web Servers, Web Server Logs, HDFS, Data Cleansing (MapReduce), DW]
  • 33. Conclusions
       – Market is still immature, but Hadoop has already become a valuable business intelligence tool, and will become an increasingly important part of a BI infrastructure.
       – Hadoop won’t replace your EDW, but any organization with a large EDW should at least be exploring Hadoop as a complement to their BI infrastructure.
       – Use Hadoop to offload the time- and resource-intensive processing of large data sets so you can free up your data warehouse to serve user needs.
       – The challenge now is making Hadoop more accessible to non-developers. Vendors are addressing this, so expect rapid advancements in Hadoop accessibility.
  • 34. Oh, and also…
       – Orbitz is looking for a Lead Engineer for the BI/Big Data team.
       – Go to http://careers.orbitz.com/ and search for IRC19035.
  • 35. References
       – Jeffrey Cohen, Brian Dolan, Mark Dunlap, Joseph Hellerstein, and Caleb Welton. MAD Skills: New Analysis Practices for Big Data. 2009.
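
The cache analysis on slide 14 reports what share of distinct queries are singletons (issued exactly once) and what share of total search volume they represent. The slides do not show how those figures were computed, so the following is only a minimal sketch of one way to derive them from a list of query keys. The function name `singleton_stats` and the sample keys are hypothetical, and a production version at this data volume would more likely run as a Hadoop job than in a single process.

```python
from collections import Counter


def singleton_stats(queries):
    """Return (share of distinct queries issued once, share of searches they represent).

    `queries` is an iterable of hashable query keys, one entry per search.
    This is an illustrative helper, not the method used in the talk.
    """
    counts = Counter(queries)            # query key -> number of searches
    distinct = len(counts)               # number of distinct queries
    total_searches = sum(counts.values())
    if distinct == 0:
        return 0.0, 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    # Each singleton query contributes exactly one search.
    return singletons / distinct, singletons / total_searches


# Hypothetical sample: two repeated queries and four one-off queries.
sample = ["chi-hotel"] * 4 + ["nyc-hotel"] * 2 + ["a", "b", "c", "d"]
query_share, search_share = singleton_stats(sample)
print(f"singleton queries: {query_share:.2%}, share of searches: {search_share:.2%}")
```

A high singleton share with a low search share would suggest that a cache is still effective; the slide's point is that singletons made up nearly a third of search volume, which limits the achievable hit rate.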