Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge now is in bridging the gap between the data warehouse and Hadoop. In this talk we’ll discuss some steps that Orbitz has taken to bridge this gap, including examples of how Hadoop and Hive are used to aggregate data from large data sets, and how that data can be combined with relational data to create new reports that provide actionable intelligence to business users.
7. Hadoop Provided a Solution…
Hadoop: detailed non-transactional data (what every user sees, clicks, etc.)
Data Warehouse: transactional data (e.g. bookings) and aggregated non-transactional data
12. A View Shared Beyond Orbitz…
"We strongly believe that Hadoop is the nucleus of the next-generation cloud EDW… but that promise is still three to five years from fruition."
*James Kobielus, Forrester Research, "Hadoop, Is It Soup Yet?"
15. ETL Example: Click Data Processing (Current Processing in Data Warehouse)
Web Servers -> Web Server Logs -> ETL -> DW -> Data Cleansing (stored procedure) -> DW
Several hours of processing; cleansed output is ~20% of the original data size.
18. BI Vendors Are Working on Hadoop Integration: both big (relatively)…
Most people think of orbitz.com, but Orbitz Worldwide is really a global portfolio of leading online travel consumer brands, including Orbitz, Cheaptickets, The Away Network, ebookers, and HotelClub. Orbitz also provides business-to-business services: Orbitz Worldwide Distribution provides hotel booking capabilities to a number of leading carriers such as Amtrak, Delta, LAN, KLM, and Air France, and Orbitz for Business provides corporate travel services to a number of Fortune 100 clients. Orbitz was started in 1999, and the orbitz.com site launched in 2001.
The initial motivation was to solve a particular business problem. Orbitz wanted to be able to use intelligent algorithms to optimize various site functions, for example optimizing hotel search by showing consumers hotels that more closely match their preferences, leading to more bookings.
Improving hotel search requires access to data such as which hotels users saw in search results, which hotels they clicked on, and which hotels were actually booked. Much of this data was available in web analytics logs.
Our data warehouse contains a full record of all transactions, but much of the required non-transactional data was either not stored, or stored in aggregated fields.
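To make the hotel-search use case concrete, here is a minimal sketch of the kind of per-hotel aggregation described above: counting impressions, clicks, and bookings and deriving a click-through rate. The event records and field names are illustrative assumptions, not Orbitz's actual schema; in practice this aggregation would run as a Hadoop job over the web analytics logs rather than in memory.

```python
from collections import defaultdict

# Hypothetical event stream of the kind described above: which hotels
# appeared in search results, which were clicked, which were booked.
events = [
    ("H1", "impression"), ("H1", "impression"), ("H1", "click"),
    ("H2", "impression"), ("H2", "click"), ("H2", "booking"),
    ("H1", "impression"),
]

# Tally events per hotel.
counts = defaultdict(lambda: {"impression": 0, "click": 0, "booking": 0})
for hotel, event_type in events:
    counts[hotel][event_type] += 1

# Click-through rate per hotel: the kind of signal a search-ranking
# algorithm could use to show consumers better-matching hotels.
for hotel, c in sorted(counts.items()):
    ctr = c["click"] / c["impression"] if c["impression"] else 0.0
    print(hotel, f"ctr={ctr:.2f}", f"bookings={c['booking']}")
```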
Hadoop is being used to analyze and optimize cache performance, in this case the hotel rate cache. This type of analysis will allow us to ensure that more requests can be served from the cache, optimizing the user experience and improving our "look-to-book" metrics. Hadoop is also used to crunch data for input to a system that recommends products to users. And although we use third-party sites to monitor site performance, Hadoop allows the front-end team to produce detailed reports on page download performance, providing valuable trending data not available from other sources.
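The cache-performance analysis mentioned above boils down to computing a hit rate from access logs. The sketch below shows the idea with a made-up access log (the real analysis would run over far larger logs in Hadoop); the log format is an assumption for illustration.

```python
# Hypothetical hotel-rate cache access log: True means the rate request
# was served from the cache, False means it missed and required a
# fresh lookup. (Illustrative data, not real traffic.)
accesses = [True, False, True, True, False, True, True, False, True, True]

hits = sum(accesses)
hit_rate = hits / len(accesses)
print(f"cache hit rate: {hit_rate:.0%}")  # 7 of 10 requests hit the cache
```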
The first visualization is just a plot of the latitude/longitude of hotel bookings for the month, illustrating the global nature of the business. The second is a simple price prediction for air fares. The data is also used for analysis of user segments, which can drive personalization: this chart shows that Safari users click on hotels with higher mean and median prices than other users do. These are just a handful of examples of how Hadoop is driving business value.
Recently received an email from a user seeking access to Hive. Sent him a detailed email with info on accessing Hive, etc. Received an email back basically saying “you lost me at ssh”.
Making the Hadoop team part of the BI team probably makes Orbitz unique, but it's a reflection of the importance of big data to driving BI for the company.
Probably both of these are common use cases at other companies employing Hadoop alongside an EDW.
Hadoop will be used to transform web analytics data into a dimensional model, allowing multiple business units to generate reports providing valuable intelligence to improve business results.
This is the processing of click data gathered by web servers; the click data contains marketing info. The data cleansing step is done inside the data warehouse using a stored procedure, and further downstream processing is done to generate the final data sets for reporting. Although this processing generates the required user reports, it consumes considerable time and resources on the data warehouse, resources that could otherwise be used for reports, queries, etc.
The ETL step is eliminated; instead, raw logs will be uploaded to HDFS, which is a much faster process. Moving the data cleansing to MapReduce shifts the "heavy lifting" of processing these relatively large data sets to Hadoop, taking advantage of Hadoop's efficiencies and greatly speeding up the processing.
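A natural way to implement the cleansing step described above is a Hadoop Streaming mapper. The sketch below is an assumed design, not Orbitz's actual implementation: each input line is a raw web-server log record, malformed lines are dropped, and valid ones are emitted as tab-separated fields for downstream reducers. The four-field record layout is hypothetical.

```python
#!/usr/bin/env python
import sys

def clean(line):
    """Validate and normalize one raw log line, or return None to drop it.

    Hypothetical record layout: timestamp, session_id, url, marketing_code,
    tab-separated.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4 or not fields[0]:
        return None  # discard malformed records
    return "\t".join(f.strip() for f in fields)

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming feeds records on stdin and collects emitted
    # key/value lines from stdout.
    for line in stdin:
        cleaned = clean(line)
        if cleaned is not None:
            stdout.write(cleaned + "\n")

if __name__ == "__main__":
    main()
```

Because each record is cleansed independently, the mapper parallelizes trivially across log splits, which is where the speedup over the stored-procedure approach comes from.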
The data was apparently available in the DW, but it wasn't modeled to enable efficient querying. This points up a strength of Hadoop: it places no constraints on how data is processed.
This provides an example of a typical processing flow for the large volumes of non-transactional data we're collecting. This processing allows us to convert large volumes of unstructured data into structured data that can be queried, extracted, etc. for further processing.
This type of processing also allows us to summarize large volumes of data into a data set that can be exported to the data warehouse, allowing us to query and report on that data using all of our standard BI tools.
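The summarize-and-export step can be sketched as follows: cleaned click records are rolled up into small daily aggregates and written out as CSV for loading into the warehouse. The record shape and field names are illustrative assumptions; in practice the aggregation would run in Hadoop (e.g. via Hive) and the export via a warehouse load tool.

```python
import csv
import io
from collections import Counter

# Hypothetical cleaned click records: (date, page). Summarizing per
# day/page yields a compact data set that standard BI tools can report
# on once it is loaded into the data warehouse.
records = [
    ("2011-06-01", "hotel_search"),
    ("2011-06-01", "hotel_search"),
    ("2011-06-01", "checkout"),
    ("2011-06-02", "hotel_search"),
]

summary = Counter(records)

# Emit the aggregate as CSV, a common interchange format for DW loads.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["date", "page", "clicks"])
for (date, page), clicks in sorted(summary.items()):
    writer.writerow([date, page, clicks])

print(buf.getvalue())
```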