Architecting for Big Data - Gartner Innovation Peer Forum Sept 2011

Jonathan Seidman

Who We Are

Launched in 2001; Chicago, IL; over 160 million bookings

What is Hadoop?

Why Hadoop? Chart: $ per TB

Why We Started Using Hadoop: Optimizing hotel search…

Why We Started Using Hadoop (continued)

The Problem… Diagram: Transactional Data (e.g. bookings); Data Warehouse; Non-transactional Data (e.g. searches)

Hadoop Was Selected as a Solution… Diagram: Transactional Data (e.g. bookings); Data Warehouse; Non-transactional Data (e.g. searches); Hadoop

Unfortunately…

Current Big Data Infrastructure. Diagram: Hadoop (MapReduce, HDFS); MapReduce Jobs (Java, Python, R/RHIPE); Analytic Tools (Hive, Pig); Data Warehouse (Greenplum); psql, gpload, Sqoop; External Analytical Jobs (Java, R, etc.); Aggregated Data

Hadoop Architecture Details

Deploying Hadoop Enabled Multiple Applications…

But Brought New Challenges…

In Early 2011…

One More Use Case – Click Data Processing

Click Data Processing – Current Data Warehouse Processing. Diagram: Web Servers → Web Server Logs → ETL → DW → Data Cleansing (stored procedure) → DW; annotations: 3 hours, 2 hours, ~20% of original data size

Click Data Processing – Proposed Hadoop Processing. Diagram: Web Servers → Web Server Logs → HDFS → Data Cleansing (MapReduce) → DW

Lessons Learned

Lessons Learned (continued)

In the Near Future…

What is Web Analytics?

Challenges

Challenges (continued)

Data Categories

Web Analytics & Big Data

Data Analysis Jobs

Centralized Decentralization: Web Analytics team + SEO team + Hotel Optimization team

Model for success

Should everyone do this?

1. Architecting for Big Data: Integrating Hadoop into an Enterprise Data Infrastructure. Raghu Kashyap and Jonathan Seidman, Gartner Peer Forum, September 14, 2011

16. Karmasphere Analyst

17. Karmasphere Analyst

18. Datameer Analytics Solution

19. Datameer Analytics Solution

20. Not to Mention Other BI Vendors…

34. Processing of Web Analytics Data

35. Aggregating data into Data Warehouse

37. Business Insights

Speaker Notes

Most people think of orbitz.com, but Orbitz Worldwide is really a global portfolio of leading online travel consumer brands, including Orbitz, CheapTickets, The Away Network, ebookers and HotelClub. Orbitz also provides business-to-business services: Orbitz Worldwide Distribution provides hotel booking capabilities to a number of leading carriers such as Amtrak, Delta, LAN, KLM and Air France, and Orbitz for Business provides corporate travel services to a number of Fortune 100 clients. Orbitz started in 1999; the orbitz.com site launched in 2001.
A couple of years ago, when I mentioned Hadoop I'd often get blank stares, even from developers. I think most people are now at least aware of what Hadoop is.
This chart isn't exactly an apples-to-apples comparison, but it gives some idea of the difference in cost per TB between the data warehouse and Hadoop. Hadoop doesn't provide the same functionality as a data warehouse, but it does allow us to store and process data that was previously impractical to handle, for both economic and technical reasons. Putting data into a database or data warehouse requires knowing, or making assumptions about, how the data will be used; either way, you're putting constraints on how the data is accessed and processed. With Hadoop, each application can process the raw data in whatever way it requires. If you decide you need to analyze different attributes, you just run a new query.
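
The "process the raw data in whatever way is required" point is often called schema-on-read. Below is a minimal Python sketch of the idea, assuming a hypothetical tab-separated log layout (timestamp, session ID, page, query string) that is not the actual Orbitz log format. The raw lines are stored once, unchanged; each hypothetical "application" applies its own interpretation at read time, so analyzing a new attribute means writing a new reader rather than reloading the data.

```python
from urllib.parse import parse_qs

# Hypothetical raw log lines: timestamp \t session_id \t page \t query_string.
# They are stored as-is; no schema is imposed when the data is loaded.
RAW_LOGS = [
    "2011-09-14T10:00:01\ts1\t/hotel/search\tcity=chicago&checkin=2011-10-01",
    "2011-09-14T10:00:05\ts1\t/hotel/click\thotel_id=42&price=189",
    "2011-09-14T10:01:12\ts2\t/hotel/search\tcity=boston&checkin=2011-10-03",
]


def read_searched_cities(lines):
    """Application A: interpret lines as search events and extract the city."""
    for line in lines:
        _ts, _session, page, query = line.split("\t")
        if page == "/hotel/search":
            yield parse_qs(query)["city"][0]


def read_clicked_prices(lines):
    """Application B: reinterpret the same raw lines as hotel clicks with prices."""
    for line in lines:
        _ts, session, page, query = line.split("\t")
        if page == "/hotel/click":
            yield session, int(parse_qs(query)["price"][0])


print(list(read_searched_cities(RAW_LOGS)))  # ['chicago', 'boston']
print(list(read_clicked_prices(RAW_LOGS)))   # [('s1', 189)]
```

In a data warehouse, both views would need columns defined up front; here the second reader was added without touching the stored data.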
The initial motivation was to solve a particular business problem. Orbitz wanted to use intelligent algorithms to optimize various site functions: for example, optimizing hotel search by showing consumers hotels that more closely match their preferences, leading to more bookings.
Improving hotel search requires access to data such as which hotels users saw in search results, which hotels they clicked on, and which hotels were actually booked. Much of this data was available in web analytics logs.
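
The three signals named here (hotels shown, clicked, and booked) are typically combined into per-hotel rates before they feed a ranking model. Here is a minimal Python sketch, assuming hypothetical (event_type, hotel_id) records already extracted from the logs. It illustrates the kind of aggregation involved and is not Orbitz's hotel-ranking algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical events extracted from web analytics logs: (event_type, hotel_id),
# where event_type is "shown", "clicked", or "booked".
EVENTS = [
    ("shown", "h1"), ("shown", "h2"), ("clicked", "h1"),
    ("shown", "h1"), ("clicked", "h1"), ("booked", "h1"),
    ("shown", "h2"), ("clicked", "h2"),
]


def hotel_rates(events):
    """Return {hotel_id: (click_through_rate, booking_rate)}, both per impression."""
    counts = defaultdict(Counter)
    for event_type, hotel_id in events:
        counts[hotel_id][event_type] += 1
    rates = {}
    for hotel_id, c in counts.items():
        shown = c["shown"]
        if shown:  # skip hotels with no recorded impressions
            rates[hotel_id] = (c["clicked"] / shown, c["booked"] / shown)
    return rates


print(hotel_rates(EVENTS))  # {'h1': (1.0, 0.5), 'h2': (0.5, 0.0)}
```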
Management was supportive of anything that facilitated the machine learning (ML) team's efforts. But when we presented a hardware spec for servers with local, non-RAID storage, etc., the systems engineering team offered us blades with attached storage instead.
Hadoop is used to crunch data that feeds a system for recommending products to users. Although we use third-party sites to monitor site performance, Hadoop allows the front-end team to produce detailed reports on page download performance, providing valuable trending data not available from other sources. The data is also used to analyze user segments, which can drive personalization. This chart shows that Safari users click on hotels with higher mean and median prices than other users do. These are just a handful of examples of how Hadoop is driving business value.
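
The browser comparison described here is a group-by followed by two summary statistics. Below is a minimal Python sketch using the standard `statistics` module, assuming hypothetical (browser, clicked hotel price) pairs. The numbers are invented for illustration and are not the data behind the chart.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (browser, clicked hotel price in USD) pairs; illustrative only.
CLICKS = [
    ("Safari", 210.0), ("Safari", 180.0), ("Safari", 250.0),
    ("Firefox", 150.0), ("Firefox", 120.0),
    ("IE", 130.0), ("IE", 110.0), ("IE", 170.0),
]


def price_stats_by_browser(clicks):
    """Group clicked-hotel prices by browser; return {browser: (mean, median)}."""
    prices = defaultdict(list)
    for browser, price in clicks:
        prices[browser].append(price)
    return {b: (mean(p), median(p)) for b, p in prices.items()}


for browser, (avg, mid) in sorted(price_stats_by_browser(CLICKS).items()):
    print(f"{browser}: mean={avg:.2f} median={mid:.2f}")
```

One design consideration for running this at scale: the mean can be assembled from partial sums and counts produced on different nodes. An exact median cannot. It needs all of a group's values in one place, such as a single reducer per browser key.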
I recently received an email from a user seeking access to Hive. I sent him a detailed reply with information on accessing Hive, etc., and received an email back basically saying, “you lost me at ssh”.
Prior to 2011, Hadoop responsibilities were split across technology teams. Moving Hadoop under a single team centralized responsibility and resources for it.
Processing of click data gathered by web servers. This click data contains marketing information. The data cleansing step is done inside the data warehouse using a stored procedure, and further downstream processing generates the final data sets for reporting. Although this process produces the required user reports, it consumes considerable time and resources on the data warehouse, resources that could otherwise be used for reports, queries, etc.
The ETL step is eliminated. Instead, raw logs will be uploaded to HDFS, which is a much faster process. Moving the data cleansing to MapReduce shifts the “heavy lifting” of processing these relatively large data sets to Hadoop, taking advantage of Hadoop's efficiencies to greatly speed up the processing.
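
Here is a minimal sketch of what a MapReduce cleansing step might look like, written in the Hadoop Streaming style, where the mapper and reducer exchange tab-separated key/value lines on stdin/stdout. The log layout and the cleansing rules are hypothetical; the slides do not describe the actual stored-procedure logic. The assumed rules are: drop malformed lines, drop hits without a session, drop bot traffic, and remove duplicate hits per session. Run without arguments, the script simulates the job locally, with `sorted()` standing in for Hadoop's shuffle and sort.

```python
import sys
from itertools import groupby

# Hypothetical cleansing rule: user agents containing these markers are treated as bots.
BOT_MARKERS = ("bot", "crawler", "spider")


def mapper(lines):
    """Map phase: drop bad hits and emit 'session_id<TAB>timestamp<TAB>url'.

    Assumed input layout (hypothetical): timestamp, session_id, url, user_agent,
    separated by tabs.
    """
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if len(fields) != 4:
            continue  # malformed line
        timestamp, session_id, url, user_agent = fields
        if not session_id:
            continue  # a hit without a session cannot be attributed
        if any(marker in user_agent.lower() for marker in BOT_MARKERS):
            continue  # bot traffic
        yield f"{session_id}\t{timestamp}\t{url}"


def reducer(lines):
    """Reduce phase: input arrives sorted by key; drop duplicate hits per session."""
    records = (line.rstrip("\n").split("\t") for line in lines)
    for session_id, group in groupby(records, key=lambda r: r[0]):
        seen = set()
        for _, timestamp, url in group:
            if (timestamp, url) not in seen:
                seen.add((timestamp, url))
                yield f"{session_id}\t{timestamp}\t{url}"


# Hypothetical sample input for the local simulation.
SAMPLE_LOG = [
    "2011-09-14T10:00:01\ts1\t/hotels?src=email \tMozilla/5.0",
    "2011-09-14T10:00:01\ts1\t/hotels?src=email\tMozilla/5.0",  # duplicate hit
    "2011-09-14T10:00:02\ts2\t/flights\tGooglebot/2.1",  # bot
    "garbage line",  # malformed
    "2011-09-14T10:00:03\t\t/cars\tMozilla/5.0",  # no session
    "2011-09-14T10:00:04\ts2\t/cars\tMozilla/5.0",
]

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
        # Hadoop Streaming mode: read stdin, write tab-separated lines to stdout.
        step = mapper if sys.argv[1] == "map" else reducer
        for out in step(sys.stdin):
            print(out)
    else:
        # Local simulation: sorted() stands in for Hadoop's shuffle and sort.
        for out in reducer(sorted(mapper(SAMPLE_LOG))):
            print(out)
```

De-duplication happens in the reducer rather than the mapper because duplicate hits for one session can land in different input splits. Only after the shuffle are all of a session's hits guaranteed to be in the same place. On a cluster, the script would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (for example, `-mapper "cleanse.py map"`, where `cleanse.py` is a hypothetical file name).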
The bad news is that we need to significantly increase the number of servers in our cluster; the good news is that this is because teams are using Hadoop and new projects are coming online.