SlideShare une entreprise Scribd logo
1  sur  9
Yahoo! TAO Case Study

Excerpt of “Tier-1 BI in the World of Big Data” presentation
at PASS 2011 Conference
by Thomas Kejser, Denny Lee, and Kenneth Lieu
Yahoo! TAO Business Challenge
                           Yahoo! manages a
                           powerful scalable
                           advertising exchange
                           that includes publishers
                           and advertisers
Yahoo! TAO Business Challenge
                           Advertisers want to get
                           the best bang for their
                           buck by reaching their
                           targeted audiences
                           effectively and efficiently
Yahoo! TAO Business Challenge



              Yahoo! needs visibility into how consumers
                 are responding to ads along many
               dimensions: web sites, creatives, time of
                day, gender, age, location to make the
                  exchange work as efficiently and
                       effectively as possible
Yahoo! TAO Technical Requirements
Visitors to Yahoo! Branded sites:   680,000,000
            Ad Impressions:   3,500,000,000    (per day)



     Rows Loaded:     464,000,000,000         (per qtr)



           Refresh Frequency:       Hourly
          Average Query Time:       <10 seconds
Yahoo! TAO Platform Architecture
                  How did we load so much so quickly?




          Data Aggregation & ETL      Data Archive & Staging                      BI Server
                                          Oracle 11G RAC                     SQL Server Analysis
                 Hadoop
                                                                              Services 2008 R2


  2PB
cluster
                File 1
                                           Partition 1                      Partition 1

                File 2
                                           Partition 2                      Partition 2


                File N                      Partition                       Partition
                                               N                               N
                              1.2TB                        135GB/day
                              /day                             compressed            24TB
                                                                                     Cube
                                                                                     /qtr
Yahoo! TAO Platform Architecture
                            Queries at the “speed of thought”


Adhoc Query/Visualization
                                                                     24TB
     Tableau Desktop 6                                               Cube
                                                                     /qtr
    Avg Query Time:
         6 secs
                                                                        464B rows of
                                                                       event level data
                                                                             /qtr



                                                    BI Query Servers
                                                    SQL Server Analysis
                                                    Services 2008 R2

     Optimization Application
                                                          Dimensions: 24
          Custom J2EE App
                                                     •
                                                     •    Attributes: 247
    Avg Query Time:                                  •    Measures: 207
         2 secs
Yahoo! TAO Return on Investment

                          For campaigns
                          optimized using TAO,
                          eCPMs (revenue)
                          has increased!




                          For campaigns
                          optimized using TAO,
                          advertisers spent
                          more with Yahoo! than
                          before
Yahoo! TAO Return on Investment




Yahoo! TAO exposed customer segment
performance to campaign managers and
 advertisers for the first time! No longer
         “flying audience blind”

Contenu connexe

Tendances

Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Cloudera, Inc.
 

Tendances (20)

Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with HadoopChicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
Chicago Data Summit: Extending the Enterprise Data Warehouse with Hadoop
 
Big data processing with apache spark part1
Big data processing with apache spark   part1Big data processing with apache spark   part1
Big data processing with apache spark part1
 
Galaxy of bits
Galaxy of bitsGalaxy of bits
Galaxy of bits
 
Big Data simplified
Big Data simplifiedBig Data simplified
Big Data simplified
 
Building Big Data Applications
Building Big Data ApplicationsBuilding Big Data Applications
Building Big Data Applications
 
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
VMworld 2013: Big Data Extensions: Advanced Features and Customer Case Study
 
Big Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure ConsiderationsBig Data/Hadoop Infrastructure Considerations
Big Data/Hadoop Infrastructure Considerations
 
Dbm630 Lecture01
Dbm630 Lecture01Dbm630 Lecture01
Dbm630 Lecture01
 
Bio bigdata
Bio bigdata Bio bigdata
Bio bigdata
 
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics SystemFour Problems You Run into When DIY-ing a “Big Data” Analytics System
Four Problems You Run into When DIY-ing a “Big Data” Analytics System
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Why data warehouses cannot support hot analytics
Why data warehouses cannot support hot analyticsWhy data warehouses cannot support hot analytics
Why data warehouses cannot support hot analytics
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Red Hat - Presentation at Hortonworks Booth - Strata 2014
Red Hat - Presentation at Hortonworks Booth - Strata 2014Red Hat - Presentation at Hortonworks Booth - Strata 2014
Red Hat - Presentation at Hortonworks Booth - Strata 2014
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems詹剑锋:Big databench—benchmarking big data systems
詹剑锋:Big databench—benchmarking big data systems
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
Hadoop and big data
Hadoop and big dataHadoop and big data
Hadoop and big data
 
Hadoop: An Industry Perspective
Hadoop: An Industry PerspectiveHadoop: An Industry Perspective
Hadoop: An Industry Perspective
 
Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2Big data analytics with hadoop volume 2
Big data analytics with hadoop volume 2
 

Similaire à Yahoo! TAO Case Study Excerpt

Achieving Visibility and Insight across OpenStack Projects.ppt
Achieving Visibility and Insight across OpenStack Projects.pptAchieving Visibility and Insight across OpenStack Projects.ppt
Achieving Visibility and Insight across OpenStack Projects.ppt
OpenStack Foundation
 
Jubatus Presentation on R&D forum 2011
Jubatus Presentation on R&D forum 2011Jubatus Presentation on R&D forum 2011
Jubatus Presentation on R&D forum 2011
JubatusOfficial
 

Similaire à Yahoo! TAO Case Study Excerpt (20)

Presentation RuChip Pte Ltd
Presentation RuChip Pte LtdPresentation RuChip Pte Ltd
Presentation RuChip Pte Ltd
 
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015TIBCO Advanced Analytics Meetup (TAAM) - June 2015
TIBCO Advanced Analytics Meetup (TAAM) - June 2015
 
SQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataSQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big Data
 
zAgile for OpenStack Summit - v2-3.ppt
zAgile for OpenStack Summit - v2-3.pptzAgile for OpenStack Summit - v2-3.ppt
zAgile for OpenStack Summit - v2-3.ppt
 
Achieving Visibility and Insight across OpenStack Projects.ppt
Achieving Visibility and Insight across OpenStack Projects.pptAchieving Visibility and Insight across OpenStack Projects.ppt
Achieving Visibility and Insight across OpenStack Projects.ppt
 
Jubatus Presentation on R&D forum 2011
Jubatus Presentation on R&D forum 2011Jubatus Presentation on R&D forum 2011
Jubatus Presentation on R&D forum 2011
 
Impact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and careerImpact of in-memory technology and SAP HANA on your business, IT, and career
Impact of in-memory technology and SAP HANA on your business, IT, and career
 
PatSeer
PatSeerPatSeer
PatSeer
 
Starfish: A Self-tuning System for Big Data Analytics
Starfish: A Self-tuning System for Big Data AnalyticsStarfish: A Self-tuning System for Big Data Analytics
Starfish: A Self-tuning System for Big Data Analytics
 
Big data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteauBig data paris 2011 is cool florian douetteau
Big data paris 2011 is cool florian douetteau
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You Want
 
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L...
 
BI on Cloud Computing
BI on Cloud ComputingBI on Cloud Computing
BI on Cloud Computing
 
Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)
 
NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYCNYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
 
Embedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of InnovationEmbedded Analytics: The Next Mega-Wave of Innovation
Embedded Analytics: The Next Mega-Wave of Innovation
 
Interactive query using hadoop
Interactive query using hadoopInteractive query using hadoop
Interactive query using hadoop
 
Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - S...
Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - S...Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - S...
Hadoop World 2011: Architecting a Business-Critical Application in Hadoop - S...
 
Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011Keynote from ApacheCon NA 2011
Keynote from ApacheCon NA 2011
 

Plus de Denny Lee

Plus de Denny Lee (20)

Azure Cosmos DB: Globally Distributed Multi-Model Database Service
Azure Cosmos DB: Globally Distributed Multi-Model Database ServiceAzure Cosmos DB: Globally Distributed Multi-Model Database Service
Azure Cosmos DB: Globally Distributed Multi-Model Database Service
 
Spark to DocumentDB connector
Spark to DocumentDB connectorSpark to DocumentDB connector
Spark to DocumentDB connector
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
SQL Server Integration Services Best Practices
SQL Server Integration Services Best PracticesSQL Server Integration Services Best Practices
SQL Server Integration Services Best Practices
 
SQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesSQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best Practices
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop Primer
 
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
Differential Privacy Case Studies (CMU-MSR Mindswap on Privacy 2007)
 
SQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinarSQL Server Reporting Services Disaster Recovery webinar
SQL Server Reporting Services Disaster Recovery webinar
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
 
Designing, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons LearnedDesigning, Building, and Maintaining Large Cubes using Lessons Learned
Designing, Building, and Maintaining Large Cubes using Lessons Learned
 
SQLCAT - Data and Admin Security
SQLCAT - Data and Admin SecuritySQLCAT - Data and Admin Security
SQLCAT - Data and Admin Security
 
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
SQLCAT: Addressing Security and Compliance Issues with SQL Server 2008
 
SQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best PracticesSQLCAT: A Preview to PowerPivot Server Best Practices
SQLCAT: A Preview to PowerPivot Server Best Practices
 
Deploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePointDeploying and Managing PowerPivot for SharePoint
Deploying and Managing PowerPivot for SharePoint
 
Big Data, Bigger Brains
Big Data, Bigger BrainsBig Data, Bigger Brains
Big Data, Bigger Brains
 
Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)Jump Start into Apache Spark (Seattle Spark Meetup)
Jump Start into Apache Spark (Seattle Spark Meetup)
 
How Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On TimeHow Concur uses Big Data to get you to Tableau Conference On Time
How Concur uses Big Data to get you to Tableau Conference On Time
 
SQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery WebinarSQL Server Reporting Services Disaster Recovery Webinar
SQL Server Reporting Services Disaster Recovery Webinar
 
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
Ensuring compliance of patient data with big data and bi [bdii 301-m] - (4078)
 
SQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best PracticesSQL Server Reporting Services: IT Best Practices
SQL Server Reporting Services: IT Best Practices
 

Dernier

Dernier (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

Yahoo! TAO Case Study Excerpt

  • 1. Yahoo! TAO Case Study Excerpt of “Tier-1 BI in the World of Big Data” presentation at PASS 2011 Conference by Thomas Kejser, Denny Lee, and Kenneth Lieu
  • 2. Yahoo! TAO Business Challenge Yahoo! manages a powerful scalable advertising exchange that includes publishers and advertisers
  • 3. Yahoo! TAO Business Challenge Advertisers want to get the best bang for their buck by reaching their targeted audiences effectively and efficiently
  • 4. Yahoo! TAO Business Challenge Yahoo! needs visibility into how consumers are responding to ads along many dimensions: web sites, creatives, time of day, gender, age, location to make the exchange work as efficiently and effectively as possible
  • 5. Yahoo! TAO Technical Requirements Visitors to Yahoo! Branded sites: 680,000,000 Ad Impressions: 3,500,000,000 (per day) Rows Loaded: 464,000,000,000 (per qtr) Refresh Frequency: Hourly Average Query Time: <10 seconds
  • 6. Yahoo! TAO Platform Architecture How did we load so much so quickly? Data Aggregation & ETL Data Archive & Staging BI Server Oracle 11G RAC SQL Server Analysis Hadoop Services 2008 R2 2PB cluster File 1 Partition 1 Partition 1 File 2 Partition 2 Partition 2 File N Partition Partition N N 1.2TB 135GB/day /day compressed 24TB Cube /qtr
  • 7. Yahoo! TAO Platform Architecture Queries at the “speed of thought” Adhoc Query/Visualization 24TB Tableau Desktop 6 Cube /qtr Avg Query Time: 6 secs 464B rows of event level data /qtr BI Query Servers SQL Server Analysis Services 2008 R2 Optimization Application Dimensions: 24 Custom J2EE App • • Attributes: 247 Avg Query Time: • Measures: 207 2 secs
  • 8. Yahoo! TAO Return on Investment For campaigns optimized using TAO, eCPMs (revenue) has increased! For campaigns optimized using TAO, advertisers spent more with Yahoo! than before
  • 9. Yahoo! TAO Return on Investment Yahoo! TAO exposed customer segment performance to campaign managers and advertisers for the first time! No longer “flying audience blind”