SlideShare a Scribd company logo
1 of 12
Download to read offline
BI on Cloud Computing
Rohit Chatter
BI & Data Architect
Data & Insights Group
Yahoo India R & D, Bangalore
Agenda

•   Cloud Computing – Quick look
•   Cloud@Yahoo – The Yahoo! way
•   Case study of BI on Cloud @ Yahoo – Live case
•   My personal views – sharing experience
•   Q&A
Cloud Computing

• What is it?
  – style of computing where massively scalable IT-related capabilities are
    provided “as a service” using Internet technologies to multiple
    external customers
  – E.g. Amazon EC2/S3, Yahoo, Google

• Key Features
  – Multi-Tenancy, On demand resources, Device & location independence, API
    based, scalability and others

• A Perspective
Focus on the business!

• Desire - start a new website, iwanttosell.com
• Service - provide listings of items for sale, jobs, etc.
• Business does well and more features needed
    – How do we scale for demand?
•   Store listings as (key, category, description)
•   Customers quickly ask for keyword search
•   Add photos to listings
•   And the business continues to grow and grow!
Cloud Computing – a perspective




                   •   Copyright University of California at Berkeley
Internet Scale Generates BigData
• Yahoo is the most Visited Site on the Internet
   –   600M+ Unique Visitors per Month
   –   Billions of Page Views per Day
   –   Billions of Searches per Month
   –   Billions of Emails per Month
   –   Terabytes of Data per Day!

• And we crawl the Web
   – 100+ Billion Pages
   – 5+ Trillion Links
   – Petabytes of data
• Reading 100 Terabytes could be overwhelming
       Std PC – 100Mbps         Server – 10Gbps   1000 Std PC
          ~ 11 days                 ~ 1 day        ~ 15 mins
How is Yahoo seeing the space?

•   Yahoo sees two kinds of cloud services:            •   Technology – Open Source adoption
     – Horizontal Cloud Services                            – Hadoop – Grid
         • Functionality enabling tenants to build
            applications or new services on top of          – PIG – Programming language
            the cloud
         • The focus of CCDI                                – ZooKeeper -- High-Availability Directory
     – Functional Cloud Services                              and Configuration Service
         • Functionality that is useful in and of
            itself to tenants.                              – Oozie – Workflow engine
               – Yahoo!’s IndexTools; Yahoo!
                   properties aimed at end-users
                   e.g., flickr, Groups, Mail, News,
                   Shopping
         • Could be build on top of horizontal
            cloud services or from scratch
BI on Cloud – Case study

• Motivation
    •   Report & Data requirements unknown
    •   Evolving needs
    •   Large data processing on demand
    •   Web based access

•   Architecture
•   Functional View
•   What is computed where?
•   Few screenshots
•   Benefits
BI on Cloud – Architecture

  Functional View                                     What is computed
                                                           where
Apache Web Server          Load balanced web
      PHP
 Microstrategy/Hom                                  Derived Metrics – CTR, Depth,
                          App Server – BI layer            RPM, Coverage
      e Grown

  Oracle RDBMS       Aggregates &                     Rollups, Type 2 Dimension,

                     Metadata layer                       Alerts & Messaging
  BI Aggregates
    (H,D,W,M)
                                                             Metrics
 Utility Computing        Hadoop Grid + PIG        Impressions, Revenue, Clicks,
 Build Aggregates                                   Conversions, Quality Score,
                               Cloud                       Top keywords



  Data Source          Data – 100+ Gigabytes/Day
Dimension & Fact
BI on Cloud – Screenshot
In My Opinion

• Is Cloud ready for DW & BI?
• Pros & Cons of BI on Cloud
• Options looked at:
  – Custom solution & Hive
  – Microstrategy & Hive
  – Pentaho
‘Determine that things can and shall be done, and then we shall find the way!’
                                                                           A. Lincoln

More Related Content

Similar to BI on Cloud Computing: Leveraging Scalable Infrastructure

SharePoint User Experience Best Practices
SharePoint User Experience Best PracticesSharePoint User Experience Best Practices
SharePoint User Experience Best PracticesPerficient, Inc.
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinerySteve Loughran
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranJAX London
 
How we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingyalisassoon
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final VersionJanani Eshwaran
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final VersionJanani Eshwaran
 
Drupal for Public Sector Organisations
Drupal for Public Sector OrganisationsDrupal for Public Sector Organisations
Drupal for Public Sector Organisationspanlogic
 
William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...
William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...
William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...Lucia Garcia
 
MEEC Baltimore SharePoint 2010 presentation
MEEC Baltimore SharePoint 2010 presentationMEEC Baltimore SharePoint 2010 presentation
MEEC Baltimore SharePoint 2010 presentationDaniel Cohen-Dumani
 
SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...
SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...
SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...Perficient, Inc.
 
Codestrong 2012 breakout session the role of cloud services in your next ge...
Codestrong 2012 breakout session   the role of cloud services in your next ge...Codestrong 2012 breakout session   the role of cloud services in your next ge...
Codestrong 2012 breakout session the role of cloud services in your next ge...Axway Appcelerator
 
U of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreU of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreTim Schneider
 
IDC Bari-12print
IDC Bari-12printIDC Bari-12print
IDC Bari-12printVMEngine
 
SaaS - Taking a Closer Look
SaaS - Taking a Closer LookSaaS - Taking a Closer Look
SaaS - Taking a Closer LookAnja Rej
 
Time to Fly - Why Predictive Analytics is Going Mainstream
Time to Fly - Why Predictive Analytics is Going MainstreamTime to Fly - Why Predictive Analytics is Going Mainstream
Time to Fly - Why Predictive Analytics is Going MainstreamInside Analysis
 
Nosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesNosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesMongoDB
 

Similar to BI on Cloud Computing: Leveraging Scalable Infrastructure (20)

Highlights from SharePoint Conference 2011
Highlights from SharePoint Conference 2011Highlights from SharePoint Conference 2011
Highlights from SharePoint Conference 2011
 
SharePoint User Experience Best Practices
SharePoint User Experience Best PracticesSharePoint User Experience Best Practices
SharePoint User Experience Best Practices
 
Hadoop as data refinery
Hadoop as data refineryHadoop as data refinery
Hadoop as data refinery
 
Hadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve LoughranHadoop as Data Refinery - Steve Loughran
Hadoop as Data Refinery - Steve Loughran
 
How we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changingHow we use Hive at SnowPlow, and how the role of HIve is changing
How we use Hive at SnowPlow, and how the role of HIve is changing
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Drupal for Public Sector Organisations
Drupal for Public Sector OrganisationsDrupal for Public Sector Organisations
Drupal for Public Sector Organisations
 
William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...
William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...
William Makower, CEO, Panlogic Ltd; Robert Hill , Web Services Manager, Oxfor...
 
MEEC Baltimore SharePoint 2010 presentation
MEEC Baltimore SharePoint 2010 presentationMEEC Baltimore SharePoint 2010 presentation
MEEC Baltimore SharePoint 2010 presentation
 
SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...
SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...
SharePoint My Sites: Aligning Business Needs, Corporate Culture & SharePoint ...
 
Codestrong 2012 breakout session the role of cloud services in your next ge...
Codestrong 2012 breakout session   the role of cloud services in your next ge...Codestrong 2012 breakout session   the role of cloud services in your next ge...
Codestrong 2012 breakout session the role of cloud services in your next ge...
 
U of A Web Strategy and Sitecore
U of A Web Strategy and SitecoreU of A Web Strategy and Sitecore
U of A Web Strategy and Sitecore
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
 
IDC Bari-12print
IDC Bari-12printIDC Bari-12print
IDC Bari-12print
 
Semantic Web For Dummies
Semantic Web For DummiesSemantic Web For Dummies
Semantic Web For Dummies
 
SaaS - Taking a Closer Look
SaaS - Taking a Closer LookSaaS - Taking a Closer Look
SaaS - Taking a Closer Look
 
Time to Fly - Why Predictive Analytics is Going Mainstream
Time to Fly - Why Predictive Analytics is Going MainstreamTime to Fly - Why Predictive Analytics is Going Mainstream
Time to Fly - Why Predictive Analytics is Going Mainstream
 
Nosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use CasesNosql Now 2012: MongoDB Use Cases
Nosql Now 2012: MongoDB Use Cases
 

More from tdwiindia

TDWI Speaker profiles
TDWI Speaker profilesTDWI Speaker profiles
TDWI Speaker profilestdwiindia
 
TDWI Inda BI on Cloud Future State Vision
TDWI Inda BI on Cloud Future State VisionTDWI Inda BI on Cloud Future State Vision
TDWI Inda BI on Cloud Future State Visiontdwiindia
 
Tdwi event summary
Tdwi event summaryTdwi event summary
Tdwi event summarytdwiindia
 
Intel IT Cloud Strategy
Intel IT Cloud StrategyIntel IT Cloud Strategy
Intel IT Cloud Strategytdwiindia
 
BI on Cloud - Perspective from SAP
BI on Cloud - Perspective from SAPBI on Cloud - Perspective from SAP
BI on Cloud - Perspective from SAPtdwiindia
 
Data Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on CloudData Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on Cloudtdwiindia
 
Big data appliances for BI on Cloud
Big data appliances for BI on CloudBig data appliances for BI on Cloud
Big data appliances for BI on Cloudtdwiindia
 
What is BI on Cloud
What is BI on CloudWhat is BI on Cloud
What is BI on Cloudtdwiindia
 
Business Intelligence on Cloud: A Business Perspective
Business Intelligence on Cloud: A Business PerspectiveBusiness Intelligence on Cloud: A Business Perspective
Business Intelligence on Cloud: A Business Perspectivetdwiindia
 

More from tdwiindia (9)

TDWI Speaker profiles
TDWI Speaker profilesTDWI Speaker profiles
TDWI Speaker profiles
 
TDWI Inda BI on Cloud Future State Vision
TDWI Inda BI on Cloud Future State VisionTDWI Inda BI on Cloud Future State Vision
TDWI Inda BI on Cloud Future State Vision
 
Tdwi event summary
Tdwi event summaryTdwi event summary
Tdwi event summary
 
Intel IT Cloud Strategy
Intel IT Cloud StrategyIntel IT Cloud Strategy
Intel IT Cloud Strategy
 
BI on Cloud - Perspective from SAP
BI on Cloud - Perspective from SAPBI on Cloud - Perspective from SAP
BI on Cloud - Perspective from SAP
 
Data Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on CloudData Warehousing Infrastructure on Cloud
Data Warehousing Infrastructure on Cloud
 
Big data appliances for BI on Cloud
Big data appliances for BI on CloudBig data appliances for BI on Cloud
Big data appliances for BI on Cloud
 
What is BI on Cloud
What is BI on CloudWhat is BI on Cloud
What is BI on Cloud
 
Business Intelligence on Cloud: A Business Perspective
Business Intelligence on Cloud: A Business PerspectiveBusiness Intelligence on Cloud: A Business Perspective
Business Intelligence on Cloud: A Business Perspective
 

Recently uploaded

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Recently uploaded (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

BI on Cloud Computing: Leveraging Scalable Infrastructure

  • 1. BI on Cloud Computing Rohit Chatter BI & Data Architect Data & Insights Group Yahoo India R & D, Bangalore
  • 2. Agenda • Cloud Computing – Quick look • Cloud@Yahoo – The Yahoo! way • Case study of BI on Cloud @ Yahoo – Live case • My personal views – sharing experience • Q&A
  • 3. Cloud Computing • What is it? – style of computing where massively scalable IT-related capabilities are provided “as a service” using Internet technologies to multiple external customers – E.g. Amazon EC2/S3, Yahoo, Google • Key Features – Multi-Tenancy, On demand resources, Device & location independence, API based, scalability and others • A Perspective
  • 4. Focus on the business! • Desire - start a new website, iwanttosell.com • Service - provide listings of items for sale, jobs, etc. • Business does well and more features needed – How do we scale for demand? • Store listings as (key, category, description) • Customers quickly ask for keyword search • Add photos to listings • And the business continues to grow and grow!
  • 5. Cloud Computing – a perspective • Copyright University of California at Berkeley
  • 6. Internet Scale Generates BigData • Yahoo is the most Visited Site on the Internet – 600M+ Unique Visitors per Month – Billions of Page Views per Day – Billions of Searches per Month – Billions of Emails per Month – Terabytes of Data per Day! • And we crawl the Web – 100+ Billion Pages – 5+ Trillion Links – Petabytes of data • Reading 100 Terabytes could be overwhelming Std PC – 100Mbps Server – 10Gbps 1000 Std PC ~ 11 days ~ 1 day ~ 15 mins
  • 7. How is Yahoo seeing the space? • Yahoo sees two kinds of cloud services: • Technology – Open Source adoption – Horizontal Cloud Services – Hadoop – Grid • Functionality enabling tenants to build applications or new services on top of – PIG – Programming language the cloud • The focus of CCDI – ZooKeeper -- High-Availability Directory – Functional Cloud Services and Configuration Service • Functionality that is useful in and of itself to tenants. – Oozie – Workflow engine – Yahoo!’s IndexTools; Yahoo! properties aimed at end-users e.g., flickr, Groups, Mail, News, Shopping • Could be build on top of horizontal cloud services or from scratch
  • 8. BI on Cloud – Case study • Motivation • Report & Data requirements unknown • Evolving needs • Large data processing on demand • Web based access • Architecture • Functional View • What is computed where? • Few screenshots • Benefits
  • 9. BI on Cloud – Architecture Functional View What is computed where Apache Web Server Load balanced web PHP Microstrategy/Hom Derived Metrics – CTR, Depth, App Server – BI layer RPM, Coverage e Grown Oracle RDBMS Aggregates & Rollups, Type 2 Dimension, Metadata layer Alerts & Messaging BI Aggregates (H,D,W,M) Metrics Utility Computing Hadoop Grid + PIG Impressions, Revenue, Clicks, Build Aggregates Conversions, Quality Score, Cloud Top keywords Data Source Data – 100+ Gigabytes/Day Dimension & Fact
  • 10. BI on Cloud – Screenshot
  • 11. In My Opinion • Is Cloud ready for DW & BI? • Pros & Cons of BI on Cloud • Options looked at: – Custom solution & Hive – Microstrategy & Hive – Pentaho
  • 12. ‘Determine that things can and shall be done, and then we shall find the way!’ A. Lincoln