Data Analysis at Facebook


                  Jeff Hammerbacher, Ding Zhou*
                  Facebook Inc.
Outline
• How does Facebook work
• Managing Big Data
• Data Analysis for Business Intelligence
• Data Analysis for “Artificial Intelligence”
• Questions
How does Facebook work?
Profile page - content generation portal
Newsfeed page - content consumption portal
Friends page - social graph portal
App page - social app platform
Facebook Data
▪   Social Graph Data
    ▪   The Nodes:
        ▪   100m+ users; 100+ dimensions each user (numerical, text, categorical);
        ▪   350k registrations daily;
    ▪   The Edges:
        ▪   200+ friends each user (median);
        ▪   20 categories of edges (fb friends, co-workers, family, etc);

▪   Social Behavior Data
    ▪   Social Interactions: interactions among users, via 100+ interaction types;
    ▪   Social Actions: between users and 33k+ facebook apps, via 200+ action types;

▪   Social Content Data
    ▪   Content of Posts, Notes, Photos, Video, etc
Managing Big Data
▪   Data scale [backend]:
    ▪   Over 1.3 PB raw capacity in largest cluster;
    ▪   Nearly 2 TB uncompressed data per day;
    ▪   Over 20 TB read/write per day;
▪   Distributed Data management:
    ▪   HDFS/Hadoop (MapReduce in Java);
    ▪   MetaStore (MetaData management);
    ▪   Hive QL (Query language on Hadoop+MetaStore);
    ▪   Usage:
        ▪   at least 50 engineers have run Hadoop jobs
        ▪   3,514 Jobs weekly
        ▪   821 Projections, 152 Joins, 800 Aggregates, 600 Loaders weekly
Hadoop - MapReduce in Java


Word-count example (diagram):
    Input splits:   "facebook data team" | "uses hadoop for" | "data analysis"
    Map output:     facebook:1, data:1, team:1 | uses:1, hadoop:1, for:1 | data:1, analysis:1
    Reduce output:  analysis:1, data:2, facebook:1, for:1, hadoop:1, team:1, uses:1



                          MapReduce Execution Flow
                           [Dean, J and Ghemawat, S, 2004]
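The word-count flow on this slide can be sketched in a few lines of plain Python. This is only an in-memory illustration of the map, shuffle, and reduce steps, not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the three-way split of the input sentence is an assumption taken from the slide's layout.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the partial counts for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map over every split, shuffle by word, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))

# Assumed input splits, following the slide's example sentence.
lines = ["facebook data team", "uses hadoop for", "data analysis"]
counts = word_count(lines)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; here everything is a single process so the data flow is easy to follow.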
Data Analysis for Business Intelligence
Data for Business Intelligence
▪   General Goal:
    ▪   support growth and monetization strategies, and product decisions
▪   User Behavior Studies
    ▪   NUX: Longitudinal study using LARS and recursive partitioning to identify features predictive
        of engagement;
    ▪   Identity*: Unsupervised learning over user session data to identify common usage patterns.
        Techniques employed include K-Means, PageRank, dimension reduction methods;
▪   Experimentation Platform
    ▪   Columbus*: Top-level site health metrics; drill down by user groups (country, age, gender...);
    ▪   Columbus++*: A/B testing for impact of site change on site health metrics;

▪   Reporting System
    ▪   ad-hoc analysis done by Hive queries
                                                              * projects that Ding Zhou participates in;
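The slides do not say which statistics Columbus++ uses. As one common way to judge the impact of a site change on a rate-style health metric, here is a minimal sketch of a two-sided two-proportion z-test. The function name and the example counts are hypothetical and chosen only for illustration.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns the z statistic and the two-sided p-value under the
    pooled-variance normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10,000 users per arm, 1,000 vs 1,100 engaged users.
z, p_value = two_proportion_z_test(1000, 10000, 1100, 10000)
```

The normal approximation is reasonable for large groups like these; very small user groups would call for an exact test instead.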
Columbus
[Screenshots: geographical bird's-eye view of growth by country; comparison between user groups]
Data Analysis for “Artificial Intelligence”
                       -- predicting user social behavior
who the user will interact with

• predict interactions between friends

• features are user profile and browsing history

• tried linear models and tree models

• applied for search, newsfeed, etc
who the user hasn’t found yet

• missing edge prediction problem

• observations are friend/non-friend pairs

• features include profile and local graph info

• profile info more informative

• graph info supplemental if profile incomplete
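To make the feature description concrete, here is a minimal sketch of building features for one candidate pair from both sources the slide names: profile fields and local graph structure. The specific features (common friends, Jaccard overlap, count of matching profile fields) are assumptions for illustration; the slides do not list the actual features used.

```python
def pair_features(graph, profiles, u, v):
    """Build features for a candidate friend pair (u, v).

    graph:    dict mapping user -> set of friends
    profiles: dict mapping user -> dict of profile fields (None if missing)
    """
    friends_u = graph.get(u, set())
    friends_v = graph.get(v, set())
    common = friends_u & friends_v
    union = friends_u | friends_v
    profile_v = profiles.get(v, {})
    # Count fields that both users filled in with the same value.
    shared_fields = sum(
        1
        for field, value in profiles.get(u, {}).items()
        if value is not None and profile_v.get(field) == value
    )
    return {
        "common_friends": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "shared_profile_fields": shared_fields,
    }

# Hypothetical toy data.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
profiles = {
    "a": {"school": "X", "city": "Y"},
    "d": {"school": "X", "city": None},
}
features = pair_features(graph, profiles, "a", "d")
```

When a profile is mostly empty, `shared_profile_fields` carries little signal and the graph features do most of the work, which matches the slide's remark that graph info is supplemental if the profile is incomplete.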
what applications the user may like*

• 33k apps, only 0.1% of them used;

• a different recommendation problem;

• prediction model not applicable, user preference unavailable;

• build a prediction model to infer “user ratings”;

• user-based + item-based recommendation

• how to combine profile, social graph, ratings?



                  * projects that Ding Zhou participates in;
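As an illustration of the item-based half of the approach, the sketch below scores unseen apps by cosine similarity between app usage sets. It assumes binary "used / not used" signals in place of the inferred "user ratings" the slide mentions; the function names and the toy data are hypothetical, and the user-based half and the combination with profile and graph data are not shown.

```python
import math
from collections import defaultdict

def item_similarities(usage):
    """Cosine similarity between apps, computed from which users used each app.

    usage: dict mapping user -> set of apps that user has used.
    """
    users_of = defaultdict(set)
    for user, apps in usage.items():
        for app in apps:
            users_of[app].add(user)
    sims = {}
    for a in users_of:
        for b in users_of:
            if a != b:
                overlap = len(users_of[a] & users_of[b])
                if overlap:
                    sims[(a, b)] = overlap / math.sqrt(len(users_of[a]) * len(users_of[b]))
    return sims

def recommend(usage, user, top_n=3):
    """Rank apps the user has not used by summed similarity to apps they have used."""
    sims = item_similarities(usage)
    seen = usage.get(user, set())
    scores = defaultdict(float)
    for (a, b), similarity in sims.items():
        if a in seen and b not in seen:
            scores[b] += similarity
    return sorted(scores, key=lambda app: (-scores[app], app))[:top_n]

# Hypothetical usage data.
usage = {
    "u1": {"chess", "quiz"},
    "u2": {"chess", "quiz", "poker"},
    "u3": {"chess"},
    "u4": {"poker"},
}
suggestions = recommend(usage, "u3")
```

The all-pairs loop is quadratic in the number of apps, which is fine for a toy example but would need sparsity tricks or a distributed job at the scale of 33k apps.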
what content is interesting*
• newsfeed as the main content distribution channel

• stories generated by 100s of social actions: on the site, platform, or the Web

• <0.1% of possible stories are shown

• predictions built on story features, and user browsing history




                    * projects that Ding Zhou participates in;
Challenges in Data
- 100s of TBs of meaningful data available
- 1,000s of non-trivial features
- sampling not always applicable (e.g. small app has no user data)
- prediction requirements
 ▪   models regularly applied for 10 billion novel samples
 ▪   models used on-the-fly for 100k samples in 50 ms
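As a rough reading of the on-the-fly requirement, assuming the 100k samples are scored sequentially on a single thread (the slides do not say how the work is parallelized), the per-sample budget works out as:

```latex
\[
\frac{50\ \text{ms}}{10^{5}\ \text{samples}}
  = 5 \times 10^{-7}\ \text{s per sample}
  = 0.5\ \mu\text{s per sample},
\qquad
\frac{10^{5}\ \text{samples}}{0.05\ \text{s}}
  = 2 \times 10^{6}\ \text{samples per second}.
\]
```

A budget of about half a microsecond per sample leaves room for only a small number of arithmetic operations or tree traversals per prediction, unless the scoring is batched or spread across cores.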
Special Machine Learning Problems
- use machine learning to predict user behavior
 ▪   labels: insufficient; inferred implicitly; imbalanced;
 ▪   features: high-dimensional; strongly correlated; noisy;


- scale requires distributed algorithms
 ▪   in-house implementation of tree ensemble methods (bagging predictors)
 ▪   larger training sets grant performance improvements


- speed and accuracy improvements underway
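For readers unfamiliar with bagging predictors, here is a minimal single-machine sketch: each model is trained on a bootstrap resample and predictions are combined by majority vote. It uses one-feature decision stumps as the base learner to stay short. This is not the in-house distributed implementation mentioned above, and all names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Fit a depth-1 tree on (x, label) pairs with labels in {0, 1}.

    Tries every observed x as a threshold and both label orientations,
    keeping the combination with the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for left_label in (0, 1):
            errors = sum(
                1
                for x, y in samples
                if (left_label if x <= threshold else 1 - left_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return threshold, left_label

def predict_stump(stump, x):
    threshold, left_label = stump
    return left_label if x <= threshold else 1 - left_label

def fit_bagging(samples, n_models=25, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models

def predict_bagging(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(model, x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: label 0 for x in 0..9, label 1 for x in 10..19.
samples = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10, 20)]
models = fit_bagging(samples)
```

Because each bootstrap model is trained independently, the training loop parallelizes naturally across machines, which is one reason bagged ensembles suit the distributed setting the slide describes.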
tip of the iceberg

    Questions?
(c) 2004-2008 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0

joint statistical meeting 2008
