SlideShare une entreprise Scribd logo
1  sur  36
Télécharger pour lire hors ligne
Evolution of Big Data
  Architectures@
     Facebook
Architecture Summit, Shenzhen, August 2012
               Ashish Thusoo
About Me

• Currently Co-founder/CEO of Qubole
• Ran the Data Infrastructure Team at
  Facebook till 2011
• Co-founded Apache Hive @ Facebook
Outline

• Big Data @ Facebook - Scope & Scale
• Evolution of Big Data Architectures @ FB
• Qubole
Big Data @ FB(2011):
        Scale

• 25 PB of compressed data ~ 150 PB of
  uncompressed data
• 400 TB/day (uncompressed) of new data
• 1 new job every second
Big Data @ FB: Scope

• Simple reporting
• Model generation
• Adhoc analysis + data science
• Index generation
• Many many others...
A/B Testing Email #1
A/B Testing Email #2
A/B Testing Email #2 is
      3x Better
Evolution: 2007-2011
                 DW Size in TB
    30000
                                    25000

    22500


    15000
                             8000
     7500

            15   250   800
        0
         2007 2008 2009 2010 2011
2007: Traditional EDW


                 Scribe Mid-Tier
                                                Summarization Cluster
 Web Clusters




                                   NAS Filers


MySQL Clusters                                  RDBMS Data Warehouse
2007: Pain Points
                                                - compute close to storage
                                                    (early map/reduce)
                 Scribe Mid-Tier

 Web Clusters




                                                               Summarization Cluster
                                   NAS Filers



MySQL Clusters




                                        - daily ETL > 24 hours
                                     - Lots of tuning/indexes etc.
                                     - Lots of hardware planning
                                                               RDBMS Data Warehouse
2007: Limitations
• Most use cases were
  in business metrics -
  data science, model
  building etc. not
  possible
• Only summary data
  was stored online -
  details archived away
2008: Move to Hadoop


                 Scribe Mid-Tier
                                                Summarization Cluster
 Web Clusters




                                   NAS Filers




MySQL Clusters                                  RDBMS Data Warehouse
2008: Move to Hadoop


                 Scribe Mid-Tier             Batch
                                            copier/
 Web Clusters
                                            loaders


                                                Hadoop/Hive Data Warehouse
                                   NAS Filers




MySQL Clusters
                                                      RDBMS Data Mart
2008: Immediate Pros
• Data science at
  scale became
  possible
• For the first time all
  of the instrumented
  data could be held
  online
• Use cases expanded
2009: Democratizing
            Data

                 Scribe Mid-Tier
 Web Clusters


                                                Hadoop/Hive Data Warehouse
                                   NAS Filers




MySQL Clusters
                                                     RDBMS Data Mart
2009: Democratizing
 Databee &
               Data                                 Nectar:
Chronos: Data                                 instrumentation &
   Pipeline                                   schema aware data
 Framework                                         collection




 HiPal: Adhoc                                      Scrapes:
Queries + Data   Hadoop/Hive Data Warehouse      Configuration
  Discovery                                        Driven
2009: Democratizing
    Data(Nectar)
• Typical Nectar Pipeline
 • Simple schema evolution
    built in
 • json encoded short term
    data
 • decomposing json for
    long term storage
2009: Democratizing
    Data (Tools)
• HiPal - data discovery
  and query authoring
• Charting and
  dashboard generation
  tools
2009: Democratizing
    Data (Tools)

• Databee: Workflow
  language
• Chronos: Scheduling
  tool
2009: Cons of
     Democratization
• Isolation to protect
  against Bad Jobs
• Fair sharing of the
  cluster - what is a
  high priority job
  and how to enforce
  it
2010: Controlling
         Chaos
• Isolation
• Reducing operational overhead
• Better resource utilization
• Measurement, ownership, accountability
2010: Isolation

                   Scribe Mid-Tier
 Web Clusters

                                                  Hadoop/Hive Data Warehouse


                                     NAS Filers



MySQL Clusters
2010: Isolation

                   Scribe Mid-Tier
 Web Clusters

                                                  Platinum Warehouse

                                                      Hive Replication
                                     NAS Filers



MySQL Clusters




                                                   Silver Warehouse
2010: Ops Efficiency

 Web Clusters    Scribe HDFS

                       ptail: parallel             Platinum Warehouse
                        tail on hdfs                   Hive Replication
                             near real time data
                                 consumers

MySQL Clusters




                                                    Silver Warehouse
2010: Resource
        Utilization (Disk)

•   HDFS-RAID: from 3
    replicas to 2.2 replicas

•   RCFile: Row columnar
    format for compressing
    Hive tables
2010: Resource
       Utilization (CPU)
•   Continuous copier/
    loaders

•   Incremental scrapes

•   Hive optimizations to
    save CPU
2010: Monitoring(SLAs)

•   Per job statistics rolled
    up to owner/group/team

•   Expected time of arrival
    vs Actual time of arrival
    of data

•   Simple data quality
    metrics
2011: New
        Requirements

• More real time requirements for
  aggregations
• Optimizing resource utilization
2011: Beyond Hadoop


• Puma for real time analytics
• Peregrine for simple and fast queries
2011: Puma

 Web Clusters     Scribe HDFS

                        ptail: parallel             Platinum Warehouse
                         tail on hdfs                   Hive Replication
                              near real time data
                                  consumers

MySQL Clusters




                                                     Silver Warehouse
2011: Puma


    Scribe HDFS



          ptail: parallel tail on
                   hdfs




 Puma Clusters
                                    Hbase Cluster
Some takeaways
• Operating and optimizing Data
  Infrastructure is a hard problem
 • Lots of components from log collection,
    storage, compute, query processing, tools
    and interfaces
 • Lots of choices within each part of the
    stack
Qubole
• Mission:
 • Data Infrastructure in the Cloud made
    Easy, Fast and Reliable
 • We take care of operating and optimizing
    this infrastructure so that you can focus
    on your data, analysis, algorithms and
    building your data apps
Qubole - Information
• Early Trial(by invitation):
 • www.qubole.com
• Come talk to us to join a small and
  passionate team
  • jobs@qubole.com
• Follow us on twitter/facebook/linkedin
Evolution of Big Data Architectures at Facebook

Contenu connexe

Tendances

NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation freeBenoit Perroud
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranMapR Technologies
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiJoydeep Sen Sarma
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingCloudera, Inc.
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoopmcsrivas
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : BeginnersShweta Patnaik
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013WANdisco Plc
 

Tendances (20)

10c introduction
10c introduction10c introduction
10c introduction
 
Intro To Hadoop
Intro To HadoopIntro To Hadoop
Intro To Hadoop
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Hadoop Fundamentals I
Hadoop Fundamentals IHadoop Fundamentals I
Hadoop Fundamentals I
 
NoSQL overview implementation free
NoSQL overview implementation freeNoSQL overview implementation free
NoSQL overview implementation free
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Hadoop
HadoopHadoop
Hadoop
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer ShiranThe Future of Hadoop: MapR VP of Product Management, Tomer Shiran
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran
 
Hadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-DelhiHadoop Hive Talk At IIT-Delhi
Hadoop Hive Talk At IIT-Delhi
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Hadoop: Distributed Data Processing
Hadoop: Distributed Data ProcessingHadoop: Distributed Data Processing
Hadoop: Distributed Data Processing
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
 
HDFS
HDFSHDFS
HDFS
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
Hadoop
Hadoop Hadoop
Hadoop
 

En vedette

Scaling agileteamsderby2012
Scaling agileteamsderby2012Scaling agileteamsderby2012
Scaling agileteamsderby2012drewz lin
 
低功耗服务器定制与绿色计算——章文嵩(淘宝)
低功耗服务器定制与绿色计算——章文嵩(淘宝)低功耗服务器定制与绿色计算——章文嵩(淘宝)
低功耗服务器定制与绿色计算——章文嵩(淘宝)drewz lin
 
Writing high quality code for agile2012
Writing high quality code for agile2012Writing high quality code for agile2012
Writing high quality code for agile2012drewz lin
 
Pragmatic notdogmatictdd agile2012
Pragmatic notdogmatictdd   agile2012Pragmatic notdogmatictdd   agile2012
Pragmatic notdogmatictdd agile2012drewz lin
 
Top100summit 芈珺七拼八凑搭建移动自动化测试框架
Top100summit 芈珺七拼八凑搭建移动自动化测试框架Top100summit 芈珺七拼八凑搭建移动自动化测试框架
Top100summit 芈珺七拼八凑搭建移动自动化测试框架drewz lin
 
Continuous delivery agile_2012
Continuous delivery agile_2012Continuous delivery agile_2012
Continuous delivery agile_2012drewz lin
 
F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝
F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝
F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝drewz lin
 
Story mapstestplansandothercrosscutting
Story mapstestplansandothercrosscuttingStory mapstestplansandothercrosscutting
Story mapstestplansandothercrosscuttingdrewz lin
 
Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013drewz lin
 
Web security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-kearyWeb security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-kearydrewz lin
 

En vedette (10)

Scaling agileteamsderby2012
Scaling agileteamsderby2012Scaling agileteamsderby2012
Scaling agileteamsderby2012
 
低功耗服务器定制与绿色计算——章文嵩(淘宝)
低功耗服务器定制与绿色计算——章文嵩(淘宝)低功耗服务器定制与绿色计算——章文嵩(淘宝)
低功耗服务器定制与绿色计算——章文嵩(淘宝)
 
Writing high quality code for agile2012
Writing high quality code for agile2012Writing high quality code for agile2012
Writing high quality code for agile2012
 
Pragmatic notdogmatictdd agile2012
Pragmatic notdogmatictdd   agile2012Pragmatic notdogmatictdd   agile2012
Pragmatic notdogmatictdd agile2012
 
Top100summit 芈珺七拼八凑搭建移动自动化测试框架
Top100summit 芈珺七拼八凑搭建移动自动化测试框架Top100summit 芈珺七拼八凑搭建移动自动化测试框架
Top100summit 芈珺七拼八凑搭建移动自动化测试框架
 
Continuous delivery agile_2012
Continuous delivery agile_2012Continuous delivery agile_2012
Continuous delivery agile_2012
 
F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝
F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝
F1 07 淘宝软件基础设施构建实践_章文嵩_淘宝
 
Story mapstestplansandothercrosscutting
Story mapstestplansandothercrosscuttingStory mapstestplansandothercrosscutting
Story mapstestplansandothercrosscutting
 
Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013Via forensics appsecusa-nov-2013
Via forensics appsecusa-nov-2013
 
Web security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-kearyWeb security-–-everything-we-know-is-wrong-eoin-keary
Web security-–-everything-we-know-is-wrong-eoin-keary
 

Similaire à Evolution of Big Data Architectures at Facebook

Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Datacwensel
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Robert Grossman
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Acunu
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvewKunal Khanna
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Cloudera, Inc.
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 

Similaire à Evolution of Big Data Architectures at Facebook (20)

Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Processing Big Data
Processing Big DataProcessing Big Data
Processing Big Data
 
Apache Drill
Apache DrillApache Drill
Apache Drill
 
Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)Managing Big Data (Chapter 2, SC 11 Tutorial)
Managing Big Data (Chapter 2, SC 11 Tutorial)
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
Cassandra EU 2012 - Overview of Case Studies and State of the Market by 451 R...
 
NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 

Plus de drewz lin

Phu appsec13
Phu appsec13Phu appsec13
Phu appsec13drewz lin
 
Owasp2013 johannesullrich
Owasp2013 johannesullrichOwasp2013 johannesullrich
Owasp2013 johannesullrichdrewz lin
 
Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2drewz lin
 
I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2drewz lin
 
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolfDefeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolfdrewz lin
 
Csrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equalCsrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equaldrewz lin
 
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21drewz lin
 
Appsec usa roberthansen
Appsec usa roberthansenAppsec usa roberthansen
Appsec usa roberthansendrewz lin
 
Appsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaolaAppsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaoladrewz lin
 
Appsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_editsAppsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_editsdrewz lin
 
Appsec2013 presentation
Appsec2013 presentationAppsec2013 presentation
Appsec2013 presentationdrewz lin
 
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitationsAppsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitationsdrewz lin
 
Appsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martinAppsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martindrewz lin
 
Amol scadaowasp
Amol scadaowaspAmol scadaowasp
Amol scadaowaspdrewz lin
 
Agile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usaAgile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usadrewz lin
 
Vulnex app secusa2013
Vulnex app secusa2013Vulnex app secusa2013
Vulnex app secusa2013drewz lin
 
基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架drewz lin
 
新浪微博稳定性经验谈
新浪微博稳定性经验谈新浪微博稳定性经验谈
新浪微博稳定性经验谈drewz lin
 
无线App的性能分析和监控实践 rickyqiu
无线App的性能分析和监控实践 rickyqiu无线App的性能分析和监控实践 rickyqiu
无线App的性能分析和监控实践 rickyqiudrewz lin
 
网易移动自动化测试实践(孔庆云)
网易移动自动化测试实践(孔庆云)网易移动自动化测试实践(孔庆云)
网易移动自动化测试实践(孔庆云)drewz lin
 

Plus de drewz lin (20)

Phu appsec13
Phu appsec13Phu appsec13
Phu appsec13
 
Owasp2013 johannesullrich
Owasp2013 johannesullrichOwasp2013 johannesullrich
Owasp2013 johannesullrich
 
Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2Owasp advanced mobile-application-code-review-techniques-v0.2
Owasp advanced mobile-application-code-review-techniques-v0.2
 
I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2I mas appsecusa-nov13-v2
I mas appsecusa-nov13-v2
 
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolfDefeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
Defeating xss-and-xsrf-with-my faces-frameworks-steve-wolf
 
Csrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equalCsrf not-all-defenses-are-created-equal
Csrf not-all-defenses-are-created-equal
 
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
Chuck willis-owaspbwa-beyond-1.0-app secusa-2013-11-21
 
Appsec usa roberthansen
Appsec usa roberthansenAppsec usa roberthansen
Appsec usa roberthansen
 
Appsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaolaAppsec usa2013 js_libinsecurity_stefanodipaola
Appsec usa2013 js_libinsecurity_stefanodipaola
 
Appsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_editsAppsec2013 presentation-dickson final-with_all_final_edits
Appsec2013 presentation-dickson final-with_all_final_edits
 
Appsec2013 presentation
Appsec2013 presentationAppsec2013 presentation
Appsec2013 presentation
 
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitationsAppsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
Appsec 2013-krehel-ondrej-forensic-investigations-of-web-exploitations
 
Appsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martinAppsec2013 assurance tagging-robert martin
Appsec2013 assurance tagging-robert martin
 
Amol scadaowasp
Amol scadaowaspAmol scadaowasp
Amol scadaowasp
 
Agile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usaAgile sdlc-v1.1-owasp-app sec-usa
Agile sdlc-v1.1-owasp-app sec-usa
 
Vulnex app secusa2013
Vulnex app secusa2013Vulnex app secusa2013
Vulnex app secusa2013
 
基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架基于虚拟化技术的分布式软件测试框架
基于虚拟化技术的分布式软件测试框架
 
新浪微博稳定性经验谈
新浪微博稳定性经验谈新浪微博稳定性经验谈
新浪微博稳定性经验谈
 
无线App的性能分析和监控实践 rickyqiu
无线App的性能分析和监控实践 rickyqiu无线App的性能分析和监控实践 rickyqiu
无线App的性能分析和监控实践 rickyqiu
 
网易移动自动化测试实践(孔庆云)
网易移动自动化测试实践(孔庆云)网易移动自动化测试实践(孔庆云)
网易移动自动化测试实践(孔庆云)
 

Dernier

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Dernier (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Evolution of Big Data Architectures at Facebook

  • 1. Evolution of Big Data Architectures@ Facebook Architecture Summit, Shenzhen, August 2012 Ashish Thusoo
  • 2. About Me • Currently Co-founder/CEO of Qubole • Ran the Data Infrastructure Team at Facebook till 2011 • Co-founded Apache Hive @ Facebook
  • 3. Outline • Big Data @ Facebook - Scope & Scale • Evolution of Big Data Architectures @ FB • Qubole
  • 4. Big Data @ FB(2011): Scale • 25 PB of compressed data ~ 150 PB of uncompressed data • 400 TB/day (uncompressed) of new data • 1 new job every second
  • 5. Big Data @ FB: Scope • Simple reporting • Model generation • Adhoc analysis + data science • Index generation • Many many others...
  • 8. A/B Testing Email #2 is 3x Better
  • 9. Evolution: 2007-2011 DW Size in TB 30000 25000 22500 15000 8000 7500 15 250 800 0 2007 2008 2009 2010 2011
  • 10. 2007: Traditional EDW Scribe Mid-Tier Summarization Cluster Web Clusters NAS Filers MySQL Clusters RDBMS Data Warehouse
  • 11. 2007: Pain Points - compute close to storage (early map/reduce) Scribe Mid-Tier Web Clusters Summarization Cluster NAS Filers MySQL Clusters - daily ETL > 24 hours - Lots of tuning/indexes etc. - Lots of hardware planning RDBMS Data Warehouse
  • 12. 2007: Limitations • Most use cases were in business metrics - data science, model building etc. not possible • Only summary data was stored online - details archived away
  • 13. 2008: Move to Hadoop Scribe Mid-Tier Summarization Cluster Web Clusters NAS Filers MySQL Clusters RDBMS Data Warehouse
  • 14. 2008: Move to Hadoop Scribe Mid-Tier Batch copier/ Web Clusters loaders Hadoop/Hive Data Warehouse NAS Filers MySQL Clusters RDBMS Data Mart
  • 15. 2008: Immediate Pros • Data science at scale became possible • For the first time all of the instrumented data could be held online • Use cases expanded
  • 16. 2009: Democratizing Data Scribe Mid-Tier Web Clusters Hadoop/Hive Data Warehouse NAS Filers MySQL Clusters RDBMS Data Mart
  • 17. 2009: Democratizing Databee & Data Nectar: Chronos: Data instrumentation & Pipeline schema aware data Framework collection HiPal: Adhoc Scrapes: Queries + Data Hadoop/Hive Data Warehouse Configuration Discovery Driven
  • 18. 2009: Democratizing Data(Nectar) • Typical Nectar Pipeline • Simple schema evolution built in • json encoded short term data • decomposing json for long term storage
  • 19. 2009: Democratizing Data (Tools) • HiPal - data discovery and query authoring • Charting and dashboard generation tools
  • 20. 2009: Democratizing Data (Tools) • Databee: Workflow language • Chronos: Scheduling tool
  • 21. 2009: Cons of Democratization • Isolation to protect against Bad Jobs • Fair sharing of the cluster - what is a high priority job and how to enforce it
  • 22. 2010: Controlling Chaos • Isolation • Reducing operational overhead • Better resource utilization • Measurement, ownership, accountability
  • 23. 2010: Isolation Scribe Mid-Tier Web Clusters Hadoop/Hive Data Warehouse NAS Filers MySQL Clusters
  • 24. 2010: Isolation Scribe Mid-Tier Web Clusters Platinum Warehouse Hive Replication NAS Filers MySQL Clusters Silver Warehouse
  • 25. 2010: Ops Efficiency Web Clusters Scribe HDFS ptail: parallel Platinum Warehouse tail on hdfs Hive Replication near real time data consumers MySQL Clusters Silver Warehouse
  • 26. 2010: Resource Utilization (Disk) • HDFS-RAID: from 3 replicas to 2.2 replicas • RCFile: Row columnar format for compressing Hive tables
  • 27. 2010: Resource Utilization (CPU) • Continuous copier/ loaders • Incremental scrapes • Hive optimizations to save CPU
  • 28. 2010: Monitoring(SLAs) • Per job statistics rolled up to owner/group/team • Expected time of arrival vs Actual time of arrival of data • Simple data quality metrics
  • 29. 2011: New Requirements • More real time requirements for aggregations • Optimizing resource utilization
  • 30. 2011: Beyond Hadoop • Puma for real time analytics • Peregrine for simple and fast queries
  • 31. 2011: Puma Web Clusters Scribe HDFS ptail: parallel Platinum Warehouse tail on hdfs Hive Replication near real time data consumers MySQL Clusters Silver Warehouse
  • 32. 2011: Puma Scribe HDFS ptail: parallel tail on hdfs Puma Clusters Hbase Cluster
  • 33. Some takeaways • Operating and optimizing Data Infrastructure is a hard problem • Lots of components from log collection, storage, compute, query processing, tools and interfaces • Lots of choices within each part of the stack
  • 34. Qubole • Mission: • Data Infrastructure in the Cloud made Easy, Fast and Reliable • We take care of operating and optimizing this infrastructure so that you can focus on your data, analysis, algorithms and building your data apps
  • 35. Qubole - Information • Early Trial(by invitation): • www.qubole.com • Come talk to us to join a small and passionate team • jobs@qubole.com • Follow us on twitter/facebook/linkedin