Building a Highly Scalable, Open Source Twitter Clone
Dan Diephouse (dan@netzooid.com)
Paul Brown (prb@mult.ifario.us)
Motivation
★   Wide (and growing) variety of
    non-relational databases.
    (viz. NoSQL — http://bit.ly/pLhqQ, http://bit.ly/17MmTk)


★   Twitter application model
    presents interesting
    challenges of scope and
    scale.
    (viz. “Fixing Twitter” http://bit.ly/2VmZdz)
Storage Metaphors
★   Key/Value Store
    Opaque values; fast and simple.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt
    ★   Dynomite — http://bit.ly/12AYmf
    ★   Redis — http://bit.ly/LBtCh
    ★   Tokyo Tyrant — http://bit.ly/oU4uV
    ★   Voldemort – http://bit.ly/oU4uV
Key/Value
[Figure: a two-column table with Key and Value headers; keys 1, 2, 3.]
Storage Metaphors
★   Document-Oriented
    Unstructured content; rich queries.
★   Examples:
    ★   CouchDB — http://bit.ly/JAgUM
    ★   MongoDB — http://bit.ly/HDDOV
    ★   SOLR — http://bit.ly/q4gyi
    ★   XML databases...
Document-Oriented
ID=“dan-tweet-1”,
TEXT=“hello world”

ID=“dan-tweet-2”,
TEXT=“Twirp!”,
IN-REPLY-TO=“paul-tweet-5”
Storage Metaphors
★   Column-Oriented

    Organized in columns; easily scanned.
★   Examples:
    ★   Cassandra* — http://bit.ly/EdUEt

    ★   BigTable — http://bit.ly/QqMYA
        (available within AppEngine)


    ★   HBase — http://bit.ly/Zck7F
    ★   SimpleDB — http://bit.ly/toh0P
        (Typica library for Java — http://bit.ly/22kxZ4)
Column-Oriented
    Name          Date                  Tweet Text
    Bob           20090506              Eating dinner.
    Dan           20090507              Is it Friday yet?
    Dan           20090506              Beer me!
    Ralph         20090508              My bum itches.



Index   Name         Index   Date         Index   Tweet Text
0       Bob          0       20090506     0       Eating dinner.
1       Dan          1       20090507     1       Is it Friday yet?
2       Dan          2       20090506     2       Beer me!
3       Ralph        3       20090508     3       My bum itches.

        Storage          Storage                  Storage
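The row-versus-column layout above can be sketched in a few lines. This is only an illustration using plain Python lists (no real column store is involved); the variable names are assumptions. It shows why scanning one column is cheap: only the Name list is touched to find Dan's rows.

```python
# The four sample rows from the slide, stored row by row.
rows = [
    ("Bob", "20090506", "Eating dinner."),
    ("Dan", "20090507", "Is it Friday yet?"),
    ("Dan", "20090506", "Beer me!"),
    ("Ralph", "20090508", "My bum itches."),
]

# Column-oriented layout: one list per column, aligned by index.
names, dates, texts = (list(column) for column in zip(*rows))

# Scan only the Name column, then fetch matching entries by index.
dan_indexes = [i for i, name in enumerate(names) if name == "Dan"]
dan_tweets = [texts[i] for i in dan_indexes]
```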
Every Store is Special.

★   Lots of different little tweaks
    to the storage model.
★   Widely varying levels of
    maturity.
★   Growing communities.
★   Limited (but growing) tooling,
    libraries, and production
    adoption.
Reliability Through Replication
★   Consistent hashing to assign
    keys to partitions.
★   Partitions replicated on
    multiple nodes for
    redundancy.
★   Minimum number of successful
    replica writes to consider a write
    complete.
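A minimal sketch of the three ideas above, assuming a toy in-memory ring rather than any real store's implementation. The class and function names, the MD5 hash, and the third node name (vn3) are illustrative assumptions. Keys hash to a ring position, the next distinct nodes clockwise hold the replicas, and a write succeeds only when enough replicas acknowledge it.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each node owns several partitions."""

    def __init__(self, nodes, partitions_per_node=4, replicas=2):
        self.replicas = replicas
        points = sorted(
            (_hash(f"{node}#{p}"), node)
            for node in nodes
            for p in range(partitions_per_node)
        )
        self._positions = [position for position, _ in points]
        self._owners = [node for _, node in points]

    def preference_list(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self._positions, _hash(key)) % len(self._owners)
        chosen = []
        for step in range(len(self._owners)):
            node = self._owners[(start + step) % len(self._owners)]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == self.replicas:
                break
        return chosen


def put(ring, stores, key, value, required_writes, down=frozenset()):
    """Write to every reachable replica; report success only on enough acks."""
    acks = 0
    for node in ring.preference_list(key):
        if node in down:
            continue  # simulate an unreachable node
        stores[node][key] = value
        acks += 1
    return acks >= required_writes


nodes = ["vn1", "vn2", "vn3"]  # vn3 is hypothetical
ring = Ring(nodes, replicas=2)
stores = {node: {} for node in nodes}
ok = put(ring, stores, "dan", "hello world", required_writes=2)
```

With both replicas reachable the write is acknowledged twice and `ok` is true; marking one replica as down while still requiring two acknowledgements makes the same call return false.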
Reliability Through Replication
[Figure: a Client issuing PUT (k,v).]
Web UI
http://tat1.datapr0n.com:8080
Stores
★   Tweets
    Individual tweets.

★   Friends’ Timeline
    Fixed-length timelines.

★   Users
    Info and followers.

★   Command Queue
    Actions to perform (tweet, follow, etc.).
Data
★   Command (Java serialization)
    Keyed by node name, increasing ID.
★   Tweets (Java serialization)
    Keyed by user name, increasing ID.
★   FriendsTimeline (Java serialization)
    Keyed by username.
    List of date, tweet ID.
★   Users (Java serialization)
    Keyed by username.
    Followers (list), Followed (list), last tweet ID.
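The keying scheme above might look roughly like this. It is a sketch in Python rather than the project's Java classes; the field names, the ":" separator in keys, and the plain dicts standing in for stores are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    username: str
    tweet_id: int
    text: str


@dataclass
class User:
    username: str
    followers: list = field(default_factory=list)
    followed: list = field(default_factory=list)
    last_tweet_id: int = 0


def command_key(node_name, command_id):
    """Commands are keyed by node name plus an increasing ID."""
    return f"{node_name}:{command_id}"


def tweet_key(username, tweet_id):
    """Tweets are keyed by user name plus an increasing ID."""
    return f"{username}:{tweet_id}"


# Plain dicts stand in for the Tweets, FriendsTimeline, and Users stores.
tweets = {}
friends_timeline = {}
users = {"dan": User("dan", followers=["paul"])}

tweet = Tweet("dan", 1, "hello world")
tweets[tweet_key(tweet.username, tweet.tweet_id)] = tweet
users["dan"].last_tweet_id = tweet.tweet_id

# A friends' timeline is a list of (date, tweet key) pairs keyed by username.
friends_timeline.setdefault("paul", []).append(("20090506", tweet_key("dan", 1)))
```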
Life of a Tweet, Part I
1. User tweets (“Beer me.”).
2. Find next tweet ID for user.
3. Store “tweet for user” command.
[Figure: the Web Tier in front of the Users, Commands, Friends Timeline, and Tweets stores; step 2 touches Users, step 3 touches Commands.]
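The web-tier half of the flow can be sketched as follows. Dicts stand in for the Users and Commands stores, and the function name, the "web1" node name, the key format, and the decision to bump the last tweet ID at this point are assumptions made for illustration.

```python
import itertools

# Hypothetical process-wide counter standing in for the "increasing ID".
_command_ids = itertools.count(1)


def submit_tweet(users, commands, node_name, username, text):
    """Web-tier half of a tweet: reserve an ID, then queue a command."""
    user = users[username]
    tweet_id = user["last_tweet_id"] + 1   # step 2: find next tweet ID
    user["last_tweet_id"] = tweet_id       # assumption: the ID is reserved here
    key = f"{node_name}:{next(_command_ids)}"
    commands[key] = {                      # step 3: store the command
        "action": "tweet",
        "username": username,
        "tweet_id": tweet_id,
        "text": text,
    }
    return key


users = {"dan": {"last_tweet_id": 0, "followers": ["paul"]}}
commands = {}
first_key = submit_tweet(users, commands, "web1", "dan", "Beer me.")
```

Nothing is written to the Tweets or Friends Timeline stores yet; the web tier only records the intent.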
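Life of a Tweet, Part II
1. Read next command.
2. Store tweet in user’s timeline (Tweets).
3. Store tweet ID in friends’ timelines. (Requires *many* operations.)
4. DELETE command.
[Figure: the Web Tier in front of the Users, Commands, Friends Timeline, and Tweets stores; steps 1 and 4 touch Commands, step 2 touches Tweets, step 3 touches Friends Timeline. Speech bubble: “Where's Demi with my beer?!?”]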
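The four steps above can be sketched as a single worker function. Dicts again stand in for the stores; the function name, the command layout, the key format, and the timeline length of 20 are assumptions. Each write is safe to repeat, so a crash before step 4 only means the command is replayed.

```python
def process_next_command(users, commands, tweets, timelines, date, max_len=20):
    """Apply the oldest queued command, then delete it."""
    if not commands:
        return False
    key = next(iter(commands))                  # step 1: read next command
    command = commands[key]
    tweet_key = f"{command['username']}:{command['tweet_id']}"
    tweets[tweet_key] = command["text"]         # step 2: store the tweet
    for follower in users[command["username"]]["followers"]:
        timeline = timelines.setdefault(follower, [])
        entry = (date, tweet_key)
        if entry not in timeline:               # a replay adds nothing new
            timeline.insert(0, entry)           # step 3: one write per follower
            del timeline[max_len:]              # keep the timeline fixed-length
    del commands[key]                           # step 4: DELETE command
    return True


users = {"dan": {"followers": ["paul", "bob"]}}
commands = {
    "web1:1": {"action": "tweet", "username": "dan", "tweet_id": 1, "text": "Beer me."}
}
tweets, timelines = {}, {}
processed = process_next_command(users, commands, tweets, timelines, "20090506")
```

Step 3 is where the "many operations" come from: the loop performs one timeline update per follower.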
Some Patterns
★   “Sequences” are implemented
    as race-for-non-collision.
★   “Joins” are common keys or
    keys referenced from values.
★   “Transactions” are idempotent
    operations with DELETE at the
    end.
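One possible reading of "race-for-non-collision" is sketched below: each writer tries to claim the next slot with a put-if-absent and moves on when someone else got there first. The toy Store class, its put_if_absent method, and the key format are assumptions for illustration, not the API of any real store.

```python
class KeyExistsError(Exception):
    """Raised when a slot has already been claimed."""


class Store:
    """Toy in-memory store with a put-if-absent operation."""

    def __init__(self):
        self._data = {}

    def put_if_absent(self, key, value):
        if key in self._data:
            raise KeyExistsError(key)
        self._data[key] = value


def claim_next_id(store, prefix, start=1, max_attempts=1000):
    """Try successive IDs until one is claimed without a collision."""
    candidate = start
    for _ in range(max_attempts):
        try:
            store.put_if_absent(f"{prefix}:{candidate}", "claimed")
            return candidate
        except KeyExistsError:
            candidate += 1  # lost the race for this slot; try the next one
    raise RuntimeError("could not claim an ID")


store = Store()
first_id = claim_next_id(store, "dan")
second_id = claim_next_id(store, "dan")
```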
Operations
★   Deploy to Amazon EC2
    ★   2 nodes for Voldemort
    ★   2 nodes for Tomcat
    ★   1 node for Cacti
★   All “small” instances w/RightScale CentOS
    5.2 image.
★   Minor inconvenience of “EBS” volume for
    MySQL for Cacti.
    (follow Eric Hammond’s tutorial — http://bit.ly/OK5LZ)
Deployment
★   Lots of choices for automated rollout
    (Chef, Capistrano, etc.)
★   Took simplest path — Maven build, Ant
    (scp/ssh and property substitution
    tasks), and bash scripts.
    for i in vn1 vn2; do
      ant -Dnode="${i}" setup-v-node
    done

★   Takes ~30 seconds to provision a Tomcat
    or Voldemort node.
Dashboarding
★   As above, lots of choices
    (Cacti — http://bit.ly/qV4gz, Graphite — http://bit.ly/466NAx, etc.)


★   Cacti as simplest choice.
    yum install -y cacti

★   Vanilla SNMP on nodes for host
    data.
★   Minimal extensions to Voldemort
    for stats in Cacti-friendly
    format.
Performance
★   270 req/sec for getFriendsTimeline against
    web tier.
    ★   21 GETs on Voldemort stores to pull data.
    ★   5600 req/sec for Voldemort is similar to
        performance reported at NoSQL meetup (20k
        req/sec) when adjusted for hardware.
    ★   Cache on the web tier could make this
        faster...
★   Some hassles when hammering individual keys
    with rapid updates.
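The store-level figure follows from the web-tier figure by simple multiplication, assuming every getFriendsTimeline request issues all 21 GETs. The variable names are illustrative.

```python
web_requests_per_sec = 270   # getFriendsTimeline against the web tier
gets_per_request = 21        # store GETs needed to build one timeline

store_gets_per_sec = web_requests_per_sec * gets_per_request
print(store_gets_per_sec)    # 5670
```

The product, 5670, is in line with the roughly 5600 req/sec quoted for the store.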
Takeaways
★   Linked-list representation deserves some thought
    (and experiments).
    Dynomite + Osmos (http://bit.ly/BYMdW)

★   Additional use cases (search, rich API, replies,
    direct messages, etc.) might alter design.
★   BigTable/HBase approach deserves another look.
★   Source code is available; come and git it.

    http://github.com/prb/bigbird

    git://github.com/prb/bigbird.git
Coordinates
★   Dan Diephouse (@dandiep)
    dan@netzooid.com
    http://netzooid.com
★   Paul Brown (@paulrbrown)
    prb@mult.ifario.us
    http://mult.ifario.us/a
