LEARNING TO BUILD
DISTRIBUTED SYSTEMS
    THE HARD WAY


        @iconara
speakerdeck.com/u/iconara
          (real time!)
Theo / @iconara
Chief Architect at Burt
let’s make online advertising a great experience
MAKING THIS
INTO THIS
HOW HARD CAN IT BE?
TRACKING
 AD IMPRESSIONS
      track page views and all their ads
track visibility and send updates on changes
  track events, track activity, sync cookies,
                  and track visits
[diagram: ad states LOADED, VISIBLE, HIDDEN]
ASSEMBLING
            SESSIONS
   assemble ad impressions, page views and visits,
to be able to calculate things like total visible duration
 mix in demographics, revenue, and third-party data
[diagram: events assembled into one session record: WAS LOADED, BECAME VISIBLE,
WAS HIDDEN, BECAME VISIBLE AGAIN, BECAME ACTIVE, A CLICK!,
plus 3rd PARTY DATA & OTHER GOODIES]

{
    "user_id": "M9L6R5TD0YXK",
    "session_id": "MAI3QAGNAIYT",
    "timestamp": 1347896675038,
    "placement_name": "example",
    "category": "frontpage",
    "embed_url": "http://example.com/",
    "visible_duration": 1340,
    "browser": "Chrome",
    "device_type": "computer",
    "click": true,
    "ad_dimensions": "980x300"
}
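As a rough illustration of what "total visible duration" could mean, here is a minimal Ruby sketch. The event shape (`:state`, `:at` in milliseconds), the `session_end` argument and the method name are assumptions made for the example; the deck does not show how the real assembly is done.

```ruby
# Sum the time an ad spent visible, given a list of state-change events.
# Each event is a hash like { state: :visible, at: 100 } (hypothetical shape).
def total_visible_duration(events, session_end)
  total = 0
  visible_since = nil

  events.sort_by { |event| event[:at] }.each do |event|
    case event[:state]
    when :visible
      # Remember when the ad became visible, ignoring repeated updates.
      visible_since ||= event[:at]
    when :hidden
      if visible_since
        total += event[:at] - visible_since
        visible_since = nil
      end
    end
  end

  # An ad that is still visible when the session ends counts up to that point.
  total += session_end - visible_since if visible_since
  total
end

events = [
  { state: :loaded,  at: 0 },
  { state: :visible, at: 100 },
  { state: :hidden,  at: 900 },
  { state: :visible, at: 1500 }
]
total_visible_duration(events, 2040) # => 1340
```

Sorting by time first means the events can arrive out of order, which is likely when they come in as separate requests.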
ANALYTICS
precompute metrics, count uniques,
 build visitor histories for attribution
HOW HARD CAN IT BE?
25K REQUESTS
   PER SECOND
~1 billion requests per day, 1 TB raw data
ONE VISIT CAN
      CHANGE UP TO
     100K COUNTERS
hundreds of millions of individual counters per day,
   plus counting uniques and visitor histories
IN REAL TIME
or near real time, if you want to be pedantic
START WITH TWO
OF EVERYTHING
going from one to two is the hardest
GIVE A LOT OF
THOUGHT TO YOUR
  KEYS AND IDS
   it will save you lots of pain
MANLO0 JME57Z
a timestamp, then something random:
monotonically increasing, sorts nicely

JME57Z MANLO0
something random, then a timestamp:
uniformly distributed, works nicely with sharding
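The two key layouts can be sketched in Ruby roughly as follows. The helper names and the six-character random part are assumptions for illustration; only the base-36 timestamp encoding appears in the deck itself.

```ruby
require 'securerandom'

# Base-36 alphabet (0-9, A-Z), matching the look of the keys on the slides.
ALPHABET = [*'0'..'9', *'A'..'Z'].freeze

# A string of random base-36 characters (hypothetical helper).
def random_part(length = 6)
  Array.new(length) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
end

# Seconds since the epoch in base 36, e.g. "MANLO0".
def timestamp_part(time = Time.now)
  time.to_i.to_s(36).upcase
end

# Timestamp first: keys created later sort after keys created earlier.
def time_ordered_key(time = Time.now)
  timestamp_part(time) + random_part
end

# Random first: keys spread evenly across a key space split into ranges.
def shard_friendly_key(time = Time.now)
  random_part + timestamp_part(time)
end
```

The trade-off is the one on the slides: the first layout is convenient for range scans by time but sends all new writes to the same end of the key space, while the second spreads writes out but gives up time ordering.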
PUT BUFFERS
 BETWEEN LAYERS
           queues can even out peaks,
       let you scale layers independently,
and let you restart services without losing data
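A toy version of a buffer between two layers, using Ruby's built-in `SizedQueue`. This is only a sketch: an in-memory queue evens out peaks inside one process, but surviving restarts without losing data needs a durable queue or log, which is not shown here. The method name and the `:done` sentinel are made up for the example.

```ruby
# Push events through a bounded buffer to a single consumer thread.
def run_through_buffer(events, buffer_size: 100)
  buffer = SizedQueue.new(buffer_size)
  processed = []

  consumer = Thread.new do
    # pop blocks while the buffer is empty; :done is a made-up sentinel.
    while (event = buffer.pop) != :done
      processed << event.upcase # stand-in for real processing
    end
  end

  # push blocks while the buffer is full, so a burst slows the producer
  # down instead of overwhelming the consumer.
  events.each { |event| buffer.push(event) }
  buffer.push(:done)
  consumer.join
  processed
end

run_through_buffer(%w[loaded visible hidden], buffer_size: 2)
# => ["LOADED", "VISIBLE", "HIDDEN"]
```

Because producer and consumer only share the queue, either side could be replaced or scaled without the other knowing.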
SEPARATE
  PROCESSING
 FROM STORAGE
that way you can scale each independently
PLAN HOW TO GET
RID OF YOUR DATA
deleting stuff is harder than you might think


   NoDB
keep things streaming
STREAM
PARTITIONING
RANDOMLY
when you have no interdependencies
between things it’s easy to scale out
         (or round robin, it’s basically the same)
CONSISTENTLY
when there are interdependencies you need
to route using some property of the objects,
but make sure you get a uniform distribution
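The two routing styles might look like this in Ruby. It is a sketch under assumptions: "consistently" is taken here to mean plain hash-modulo routing on a key such as the user id (not a consistent-hashing ring), CRC32 is an arbitrary choice of hash, and the method names are invented.

```ruby
require 'zlib'

PARTITIONS = 12

# Random routing: fine when messages are independent of each other.
def random_partition(partitions = PARTITIONS)
  rand(partitions)
end

# Consistent routing: the same key always lands on the same partition,
# so everything belonging to one user or session ends up in one place.
def partition_for(key, partitions = PARTITIONS)
  Zlib.crc32(key) % partitions
end

# Count how many keys each partition receives, to check the distribution.
def distribution(keys, partitions = PARTITIONS)
  keys.group_by { |key| partition_for(key, partitions) }
      .transform_values(&:size)
end
```

Running `distribution` over a sample of real keys is a cheap way to check the "uniform distribution" requirement before committing to a routing property.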
NUMEROLOGY
12
2 | 12
3 | 12
4 | 12
6 | 12
8 | 24
5 | 60
12, 60, 120, 360
superior highly composite numbers
for maximal flexibility partition with multiples of 12
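The flexibility argument can be checked with a few lines of Ruby: the more divisors the partition count has, the more cluster sizes share the partitions evenly. The method name is made up for the example.

```ruby
# Node counts that split a given number of partitions evenly.
def even_node_counts(partitions)
  (1..partitions).select { |nodes| (partitions % nodes).zero? }
end

even_node_counts(10) # => [1, 2, 5, 10]
even_node_counts(12) # => [1, 2, 3, 4, 6, 12]
even_node_counts(60) # => [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, 60]
```

With 12 partitions a cluster can grow from 2 to 3 to 4 to 6 nodes and every node still owns the same number of partitions; with 10 it can only go from 2 to 5.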
A SHORT
 DIVERSION ABOUT
  COUNTING TO 60
the reason why there’s 60 seconds to a minute,
          and 360 degrees to a circle
3 SEGMENTS
ON EACH FINGER
3 × 4 = 12

FIVE FINGERS
ON OTHER HAND
12 × 5 = 60
log2(36^6) ≈ 31
six characters 0-9, A-Z can represent 31 bits,
which is kind of almost very close to four bytes
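The arithmetic behind the 31 bits can be checked directly in Ruby; `base36_capacity` is a name made up for this sketch.

```ruby
# Number of distinct values that fit in a base-36 string of a given length.
def base36_capacity(chars)
  36**chars
end

base36_capacity(6)            # => 2176782336
Math.log2(base36_capacity(6)) # just over 31
2**31                         # => 2147483648, fits in six characters
2**32                         # => 4294967296, does not fit
(2**31 - 1).to_s(36).upcase   # => "ZIK0ZJ"
```

So six characters cover every 31-bit value, which includes every non-negative signed 32-bit Unix timestamp, but not the full four-byte range.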
Time.now.to_i.to_s(36).upcase

MANLO0
a timestamp
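The encoding is reversible, which is handy when debugging keys by hand. A small sketch, with invented method names:

```ruby
# Encode a time as base-36 seconds since the epoch.
def encode_timestamp(time)
  time.to_i.to_s(36).upcase
end

# Decode it again; String#to_i(36) accepts upper- and lowercase digits.
def decode_timestamp(string)
  Time.at(string.to_i(36)).utc
end

decode_timestamp("MANLO0")                   # => 2012-09-20 15:00:00 UTC
encode_timestamp(decode_timestamp("MANLO0")) # => "MANLO0"
```

Second resolution is baked in here; a millisecond timestamp like the one in the session record would need more than six characters.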
DO YOU REALLY
  NEED A BACKUP?
      if you’ve got 3x replication across multiple
availability zones, is that backup really worth it?
PRODUCTION IS THE
  ONLY REAL TEST
   ENVIRONMENT
  when thousands of things happen every second,
new, weird and unforeseen things happen all the time,
          no test can anticipate everything
        (but testing is good anyway, just don’t think you’ve got everything covered)
KTHXBAI
        @iconara
   github.com/iconara
architecturalatrocities.com
       burtcorp.com
COME TO SWEDEN
   IN MARCH AND
TALK ABOUT BIG DATA
  scandevconf.se/2013/call-for-proposals
IDEMPOTENCE
f(f(x)) = f(x)
doing something again doesn’t change the outcome
 if you don’t have to worry about things accidentally
happening twice, everything becomes much simpler
COUNTING UNIQUES
when adding to a set it doesn’t matter how many
   times you do it, the end result is the same
INC X VS SET X
increments are not idempotent, and very scary;
try to avoid non-idempotent operations if you can
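A small Ruby sketch of the difference, assuming a message that gets delivered twice (for example after a retry). The method names are made up for the example.

```ruby
require 'set'

# Non-idempotent: every redelivery changes the result.
def count_with_increments(visitor_ids)
  count = 0
  visitor_ids.each { count += 1 }
  count
end

# Idempotent: adding the same id again leaves the set unchanged.
def count_with_set(visitor_ids)
  seen = Set.new
  visitor_ids.each { |id| seen.add(id) }
  seen.size
end

deliveries = %w[M9L6R5TD0YXK MAI3QAGNAIYT]
retried    = deliveries + deliveries # every message delivered twice

count_with_increments(deliveries) # => 2
count_with_increments(retried)    # => 4, the retry corrupted the count
count_with_set(deliveries)        # => 2
count_with_set(retried)           # => 2, the retry made no difference
```

The same reasoning applies to SET X: writing a precomputed value twice gives the same state as writing it once, while INC X applied twice does not.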
