LEARNING TO BUILD
DISTRIBUTED SYSTEMS
    THE HARD WAY


        @iconara
speakerdeck.com/u/iconara
          (real time!)
Theo / @iconara
Chief Architect at
let’s make online advertising a great experience
MAKING THIS
INTO THIS
HOW HARD CAN IT BE?
TRACKING
 AD IMPRESSIONS
      track page views and all their ads
track visibility and send updates on changes
  track events, track activity, sync cookies,
                  and track visits
[Diagram: ads on a page reporting their states: LOADED, VISIBLE, HIDDEN]
ASSEMBLING
            SESSIONS
   assemble ad impressions, page views and visits,
to be able to calculate things like total visible duration
 mix in demographics, revenue, and third-party data
[Diagram: ad events (WAS LOADED, BECAME VISIBLE, WAS HIDDEN, BECAME VISIBLE AGAIN, BECAME ACTIVE, A CLICK!) combined with 3rd PARTY DATA & OTHER GOODIES into one session record:]

    {
      "user_id": "M9L6R5TD0YXK",
      "session_id": "MAI3QAGNAIYT",
      "timestamp": 1347896675038,
      "placement_name": "example",
      "category": "frontpage",
      "embed_url": "http://example.com/",
      "visible_duration": 1340,
      "browser": "Chrome",
      "device_type": "computer",
      "click": true,
      "ad_dimensions": "980x300"
    }
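
As a rough illustration of what assembling a session involves, the sketch below sums the time an ad spent visible from a list of state-change events. The event shape (`[timestamp_ms, state]`), the method name and the sample timestamps are assumptions made for this example, not the format used in the talk.

```ruby
# Hypothetical event shape: [timestamp in milliseconds, state symbol].
# Sums the time spent in the :visible state; the last event is closed
# off at end_time (for example, when the page view ended).
def total_visible_duration(events, end_time)
  sorted = events.sort_by { |timestamp, _state| timestamp }
  total = 0
  sorted.each_with_index do |(timestamp, state), index|
    next unless state == :visible
    # A visible period lasts until the next event, or until end_time.
    next_timestamp = index + 1 < sorted.size ? sorted[index + 1][0] : end_time
    total += next_timestamp - timestamp
  end
  total
end

sample_events = [
  [1_000, :loaded],
  [1_200, :visible],
  [2_000, :hidden],
  [2_500, :visible]
]
puts total_visible_duration(sample_events, 3_040)
```

Sorting first means the events can arrive out of order, which is likely when they are collected by several trackers.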
ANALYTICS
precompute metrics, count uniques,
 build visitor histories for attribution
HOW HARD CAN IT BE?
25K REQUESTS
   PER SECOND
~1 billion requests per day, 1 TB raw data
ONE VISIT CAN
      CHANGE UP TO
     100K COUNTERS
hundreds of millions of individual counters per day,
   plus counting uniques and visitor histories
IN REAL TIME
or near real time, if you want to be pedantic
START WITH TWO
OF EVERYTHING
going from one to two is the hardest
GIVE A LOT OF
THOUGHT TO YOUR
  KEYS AND IDS
   it will save you lots of pain
a timestamp + something random
MANLO0 JME57Z
monotonically increasing,
sorts nicely

something random + a timestamp
JME57Z MANLO0
uniformly distributed,
works nicely with sharding
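
A minimal sketch of the two key layouts, assuming a six-character base-36 timestamp and a six-character random part as on the slides. The method names and the use of `SecureRandom` are illustrative choices, not taken from the talk.

```ruby
require 'securerandom'

ALPHABET = (('0'..'9').to_a + ('A'..'Z').to_a).freeze

# Six random characters from 0-9, A-Z.
def random_part(length = 6)
  Array.new(length) { ALPHABET[SecureRandom.random_number(ALPHABET.size)] }.join
end

# Seconds since the epoch in base 36, padded to a fixed width so that
# string order matches time order.
def time_part(time = Time.now)
  time.to_i.to_s(36).upcase.rjust(6, '0')
end

# Timestamp first: IDs sort by creation time.
def sortable_id(time = Time.now)
  time_part(time) + random_part
end

# Random part first: IDs spread evenly over the key space.
def shardable_id(time = Time.now)
  random_part + time_part(time)
end

puts sortable_id
puts shardable_id
```

Which layout fits depends on how the key is used: range scans by time favour the first, and even load across shards favours the second.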
PUT BUFFERS
 BETWEEN LAYERS
           queues can even out peaks,
       let you scale layers independently,
and let you restart services without losing data
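
The buffering idea can be sketched in a single process with Ruby's built-in `SizedQueue`. This is only a stand-in: surviving a service restart would need a durable queue outside the process, and the method name, the `:done` marker and the doubling "work" are assumptions made for this example.

```ruby
# One producer layer and one consumer layer with a bounded buffer between them.
def run_pipeline(items, buffer_size: 100)
  buffer = SizedQueue.new(buffer_size)
  processed = []

  consumer = Thread.new do
    loop do
      message = buffer.pop            # blocks while the buffer is empty
      break if message == :done
      processed << message * 2        # placeholder for real processing
    end
  end

  producer = Thread.new do
    items.each { |item| buffer.push(item) }  # blocks while the buffer is full
    buffer.push(:done)
  end

  [producer, consumer].each(&:join)
  processed
end

puts run_pipeline((1..5).to_a).inspect
```

The bounded size is what evens out peaks: a burst from the producer fills the buffer instead of overwhelming the consumer.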
SEPARATE
  PROCESSING
 FROM STORAGE
that way you can scale each independently
PLAN HOW TO GET
RID OF YOUR DATA
deleting stuff is harder than you might think


   NoDB
keep things streaming
STREAM
PARTITIONING
RANDOMLY
when you have no interdependencies
between things it’s easy to scale out
         (or round robin, it’s basically the same)
CONSISTENTLY
when there are interdependencies you need
to route using some property of the objects,
but make sure you get a uniform distribution
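
The two routing strategies could look roughly like this. Using CRC32 as the hash and the session ID as the routing key are assumptions for the sketch; the talk does not say which property or hash function was used.

```ruby
require 'zlib'

# Random routing: fine when messages do not depend on each other.
def random_partition(partition_count)
  rand(partition_count)
end

# Consistent routing: the same key always lands on the same partition,
# so everything belonging to one session is processed in one place.
def consistent_partition(key, partition_count)
  Zlib.crc32(key) % partition_count
end

puts consistent_partition('MAI3QAGNAIYT', 12)
puts consistent_partition('MAI3QAGNAIYT', 12) # same partition again
```

A key with few distinct values, or one dominated by a few heavy hitters, would give a skewed distribution no matter which hash is used.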
NUMEROLOGY
12
2 | 12
3 | 12
4 | 12
6 | 12
8 | 24
5 | 60
12, 60, 120, 360
superior highly composite numbers
for maximal flexibility partition with multiples of 12
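
To see why multiples of 12 are convenient, the sketch below lists the worker counts that divide a partition count evenly and shows what an uneven split looks like. The helper names and the round-robin assignment are assumptions for this example.

```ruby
# All worker counts that split n partitions evenly.
def divisors(n)
  (1..n).select { |d| (n % d).zero? }
end

# Assign partitions to workers round-robin; returns { worker => partitions }.
def assign(partition_count, worker_count)
  (0...partition_count).group_by { |partition| partition % worker_count }
end

puts divisors(12).inspect                       # [1, 2, 3, 4, 6, 12]
puts divisors(10).inspect                       # [1, 2, 5, 10]
puts assign(12, 4).values.map(&:size).inspect   # [3, 3, 3, 3]
puts assign(12, 5).values.map(&:size).inspect   # [3, 3, 2, 2, 2]
```

With 12 partitions you can run 1, 2, 3, 4, 6 or 12 workers and give each the same share, whereas 10 partitions only split evenly over 1, 2, 5 or 10.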
A SHORT
 DIVERSION ABOUT
  COUNTING TO 60
the reason why there’s 60 seconds to a minute,
          and 360 degrees to a circle
3 SEGMENTS
ON EACH FINGER
3 × 4 = 12

FIVE FINGERS
ON OTHER HAND
12 × 5 = 60
log2(36^6) ≈ 31
six characters 0-9, A-Z can represent 31 bits,
which is kind of almost very close to four bytes
Time.now.to_i.to_s(36).upcase
MANLO0
a timestamp
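
A short round trip for the encoding above, plus the capacity arithmetic. The method names are made up for this sketch; `Integer#to_s(36)`, `String#to_i(36)` and `Math.log2` are standard Ruby.

```ruby
# Seconds since the epoch as upper-case base 36.
def encode_timestamp(time)
  time.to_i.to_s(36).upcase
end

# Back from base 36 to a Time (String#to_i(36) accepts either case).
def decode_timestamp(encoded)
  Time.at(encoded.to_i(36)).utc
end

puts encode_timestamp(Time.at(1_348_153_200))  # MANLO0
puts decode_timestamp('MANLO0').year           # 2012
puts Math.log2(36**6).round(2)                 # bits that six characters can hold
```

Six base-36 characters cover 36^6 = 2,176,782,336 values, so second-resolution timestamps fit until that many seconds after the epoch.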
DO YOU REALLY
  NEED A BACKUP?
      if you've got 3x replication over multiple
availability zones, is that backup really worth it?
PRODUCTION IS THE
  ONLY REAL TEST
   ENVIRONMENT
  when thousands of things happen every second,
new, weird and unforeseen things happen all the time,
          no test can anticipate everything
        (but testing is good anyway, just don’t think you’ve got everything covered)
KTHXBAI
        @iconara
   github.com/iconara
architecturalatrocities.com
       burtcorp.com
COME TO SWEDEN
   IN MARCH AND
TALK ABOUT BIG DATA
  scandevconf.se/2013/call-for-proposals
IDEMPOTENCE
f(f(x)) = f(x)
doing something again doesn’t change the outcome
IDEMPOTENCE
 if you don’t have to worry about things accidentally
happening twice, everything becomes much simpler
COUNTING UNIQUES
when adding to a set it doesn’t matter how many
   times you do it, the end result is the same
INC X VS SET X
increments are not idempotent, and very scary;
if you can avoid non-idempotent operations, do
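
The difference shows up as soon as a message is delivered twice. In this sketch the method names and the in-memory `Hash` and `Set` are stand-ins for whatever store holds the counters; they are assumptions for the example.

```ruby
require 'set'

# INC X: applying the same update twice changes the result.
def apply_increment(counters, key)
  counters[key] = counters.fetch(key, 0) + 1
end

# SET X: applying the same update twice leaves the result unchanged.
def apply_set(values, key, value)
  values[key] = value
end

# Counting uniques: adding the same visitor again is a no-op.
def record_visitor(uniques, user_id)
  uniques.add(user_id)
end

counters = {}
values = {}
uniques = Set.new

# Simulate the same message being processed twice.
2.times do
  apply_increment(counters, 'views')
  apply_set(values, 'views', 1)
  record_visitor(uniques, 'M9L6R5TD0YXK')
end

puts counters['views']  # 2, the duplicate was counted
puts values['views']    # 1
puts uniques.size       # 1
```

With idempotent operations a retry after a timeout or crash is safe; with increments you first have to know whether the earlier attempt went through.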
