SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
abstractions at scale
            our experiences at twitter

            marius a. eriksen
            @marius
            QConSF, November 2010




                                         TM




Sunday, November 14, 2010
twitter
         ‣     real-time information network
               ‣    70M tweets/day (800/s)
               ‣    150M users
               ‣    70k API calls/s




Sunday, November 14, 2010
agenda
         ‣     scale & scalability
         ‣     the role of abstraction
         ‣     good abstractions, bad abstractions
         ‣     abstractions & scale
         ‣     examples
         ‣     “just right” APIs
         ‣     conclusions



Sunday, November 14, 2010
scale & scalability



         “Scalability is a desirable property of a system, a
         network, or a process, which indicates its ability to
         either handle growing amounts of work in a
         graceful manner or to be readily enlarged”

         (Wikipedia)




Sunday, November 14, 2010
scale & scalability (cont’d)

         ‣     only “horizontal” scaling allows unbounded
               growth
               ‣    not entirely true: eg. due to network effects
               ‣    not a panacea
         ‣     “vertical” scaling is often desirable & required
               ‣    contain costs
               ‣    curtail network effects


Sunday, November 14, 2010
scale & scalability (cont’d)
         ‣     the target architecture is the datacenter
               ‣     network is a critical component
               ‣     deeper storage hierarchy
               ‣     higher performance variance
               ‣     complex failure modes
               ‣     but our programming models don’t account
                     for these resource & failure models explicitly




Sunday, November 14, 2010
abstraction
         “freedom from representational qualities”


         ‣     the chief purpose of abstraction is to manage
               complexity & provide composability
         ‣     in software, abstraction is manifested through
               common interfaces
               ‣     explicit semantics
               ‣     implicit “contracts”


Sunday, November 14, 2010
abstraction (cont’d)



         ‣     as systems become more complex, abstraction
               becomes increasingly important
               ‣    especially as number of engineers grow
         ‣     modern systems are highly complex and are
               highly abstracted




Sunday, November 14, 2010
type systems
         ‣     [static] type systems can encode some of the
               contracts for us
               ‣     giving us static guarantees
               ‣     academia is pushing the envelope here with
                     dependent types
               ‣     they also compose


         ‣     the line between type & program becomes
               blurred


Sunday, November 14, 2010
good abstraction? your CPU
         ‣     x86-64 is a spec
               ‣     you don’t care if it’s provided by AMD or Intel
         ‣     excepting a few compiler & OS authors, most of
               you don’t think about
               ‣     pipelining
               ‣     out of order & speculative execution
               ‣     branch prediction
               ‣     cache coherency
               ‣     etc…

Sunday, November 14, 2010
good …? your memory hierarchy
         ‣     you don’t interface with it directly
         ‣     purist view: addressable memory cells
         ‣     reality: has scary-good optimizations for
               common access patterns. highly optimized.
         ‣     you don’t think (often) about:
               ‣     cache locality
               ‣     TLB effects
               ‣     MMU ops scheduling


Sunday, November 14, 2010
bad abstraction? ia64
         ‣     (at least initially) compilers couldn’t live up to it
               ‣     hardware promise was delegated to the
                     compiler
               ‣     compilers failed to reliably produce sufficiently
                     fast code
               ‣     abstraction was broken
               ‣     good for certain scientific computing domains




Sunday, November 14, 2010
a lens
         ‣     scaling issues occur when abstractions become
               leaky
               ‣     RDBMS fails to perform sophisticated queries
                     on highly normalized data
               ‣     your GC thrashes after a certain allocation
                     volume
               ‣     OS thread scheduling becomes unviable after
                     N × 1000 threads are created




Sunday, November 14, 2010
¶ threads
         ‣     threads offer a familiar and linear model of execution
               ‣     scheduling overhead becomes important after a
                     certain amount of parallelism
               ‣     stack allocation can become troublesome
               ‣     fails to be explicit about latency, backpressure
         ‣     alternative: asynchronous programming
               ‣     makes queuing, latency explicit
               ‣     allows SEDA-style control
         ‣     a compromise? LWT

Sunday, November 14, 2010
¶ sequence abstractions
         ‣     produces concise, beautiful, composable code


              trait Places extends Seq[Place]
              places.chunks(5000).map(_.toList).parForeach { chunk =>
                   …
              }


         ‣     access patterns aren’t propagated down the
               stack
         ‣     missed optimizations

Sunday, November 14, 2010
¶ RDBMS
         ‣     are [by definition] generic
         ‣     encourage normalized data storage
               ‣     very powerful data model
               ‣     little need to know access patterns a priori
         ‣     provide general (magical) querying mechanics
               ‣     bag of tricks: query planning, table statistics,
                     covering indices




Sunday, November 14, 2010
¶ RDBMS
         ‣     at scale, the most viable strategy is: What You
               Serve Is What You Store (WYSIWYS)
               ‣     or at least very close
         ‣     this brings about a whole host of new problems
               ‣     data (in)consistency
               ‣     multiple indices
               ‣     “re-normalization”




Sunday, November 14, 2010
¶ RDBMS
         ‣     at-scale, querying is highly predictable, most of
               the time:
               ‣     don’t need fancy query planning
               ‣     don’t need statistics
         ‣     in fact, we know a-priori how to efficiently query
               the underlying datastructures
               ‣     wish: don’t give me a query engine, give me
                     primitives!
               ‣     maybe there’s a “just right” API here


Sunday, November 14, 2010
¶ in-memory representations
         ‣     having tight control over representation is often
               crucial to resource utilization
               ‣     [space vs. time] memory bandwidth is
                     precious, CPU is plentiful
               ‣     cache locality can often make an enormous
                     difference — even to the point of less code is
                     better than more efficient code(!)
         ‣     at odds with modern GC’d languages automatic
               memory management & layout



Sunday, November 14, 2010
¶ in-memory representations
         ‣     optimize memory layout
         ‣     pack data
         ‣     compression
               ‣     varint, difference, zigzag, etc.
         ‣     L1:main memory latency ≈ 1:200 (!)


         ‣     example: geometry of Canada ~ jts normalized,
               vs. WKB
               ‣     wkb is ≈ 600 KB, JTS representation ≈ 2-3MB

Sunday, November 14, 2010
¶ garbage collection
         ‣     we love garbage collection
         ‣     attempts to encode common patterns:
               generational hypothesis
               ‣     not always quite right
               ‣     the application almost always has some idea
                     about object lifetime & semantics
         ‣     proposal: talk to each other!
               ‣     backpressure, thresholding, application-guided
                     GC


Sunday, November 14, 2010
¶ virtual memory


                               “You’re Doing it Wrong”
                      Poul-Henning Kamp, ACM Queue, June 2010




          "… Varnish does not ignore the fact that memory is virtual;
                           it actively exploits it”




Sunday, November 14, 2010
¶ virtual memory


         ‣     maybe he is doing it wrong?
         ‣     varnish uses data structures designed to
               anticipate virtual memory layout & behavior
               ‣    translates application semantics (eg. LRU)
         ‣     instead, you could have direct control over those
               resources




Sunday, November 14, 2010
“just right” abstractions
         ‣     high level abstractions are absolutely necessary
               to deal with today’s complex systems
         ‣     but providing good abstractions is hard
         ‣     what are the “just right” abstractions?
               ‣     exploit common patterns
               ‣     give enough degrees of freedom to the
                     underlying platform
               ‣     usually target a narrow(er) domain
               ‣     retain high level interfaces

Sunday, November 14, 2010
¶ mapreduce
               def map(datum):
                 words = {}
                 for word in parse_words(datum):
                   word[word] += 1
                 for (word, count) in words.items():
                   output(word, count)

               def reduce(key, values):
                 output(key, mean(values))
              ‣     much freedom is given to the scheduler
              ‣     exploits data locality (predictably)
Sunday, November 14, 2010
¶ shared-nothing web apps



               def handle(request):
                 return Response(
                   “hello %s!” % request.get_user())


              ‣     eg: google’s app engine, django, rails, etc




Sunday, November 14, 2010
¶ bigtable
         ‣     very simple data model
               ‣     but composable — effectively every other
                     database squeezes (more) sophisticated data
                     models down to 1 dimensional storage(s)
         ‣     explicit memory hierarchy (pinning column
               families to memory)
         ‣     provides load balancer/scheduler much freedom
         ‣     only magic: compactions. challenge: resource
               isolation.


Sunday, November 14, 2010
¶ LWT
                 lwt ai = Lwt_lib.getaddrinfo
                   "localhost" "8080"
                   [Unix.AI_FAMILY Unix.PF_INET;
                    Unix.AI_SOCKTYPE Unix.SOCK_STREAM] in

                 lwt (input, output) =
                   match ai with
                     | [] -> fail Not_found
                     | a :: _ -> Lwt_io.open_connection
                                   a.Unix.ai_addr in

                 Lwt_io.write output "GET / HTTP/1.1rnrn" >>
                 Lwt_io.read input




Sunday, November 14, 2010
theme
         ‣     provide a programming model that provide a
               narrow (but flexible) interface to resources
               ‣     mapreduce
               ‣     shared-nothing web apps
         ‣     provide a programming model that make
               resources explicit
               ‣     bigtable
               ‣     LWT



Sunday, November 14, 2010
meta pattern(s)


         ‣     addressing separation of concerns:
               ‣    (asynchronous) execution policy vs.
                    (synchronous) application logic
               ‣    data locality vs. data operations
               ‣    data model vs. data distribution
               ‣    data locality vs. data model



Sunday, November 14, 2010
the future?
         ‣     database systems
         ‣     search systems
         ‣     ... or any online query system?


         ‣     some academic work already in this area:
               ‣     OPIS (distributed arrows w/ combinators)
               ‣     ypnos (grid compiler)
               ‣     skywriting (scripted dataflow)


Sunday, November 14, 2010
conclusions
         ‣     we need high level abstractions
               ‣     they are simply necessary
               ‣     allows us to develop faster and safer
         ‣     many high level abstractions aren’t “just right”
               ‣     can become highly inoptimal (often orders of
                     magnitudes can be reclaimed)
         ‣     some systems do provide good compromises
               ‣     makes resources explicit
         ‣     the future is exciting!

Sunday, November 14, 2010
that’s it!



         ‣     follow me: @marius
         ‣     marius@twitter.com




Sunday, November 14, 2010

Contenu connexe

Similaire à Abstractions at Scale – Our Experiences at Twitter

AMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionAMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionDaniel Norman
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsAndy Gross
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Javalucenerevolution
 
Mark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa GeekMark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa Geekdeimos
 
At the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackAt the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackRyan Aydelott
 
Inside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudInside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudAtlassian
 
Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleMayaData
 
An Overview of Flash Storage for Databases
An Overview of Flash Storage for DatabasesAn Overview of Flash Storage for Databases
An Overview of Flash Storage for DatabasesConFoo
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futureTakayuki Muranushi
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSkills Matter
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL DatabaseSteve Min
 
Nuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo ApplicationsNuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo ApplicationsNuxeo
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Alluxio, Inc.
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupMayaData Inc
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupGianmario Spacagna
 
JavaScript for Enterprise Applications
JavaScript for Enterprise ApplicationsJavaScript for Enterprise Applications
JavaScript for Enterprise ApplicationsPiyush Katariya
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Onyxfish
 

Similaire à Abstractions at Scale – Our Experiences at Twitter (20)

AMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionAMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interaction
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
Riak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard ProblemsRiak Use Cases : Dissecting The Solutions To Hard Problems
Riak Use Cases : Dissecting The Solutions To Hard Problems
 
DEVIEW 2013
DEVIEW 2013DEVIEW 2013
DEVIEW 2013
 
Challenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in JavaChallenges in Maintaining a High Performance Search Engine Written in Java
Challenges in Maintaining a High Performance Search Engine Written in Java
 
Mark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa GeekMark Little Fence Sitting Soa Geek
Mark Little Fence Sitting Soa Geek
 
At the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with OpenstackAt the Crossroads of HPC and Cloud Computing with Openstack
At the Crossroads of HPC and Cloud Computing with Openstack
 
!Chaos Control System
!Chaos Control System!Chaos Control System
!Chaos Control System
 
Inside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private CloudInside the Atlassian OnDemand Private Cloud
Inside the Atlassian OnDemand Private Cloud
 
Solving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps styleSolving k8s persistent workloads using k8s DevOps style
Solving k8s persistent workloads using k8s DevOps style
 
An Overview of Flash Storage for Databases
An Overview of Flash Storage for DatabasesAn Overview of Flash Storage for Databases
An Overview of Flash Storage for Databases
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_future
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelism
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL Database
 
Nuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo ApplicationsNuxeo World Session: Scaling Nuxeo Applications
Nuxeo World Session: Scaling Nuxeo Applications
 
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
Accelerating Machine Learning Pipelines with Alluxio at Alluxio Meetup 2016
 
Container Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris MeetupContainer Attached Storage with OpenEBS - CNCF Paris Meetup
Container Attached Storage with OpenEBS - CNCF Paris Meetup
 
Logical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetupLogical-DataWarehouse-Alluxio-meetup
Logical-DataWarehouse-Alluxio-meetup
 
JavaScript for Enterprise Applications
JavaScript for Enterprise ApplicationsJavaScript for Enterprise Applications
JavaScript for Enterprise Applications
 
Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.Non-Relational Databases: This hurts. I like it.
Non-Relational Databases: This hurts. I like it.
 

Dernier

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Dernier (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Abstractions at Scale – Our Experiences at Twitter

  • 1. abstractions at scale our experiences at twitter marius a. eriksen @marius QConSF, November 2010 TM Sunday, November 14, 2010
  • 2. twitter ‣ real-time information network ‣ 70M tweets/day (800/s) ‣ 150M users ‣ 70k API calls/s Sunday, November 14, 2010
  • 3. agenda ‣ scale & scalability ‣ the role of abstraction ‣ good abstractions, bad abstractions ‣ abstractions & scale ‣ examples ‣ “just right” APIs ‣ conclusions Sunday, November 14, 2010
  • 4. scale & scalability “Scalability is a desirable property of a system, a network, or a process, which indicates its ability to either handle growing amounts of work in a graceful manner or to be readily enlarged” (Wikipedia) Sunday, November 14, 2010
  • 5. scale & scalability (cont’d) ‣ only “horizontal” scaling allows unbounded growth ‣ not entirely true: eg. due to network effects ‣ not a panacea ‣ “vertical” scaling is often desirable & required ‣ contain costs ‣ curtail network effects Sunday, November 14, 2010
  • 6. scale & scalability (cont’d) ‣ the target architecture is the datacenter ‣ network is a critical component ‣ deeper storage hierarchy ‣ higher performance variance ‣ complex failure modes ‣ but our programming models don’t account for these resource & failure models explicitly Sunday, November 14, 2010
  • 7. abstraction “freedom from representational qualities” ‣ the chief purpose of abstraction is to manage complexity & provide composability ‣ in software, abstraction is manifested through common interfaces ‣ explicit semantics ‣ implicit “contracts” Sunday, November 14, 2010
  • 8. abstraction (cont’d) ‣ as systems become more complex, abstraction becomes increasingly important ‣ especially as number of engineers grow ‣ modern systems are highly complex and are highly abstracted Sunday, November 14, 2010
  • 9. type systems ‣ [static] type systems can encode some of the contracts for us ‣ giving us static guarantees ‣ academia is pushing the envelope here with dependent types ‣ they also compose ‣ the line between type & program becomes blurred Sunday, November 14, 2010
  • 10. good abstraction? your CPU ‣ x86-64 is a spec ‣ you don’t care if it’s provided by AMD or Intel ‣ excepting a few compiler & OS authors, most of you don’t think about ‣ pipelining ‣ out of order & speculative execution ‣ branch prediction ‣ cache coherency ‣ etc… Sunday, November 14, 2010
  • 11. good …? your memory hierarchy ‣ you don’t interface with it directly ‣ purist view: addressable memory cells ‣ reality: has scary-good optimizations for common access patterns. highly optimized. ‣ you don’t think (often) about: ‣ cache locality ‣ TLB effects ‣ MMU ops scheduling Sunday, November 14, 2010
  • 12. bad abstraction? ia64 ‣ (at least initially) compilers couldn’t live up to it ‣ hardware promise was delegated to the compiler ‣ compilers failed to reliably produce sufficiently fast code ‣ abstraction was broken ‣ good for certain scientific computing domains Sunday, November 14, 2010
  • 13. a lens ‣ scaling issues occur when abstractions become leaky ‣ RDBMS fails to perform sophisticated queries on highly normalized data ‣ your GC thrashes after a certain allocation volume ‣ OS thread scheduling becomes unviable after N × 1000 threads are created Sunday, November 14, 2010
  • 14. ¶ threads ‣ threads offer a familiar and linear model of execution ‣ scheduling overhead becomes important after a certain amount of parallelism ‣ stack allocation can become troublesome ‣ fails to be explicit about latency, backpressure ‣ alternative: asynchronous programming ‣ makes queuing, latency explicit ‣ allows SEDA-style control ‣ a compromise? LWT Sunday, November 14, 2010
  • 15. ¶ sequence abstractions ‣ produces concise, beautiful, composable code trait Places extends Seq[Place] places.chunks(5000).map(_.toList).parForeach { chunk => … } ‣ access patterns aren’t propagated down the stack ‣ missed optimizations Sunday, November 14, 2010
  • 16. ¶ RDBMS ‣ are [by definition] generic ‣ encourage normalized data storage ‣ very powerful data model ‣ little need to know access patterns a priori ‣ provide general (magical) querying mechanics ‣ bag of tricks: query planning, table statistics, covering indices Sunday, November 14, 2010
  • 17. ¶ RDBMS ‣ at scale, the most viable strategy is: What You Serve Is What You Store (WYSIWYS) ‣ or at least very close ‣ this brings about a whole host of new problems ‣ data (in)consistency ‣ multiple indices ‣ “re-normalization” Sunday, November 14, 2010
  • 18. ¶ RDBMS ‣ at-scale, querying is highly predictable, most of the time: ‣ don’t need fancy query planning ‣ don’t need statistics ‣ in fact, we know a-priori how to efficiently query the underlying datastructures ‣ wish: don’t give me a query engine, give me primitives! ‣ maybe there’s a “just right” API here Sunday, November 14, 2010
  • 19. ¶ in-memory representations ‣ having tight control over representation is often crucial to resource utilization ‣ [space vs. time] memory bandwidth is precious, CPU is plentiful ‣ cache locality can often make an enormous difference — even to the point of less code is better than more efficient code(!) ‣ at odds with modern GC’d languages automatic memory management & layout Sunday, November 14, 2010
  • 20. ¶ in-memory representations ‣ optimize memory layout ‣ pack data ‣ compression ‣ varint, difference, zigzag, etc. ‣ L1:main memory latency ≈ 1:200 (!) ‣ example: geometry of Canada ~ jts normalized, vs. WKB ‣ wkb is ≈ 600 KB, JTS representation ≈ 2-3MB Sunday, November 14, 2010
  • 21. ¶ garbage collection ‣ we love garbage collection ‣ attempts to encode common patterns: generational hypothesis ‣ not always quite right ‣ the application almost always has some idea about object lifetime & semantics ‣ proposal: talk to each other! ‣ backpressure, thresholding, application-guided GC Sunday, November 14, 2010
  • 22. ¶ virtual memory “You’re Doing it Wrong” Poul-Henning Kamp, ACM Queue, June 2010 "… Varnish does not ignore the fact that memory is virtual; it actively exploits it” Sunday, November 14, 2010
  • 23. ¶ virtual memory ‣ maybe he is doing it wrong? ‣ varnish uses data structures designed to anticipate virtual memory layout & behavior ‣ translates application semantics (eg. LRU) ‣ instead, you could have direct control over those resources Sunday, November 14, 2010
  • 24. “just right” abstractions ‣ high level abstractions are absolutely necessary to deal with today’s complex systems ‣ but providing good abstractions is hard ‣ what are the “just right” abstractions? ‣ exploit common patterns ‣ give enough degrees of freedom to the underlying platform ‣ usually target a narrow(er) domain ‣ retain high level interfaces Sunday, November 14, 2010
  • 25. ¶ mapreduce def map(datum): words = {} for word in parse_words(datum): word[word] += 1 for (word, count) in words.items(): output(word, count) def reduce(key, values): output(key, mean(values)) ‣ much freedom is given to the scheduler ‣ exploits data locality (predictably) Sunday, November 14, 2010
  • 26. ¶ shared-nothing web apps def handle(request): return Response( “hello %s!” % request.get_user()) ‣ eg: google’s app engine, django, rails, etc Sunday, November 14, 2010
  • 27. ¶ bigtable ‣ very simple data model ‣ but composable — effectively every other database squeezes (more) sophisticated data models down to 1 dimensional storage(s) ‣ explicit memory hierarchy (pinning column families to memory) ‣ provides load balancer/scheduler much freedom ‣ only magic: compactions. challenge: resource isolation. Sunday, November 14, 2010
  • 28. ¶ LWT lwt ai = Lwt_lib.getaddrinfo "localhost" "8080" [Unix.AI_FAMILY Unix.PF_INET; Unix.AI_SOCKTYPE Unix.SOCK_STREAM] in lwt (input, output) = match ai with | [] -> fail Not_found | a :: _ -> Lwt_io.open_connection a.Unix.ai_addr in Lwt_io.write output "GET / HTTP/1.1rnrn" >> Lwt_io.read input Sunday, November 14, 2010
  • 29. theme ‣ provide a programming model that provide a narrow (but flexible) interface to resources ‣ mapreduce ‣ shared-nothing web apps ‣ provide a programming model that make resources explicit ‣ bigtable ‣ LWT Sunday, November 14, 2010
  • 30. meta pattern(s) ‣ addressing separation of concerns: ‣ (asynchronous) execution policy vs. (synchronous) application logic ‣ data locality vs. data operations ‣ data model vs. data distribution ‣ data locality vs. data model Sunday, November 14, 2010
  • 31. the future? ‣ database systems ‣ search systems ‣ ... or any online query system? ‣ some academic work already in this area: ‣ OPIS (distributed arrows w/ combinators) ‣ ypnos (grid compiler) ‣ skywriting (scripted dataflow) Sunday, November 14, 2010
  • 32. conclusions ‣ we need high level abstractions ‣ they are simply necessary ‣ allows us to develop faster and safer ‣ many high level abstractions aren’t “just right” ‣ can become highly inoptimal (often orders of magnitudes can be reclaimed) ‣ some systems do provide good compromises ‣ makes resources explicit ‣ the future is exciting! Sunday, November 14, 2010
  • 33. that’s it! ‣ follow me: @marius ‣ marius@twitter.com Sunday, November 14, 2010