Kafka
A little introduction
Pub-Sub Messaging System
Distributed
Performance
Disk/Memory Performance
[Figure: read values/second (log scale, 1 to 1000M) for Disk, SSD, and Memory, comparing random access with sequential access. Annotation: sequential disk read is faster than random memory read. Source: http://queue.acm.org/detail.cfm?id=1563874]
Persistent
Length     Magic Value   Checksum   Payload
4 bytes    1 byte        4 bytes    n bytes
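
A minimal sketch of the layout above, in Python. The slide gives only field names and sizes, so several details here are assumptions: big-endian byte order, a CRC32 checksum computed over the payload, a magic value of 0, and a length field that counts the bytes following it. `encode_message` and `decode_message` are hypothetical helper names, not part of any Kafka client.

```python
import struct
import zlib

# length (4 bytes), magic (1 byte), checksum (4 bytes); big-endian is an assumption
HEADER = struct.Struct(">IBI")
MAGIC = 0  # assumed value; the slide only says the field is 1 byte


def encode_message(payload: bytes) -> bytes:
    """Frame a payload as length | magic | checksum | payload."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF  # assumption: CRC32 of the payload
    length = 1 + 4 + len(payload)  # assumption: counts the bytes after the length field
    return HEADER.pack(length, MAGIC, checksum) + payload


def decode_message(buf: bytes, offset: int = 0):
    """Return (payload, next_offset) for the message starting at `offset`."""
    length, _magic, checksum = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = offset + 4 + length
    payload = buf[start:end]
    if zlib.crc32(payload) & 0xFFFFFFFF != checksum:
        raise ValueError("checksum mismatch")
    return payload, end
```

With a 91-byte payload, `len(encode_message(b"x" * 91))` is 100, so a second message appended to the same file would begin at byte offset 100.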
[Diagram: a token file (Offset: 0, Broker: kafka.local, Topic: Testing) is the input to an MR job. The job writes two outputs: an updated token file (Offset: 130098, Broker: kafka.local, Topic: Testing) and a sequence file.]
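
The token hand-off in the diagram can be sketched in plain Python. This is a simulation, not Hadoop or Kafka client code: `fetch_messages` is a hypothetical stand-in for a consumer reading from a broker, the "log" is an in-memory list, and offsets here are list indices rather than the byte offsets Kafka uses.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    broker: str
    topic: str
    offset: int  # where the next run should start reading


def fetch_messages(log: List[bytes], offset: int) -> List[bytes]:
    """Hypothetical stand-in for consuming from a broker at `offset`."""
    return log[offset:]


def run_job(token: Token, log: List[bytes]) -> Tuple[List[str], Token]:
    """One batch run: read from the token's offset, emit output and a new token."""
    messages = fetch_messages(log, token.offset)
    # The "mapped" output; upper-casing is just a placeholder transformation.
    output = [m.decode("utf-8").upper() for m in messages]
    # The updated token records how far this run got.
    new_token = replace(token, offset=token.offset + len(messages))
    return output, new_token
```

Because the new token is only produced when `run_job` returns, a failed run leaves the old token in place and the next attempt rereads from the same offset.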
Useful Things


• http://incubator.apache.org/kafka/
• https://github.com/pingles/clj-kafka


Editor's notes

  2. Built by LinkedIn to process and store high-volume activity stream data, but it's really a general-purpose messaging system.
  3. At its heart, it's a pub-sub messaging system.
  4. It starts with a broker.
  5. Publishers connect to the broker
  6. and send their messages.
  7. So we connect some consumers and they can pull messages.
     Note that when they connect, they'll receive all messages for a topic, not just those sent since they connected. More on that later.
  8. But it's also distributed, which is to say...
  9. We can have multiple brokers in multiple places and aggregate them together.
     Internally we can also partition within topics to allow parallel consumption, but that's for another talk.
  10. Before we get into what makes it particularly different (persistence), it's useful to understand some of the engineering decisions behind how it works.
     Performance is interesting because the behaviour of disks and memory has informed the way Kafka has been built to embrace disk persistence.
  11-14. Research from an ACM paper.
     Values/sec is the number of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory.
     Kafka uses the OS's default page caching rather than custom in-memory stores. Given that all disk writes and reads will be cached anyway, this means we can avoid paying the caching overhead of objects within the JVM.
     Rather than maintaining everything in memory and flushing when necessary, everything is written immediately.
     Configurable flushing determines how much data is at risk.
     Similar to Varnish.
  16. It starts with a topic, a text description for the messages contained within. We use it to describe how to deserialize the message bytes.
  17. So we send a message to the topic. What happens?
  18. Kafka creates a file and persists the message, which is to say it hands it off to the OS to write.
     Files are just sets of bytes, nothing clever.
     Internally it abstracts the collection of message bytes into a message set, which is then backed by a file.
     So what does each message look like?
  19. So, with an n-byte payload, our total message length is n + 9 bytes (4 + 1 + 4 bytes of header plus the payload).
     With a 91-byte payload we have a 100-byte message,
     which means our next message would start at offset 100.
  20. And we can see our offsets at the bottom.
  21. So we have the offsets, which let us send all messages to consumers, not just those that were sent after they connected.
  22. It's up to the consumer to remember what they've consumed, but this means you can re-consume an entire set of messages easily, which is very useful when integrating with long-term storage like HDFS.
     A quick look at the way it works:
  23. Our input to the Hadoop job is a token file that specifies the offset to read from, the topic, etc.
     Having read the token, the mapper connects and consumes messages from the given offset.
     The mapper outputs two sets of data: the mapped output, such as the message payloads, and an updated token file with the last read offset.
     This is the key: successful completion of the job results in new metadata for the next run as well as the output data.
     This means that if the job fails we can re-run it and it'll consume from the last consumed offset.
  24. The newly created output becomes the next input.
  25. And this is why Kafka is an interesting messaging system:
     suitable for batch and realtime.
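
The offset-based consumption described in notes 21 and 22 can be sketched as follows. This is an in-memory toy, assuming byte offsets into a single append-only buffer with a 4-byte length prefix per message; `ToyLog` and `ToyConsumer` are hypothetical names, not Kafka classes.

```python
import struct


class ToyLog:
    """Append-only buffer of length-prefixed messages, addressed by byte offset."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def append(self, payload: bytes) -> int:
        """Store a message and return the byte offset it starts at."""
        offset = len(self._buf)
        self._buf += struct.pack(">I", len(payload)) + payload
        return offset

    def read(self, offset: int):
        """Yield (payload, next_offset) for every message from `offset` onward."""
        while offset < len(self._buf):
            (size,) = struct.unpack_from(">I", self._buf, offset)
            start = offset + 4
            yield bytes(self._buf[start:start + size]), start + size
            offset = start + size


class ToyConsumer:
    """The consumer, not the log, remembers how far it has read."""

    def __init__(self, log: ToyLog, offset: int = 0) -> None:
        self.log = log
        self.offset = offset

    def poll(self):
        """Return all unread payloads and advance the stored offset."""
        payloads = []
        for payload, next_offset in self.log.read(self.offset):
            payloads.append(payload)
            self.offset = next_offset
        return payloads
```

A consumer created after the messages were appended still sees all of them, and resetting `offset` to 0 replays the whole log.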