Kafka
A little introduction
Pub-Sub Messaging System
Distributed
Performance
Disk/Memory Performance
[Chart: read values/second (log scale, 1 to 1000M) for Disk, SSD, and Memory, comparing random access with sequential access. Annotation: sequential disk read is faster than random memory read. Source: http://queue.acm.org/detail.cfm?id=1563874]
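The speaker notes for this chart describe writing every message immediately and relying on the OS page cache, with a configurable flush deciding how much data is at risk. A minimal Python sketch of that idea follows. The `AppendLog` class, its `flush_interval` parameter, and the file name are hypothetical illustrations and not Kafka's actual implementation or configuration.

```python
import os
import tempfile


class AppendLog:
    """Toy append-only log (hypothetical, not Kafka's code)."""

    def __init__(self, path, flush_interval):
        self._file = open(path, "ab")
        self._size = os.path.getsize(path)
        self._flush_interval = flush_interval  # messages between forced flushes
        self._unflushed = 0
        self.flushes = 0

    def append(self, data):
        """Append data and return the byte offset at which it starts."""
        offset = self._size
        self._file.write(data)
        self._file.flush()  # hand the bytes to the OS (page cache) right away
        self._size += len(data)
        self._unflushed += 1
        if self._unflushed >= self._flush_interval:
            os.fsync(self._file.fileno())  # force the cached pages to disk
            self._unflushed = 0
            self.flushes += 1
        return offset

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as directory:
    log = AppendLog(os.path.join(directory, "topic.log"), flush_interval=3)
    offsets = [log.append(b"m" * 100) for _ in range(7)]
    log.close()

print(offsets, log.flushes)
```

With `flush_interval=3`, at most two messages are ever waiting for a forced flush, which is the trade-off between throughput and data at risk that the notes mention.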
Persistent
Length     Magic Value   Checksum   Payload
4 bytes    1 byte        4 bytes    n bytes
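The layout above can be sketched with Python's `struct` module to show how the 9-byte header and the payload add up to the next message's offset. This is an illustration under assumptions the slide does not state: big-endian fields, a CRC32 checksum computed over the payload, and a length field that counts the bytes following it. The function names are hypothetical.

```python
import struct
import zlib

# length (4 bytes), magic value (1 byte), checksum (4 bytes); ">" means no padding
HEADER = struct.Struct(">IBI")


def encode_message(payload, magic=0):
    """Frame a payload as length + magic + checksum + payload."""
    checksum = zlib.crc32(payload) & 0xFFFFFFFF
    length = 1 + 4 + len(payload)  # assumption: counts bytes after the length field
    return HEADER.pack(length, magic, checksum) + payload


def decode_message(buf, offset=0):
    """Return (payload, offset of the next message)."""
    length, _magic, checksum = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = offset + 4 + length
    payload = buf[start:end]
    if zlib.crc32(payload) & 0xFFFFFFFF != checksum:
        raise ValueError("checksum mismatch")
    return payload, end


# A 91-byte payload plus the 9-byte header gives a 100-byte message.
log = encode_message(b"x" * 91) + encode_message(b"hello")
first, next_offset = decode_message(log, 0)
second, end_offset = decode_message(log, next_offset)
print(len(first), next_offset, second, end_offset)
```

Because each message carries its own length, a reader needs only a starting byte offset to walk the file message by message.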
[Diagram: a token (Offset: 0, Broker: kafka.local, Topic: Testing) is the input to an MR job. The job produces two outputs: an updated token (Offset: 130098, Broker: kafka.local, Topic: Testing) and a sequence file.]
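The token-driven job in this diagram can be sketched as a pure function from an input token to the output data plus a new token. The sketch below simulates the topic as an in-memory list and assumes each message occupies its payload length plus a 9-byte header. `Token`, `run_job`, and `HEADER_BYTES` are hypothetical names, and no real Kafka or Hadoop API is used.

```python
from dataclasses import dataclass
from typing import List, Tuple

HEADER_BYTES = 9  # length + magic value + checksum, per the message layout slide


@dataclass(frozen=True)
class Token:
    broker: str
    topic: str
    offset: int  # byte offset to start reading from


def run_job(token: Token, log: List[bytes]) -> Tuple[List[bytes], Token]:
    """Consume payloads at or after token.offset; return them with a new token."""
    payloads = []
    position = 0
    for payload in log:
        if position >= token.offset:
            payloads.append(payload)
        position += HEADER_BYTES + len(payload)
    # The new token is produced only if the job gets this far, so a failed
    # run leaves the old token in place and can simply be re-run.
    return payloads, Token(token.broker, token.topic, position)


topic_log = [b"a" * 91, b"b" * 41]
start = Token("kafka.local", "Testing", 0)

first_output, next_token = run_job(start, topic_log)

topic_log.append(b"c" * 11)  # a message published after the first run
second_output, final_token = run_job(next_token, topic_log)
print(next_token.offset, second_output, final_token.offset)
```

Feeding each run's output token into the next run is what makes the job incremental: the second run sees only the message published after the first.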
Useful Things


• http://incubator.apache.org/kafka/
• https://github.com/pingles/clj-kafka


Editor's notes

  2. Built by LinkedIn to process and store high-volume activity stream data, but it's really a general-purpose messaging system...
  3. At its heart, it's a pub-sub messaging system...
  4. It starts with a broker.
  5. Publishers connect to the broker
  6. and send their messages.
  7. So we connect some consumers and they can pull messages. Note that when they connect, they'll receive all messages for a topic, not just those sent since they connected. More on that later...
  8. But it's also distributed, which is to say...
  9. we can have multiple brokers in multiple places and aggregate them together... Internally we can also partition within topics to allow parallel consumption, but that's for another talk...
  10. Before we get into what makes it particularly different (persistence), it's useful to understand some of the engineering decisions behind how it works. Performance is interesting because the behaviour of disks and memory has informed the way Kafka has been built to embrace disk persistence.
  11-14. Research from an ACM paper. Values/sec is the number of four-byte integer values read per second from a 1-billion-long (4 GB) array on disk or in memory. Kafka uses the OS's default page caching rather than custom in-memory stores. Given that all disk writes/reads will be cached, this means we can avoid paying the caching overhead of objects within the JVM. Rather than maintaining everything in memory and flushing when necessary, everything is written immediately. Configurable flushing determines how much data is at risk. Similar to Varnish.
  16. It starts with a topic, a text description for the messages contained within. We use it to describe how to deserialize the message bytes.
  17. So we send a message to the topic. What happens?
  18. Kafka creates a file and persists the message, which is to say it hands it off to the OS to write. Files are just sets of bytes, nothing clever. Internally it abstracts the collection of message bytes into a messageset, which is then backed by a file. So what does each message look like...
  19. So our message length is n + 9 bytes. With a 91-byte payload we have a 100-byte message, which means our next message would start at offset 100.
  20. And we can see our offsets at the bottom...
  21. So we have the offsets, which let us send all messages to consumers, not just those that were sent after they connected...
  22. It's up to the consumer to remember what they've consumed, but this means you can re-consume an entire set of messages easily, which is very useful when integrating with long-term storage like HDFS... A quick look at the way it works:
  23. Our input to the Hadoop job is a token file that specifies the offset to read from, the topic, etc. Having read the token, the mapper connects and consumes messages from the given offset. The mapper outputs two sets of data: the mapped output, such as the message payloads, and an updated token file with the last read offset. This is the key: successful completion of the job results in new metadata for the next run as well as the output data. It means that if the job fails we can re-run it and it'll consume from the last consumed offset.
  24. The newly created output becomes the next input.
  25. And this is why Kafka is an interesting messaging system: suitable for batch and realtime.
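The notes on brokers, publishers, and pull-based consumers can be sketched as a toy in-memory broker in Python. It is only an illustration: `Broker` and `Consumer` are hypothetical classes rather than a real Kafka client API, and offsets here are message indices rather than the byte offsets described in the notes.

```python
from collections import defaultdict


class Broker:
    """Keeps every published message in an append-only list per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def fetch(self, topic, offset, max_messages=10):
        """Return up to max_messages starting at offset (a message index)."""
        return self._topics[topic][offset:offset + max_messages]


class Consumer:
    """Pulls from the broker and remembers its own position."""

    def __init__(self, broker, topic, offset=0):
        self._broker = broker
        self._topic = topic
        self.offset = offset

    def poll(self):
        batch = self._broker.fetch(self._topic, self.offset)
        self.offset += len(batch)  # the consumer, not the broker, tracks progress
        return batch


broker = Broker()
broker.publish("Testing", b"one")
broker.publish("Testing", b"two")

# A consumer that connects late still sees the earlier messages.
late = Consumer(broker, "Testing")
first_batch = late.poll()

broker.publish("Testing", b"three")
second_batch = late.poll()

# Rewinding the offset re-consumes the whole topic.
late.offset = 0
replay = late.poll()
print(first_batch, second_batch, replay)
```

Keeping the position on the consumer side is what lets the same topic serve both a realtime reader and a batch job that replays from an older offset.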