Kafka: Seattle Data Science and Data Engineering Meetup

Kafka is a distributed, partitioned, replicated commit-log service that provides the functionality of a messaging system. It is designed for high throughput and horizontal scalability, and it guarantees message ordering within a partition. Its four core APIs (Producer, Consumer, Connect, and Streams) let applications publish and read streams of data, implement connectors to external systems, and process streams. Internally, Kafka is built on logs and uses ZooKeeper for cluster membership, controller election, and topic configuration. It is open-source software, available on GitHub.

Seattle Data Science And Data Engineering Meetup
Abhishek Goswami.
12/14/2016
abgoswam@gmail.com
https://www.linkedin.com/in/abgoswam

Table of Contents
● Introduction
○ Motivation
○ What is Kafka?
○ Characteristics
○ APIs
○ Demos
● Internals
○ Logs
○ Logs in Distributed Systems
○ Design Fundamentals
○ ZooKeeper Dependency
○ Replication
○ Source Code
● Summary, Q&A

Introduction: Motivation
Data integration.

Introduction: What is Kafka?
Distributed, partitioned, replicated commit-log service
Provides the functionality of a messaging system, but with a unique design
Competitive Landscape:
● Amazon Kinesis, Azure Event Hubs
Use Cases:
● Messaging
● Website Activity Tracking
● Logging
● Stream Processing

Introduction: Characteristics
● Scalability of a filesystem
○ High throughput
○ Many TB per server
● Guarantees of a database
○ Messages strictly ordered (within a partition)
○ All data persistent
● Distributed by default
○ Replication
○ Partitioning
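The ordering and partitioning points above can be illustrated with a toy model. The sketch below is not Kafka's API: `ToyTopic`, `partition_for`, and `values_for` are hypothetical names, and CRC32 is only a stand-in for a real partitioner. It shows records with the same key landing in the same append-only partition, so their relative order is preserved.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 as a deterministic stand-in for a real partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        # Records are only ever appended; the offset is the position in the log.
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


topic = ToyTopic(num_partitions=3)
for i in range(4):
    # Interleave writes for two keys.
    topic.append("user-a", f"a{i}")
    topic.append("user-b", f"b{i}")


def values_for(key):
    """Read back one key's values in the order its partition stored them."""
    log = topic.partitions[topic.partition_for(key)]
    return [value for k, value in log if k == key]


print(values_for("user-a"))
print(values_for("user-b"))
```

Ordering here holds per partition only. Records for different keys may sit in different partitions, and the toy model makes no claim about their relative order.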

Introduction: APIs
Four core APIs:
Producer API
allows applications to send streams of data to topics in the Kafka cluster.
Consumer API
allows applications to read streams of data from topics in the Kafka cluster.
Connect API
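allows implementing connectors that continually pull from some source system or application into Kafka or push from Kafka into some sink system or application.
Streams API
allows applications to act as stream processors that transform input topics into output topics; stream processing can be seen as a generalization of batch processing to continuous, real-time data with low-latency requirements.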
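To make the producer and consumer roles concrete, here is a minimal in-memory sketch. It is a toy model under stated assumptions, not the Kafka client API: `ToyLog`, `ToyConsumer`, `send`, and `poll` are hypothetical names chosen for illustration. The point it shows is that the log keeps every record, while each consumer tracks its own read position.

```python
class ToyLog:
    """Append-only list standing in for a single topic partition."""

    def __init__(self):
        self._records = []

    def send(self, value):
        # Producer side: append and return the record's offset.
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        # Reading does not remove anything from the log.
        return self._records[offset:offset + max_records]


class ToyConsumer:
    """Each consumer keeps its own position, independent of other readers."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.position, max_records)
        self.position += len(batch)
        return batch


log = ToyLog()
for event in ["signup", "login", "purchase"]:
    log.send(event)

analytics = ToyConsumer(log)
billing = ToyConsumer(log)

first_batch = analytics.poll(max_records=2)  # analytics reads two records
rest = analytics.poll()                      # then resumes where it left off
billing_all = billing.poll()                 # billing still sees everything

print(first_batch, rest, billing_all)
```

Because reads never delete data, a slow or newly added consumer can start from an earlier offset without affecting the others.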

Internals: Logs in Distributed Systems

Internals: ZooKeeper Dependency
Kafka requires ZooKeeper.
Kafka uses ZooKeeper for things like:
● Cluster membership
● Electing a controller
● Topic configuration (which topics exist, which broker leads each partition, etc.)
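The membership and controller-election roles listed above can be sketched with a toy coordinator. This is a simplified model, not the ZooKeeper API: `ToyCoordinator` and its methods are hypothetical, and the registration-order tie-break is an assumption made so the example is deterministic. The idea it illustrates is "first broker to claim an empty controller slot wins, and the slot frees up when that broker's session ends".

```python
class ToyCoordinator:
    """Stand-in for a coordination service; tracks live brokers and one controller."""

    def __init__(self):
        self.live_brokers = []   # cluster membership, in registration order
        self.controller = None   # at most one controller at a time

    def register(self, broker_id):
        # Joining the cluster also means trying to become controller.
        self.live_brokers.append(broker_id)
        self.try_become_controller(broker_id)

    def try_become_controller(self, broker_id):
        # The claim succeeds only if the controller slot is currently empty.
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def session_expired(self, broker_id):
        # A broker that goes away drops out of membership.
        self.live_brokers.remove(broker_id)
        if self.controller == broker_id:
            self.controller = None
            # Assumption: remaining brokers retry in registration order.
            for candidate in self.live_brokers:
                if self.try_become_controller(candidate):
                    break


zk = ToyCoordinator()
for broker in (1, 2, 3):
    zk.register(broker)

first_controller = zk.controller
zk.session_expired(1)            # the controller's session ends
second_controller = zk.controller

print(first_controller, second_controller, zk.live_brokers)
```

In a real cluster the winner is whichever broker's claim reaches the coordination service first; the fixed order here only keeps the toy run reproducible.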

Internals: Source Code
GitHub repo
https://github.com/apache/kafka

Summary
Kafka solves data integration needs.
Distributed, partitioned, replicated commit-log service

Q&A
References:
1. Simplifying data pipelines with Apache Kafka
2. Learning Apache Kafka, 2nd Edition
3. https://www.tutorialspoint.com/apache_kafka/index.htm
4. https://www.infoq.com/articles/apache-kafka
5. https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Contenu connexe

Tendances

Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...

confluent

A stream processing platform is not an island unto itself; it must be connected to all of your existing data systems, applications, and sources. In this talk we will provide different options for integrating systems and applications with Apache Kafka, with a focus on the Kafka Connect framework and the ecosystem of Kafka connectors. We will discuss the intended use cases for Kafka Connect and share our experience and best practices for building large-scale data pipelines using Apache Kafka.

Data integration with Apache Kafka

confluent

MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...

MongoDB

Braze is a customer engagement platform that delivers more than a billion messaging experiences across push, email, apps and more each day. In this session, Jon Hyman will describe the company's challenges during an inflection point in 2015 when the company reached the limitation of their physical networking equipment, and how Braze has since grown more than 7x on Fastly. Jon will also discuss how Braze uses Fastly's Layer 7 load balancing to improve stability and uptime of its APIs.

Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...

Fastly

Kafka Connect by Datio

Datio Big Data

Spark Streaming makes it easy to build scalable, robust stream processing applications — but only once you’ve made your data accessible to the framework. Spark Streaming solves the realtime data processing problem, but to build large scale data pipeline we need to combine it with another tool that addresses data integration challenges. The Apache Kafka project recently introduced a new tool, Kafka Connect, to make data import/export to and from Kafka easier.

Building Realtim Data Pipelines with Kafka Connect and Spark Streaming

Guozhang Wang

Change data capture with MongoDB and Kafka.

Dan Harvey

(Stephen Parente + Jeff Field, Blizzard) Kafka Summit SF 2018 Blizzard’s global data platform has become a driving force in both business and operational analytics. As more internal customers onboard with the system, there is increasing demand for custom applications to access this data in near real time. In order to avoid many independent teams with varying levels of Kafka expertise all accessing the firehose from our critical production Kafkas, we developed our own pub-sub system on top of Kafka to provide specific datasets to customers on their own cloud deployed Kafka clusters.

You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard

confluent

Bootstrap SaaS startup using Open Source Tools

botsplash.com

There was a time not long ago when we used relational databases for everything. Even if the data wasn’t particularly relational, we shoehorned it into relational tables, often because that was the only database we had. Thank god these dark times are over and now we have many different kinds of NoSQL databases: Document, realtime, graph, column, but that does not solve the problem that the same data might be a graph from one perspective, but a collection of documents from another. It would be really nice if we can access that same data in many different ways, depending on the context of what we want to achieve in our current task. As software architects this is not easy to solve but definitely possible: We can design an architecture using Event Sourcing: Capture the data with Debezium, post it to a Kafka queue, use Kafka Streams to model the data the way we like, and store the data in various different data sources, so we can synchronize data between data sources.

Embracing Database Diversity with Kafka and Debezium

Frank Lyaruu

Kafka 탄생과 생태계

Gee Yeol Nahm

Devops Days, 2019 - Charlotte

botsplash.com

In this talk, we are going to tell you the story of building the Connection Platform (CoPa). This is an endeavor undertaken at Generali Switzerland over the course of the last year, in a collaboration with Innovation Process Technology. The goal was to design a general purpose, state of the art integration platform, which covers all integration needs of the enterprise. The central data distribution and integration layer are powered by Confluent Kafka. We will throw a spotlight on three different aspects of this platform that, all in their own right, are essential for agile data integration. First of all, the platform is hosted on the container platform Redhat Openshift. Everything is set up in flexible Docker containers. Automated pipelines are used to build, provision and deploy everything on the platform from infrastructure to data pipeline

Agile Data Integration: How is it possible?

confluent

Course Objectives In this three-day hands-on course, you will learn how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka experts. You will learn how Kafka and the Confluent Platform work, their main subsystems, how they interact, and how to set up, manage, monitor, and tune your cluster. For more information, please visit www.confluent.io/training/

Confluent Operations Training for Apache Kafka

confluent

Having started with classic monolith applications in the late 90s and adopting a new microservice architecture in 2015, our organization needed a convenient, reliable, and low-cost way to push changes back and forth between them. One that preferably utilized technology already on hand and could exchange information between multiple data stores. In this session we will explore how Kafka Connect and its various connectors satisfied this need. We will review the two disparate tech stacks we needed to integrate, and the strategies and connectors we used to exchange information. Finally, we will cover some enhancements we made to our own processes including integrating Kafka Connect and its connectors into our CI/CD pipeline and writing tools to monitor connectors in our production environment.

Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...

HostedbyConfluent

Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...

Lightbend

Milano Apache Kafka Meetup by Confluent (First Italian Kafka Meetup) on Wednesday, November 29th 2017. Il talk introduce Apache Kafka (incluse le APIs Kafka Connect e Kafka Streams), Confluent (la società creata dai creatori di Kafka) e spiega perché Kafka è un'ottima e semplice soluzione per la gestione di stream di dati nel contesto di due delle principali forze trainanti e trend industriali: Internet of Things (IoT) e Microservices.

Introduction to Apache Kafka and Confluent... and why they matter

confluent

You have learned about Kafka event sourcing with streams and using Kafka as a database, but you may be having a tough time wrapping your head around what that means and what challenges you will face. Kafka’s exactly once semantics, data retention rules, and stream DSL make it a great database for real-time transaction processing. This talk will focus on how to use Kafka events as a database. We will talk about using KTables vs GlobalKTables, and how to apply them to patterns we use with traditional databases. We will go over a real-world example of joining events against existing data and some issues to be aware of. We will finish covering some important things to remember about state stores, partitions, and streams to help you avoid problems when your data sets become large.

Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...

HostedbyConfluent

Column and hadoop

Alex Jiang

EventHub for kafka ecosystems kafka meetup

Nitin Kumar

Tendances (20)

Kafka Summit NYC 2017 - Cloud Native Data Streaming Microservices with Spring...

Data integration with Apache Kafka

MongoDB World 2018: Data Models for Storing Sophisticated Customer Journeys i...

Altitude San Francisco 2018: Scale and Stability at the Edge with 1.4 Billion...

Kafka Connect by Datio

Building Realtim Data Pipelines with Kafka Connect and Spark Streaming

Change data capture with MongoDB and Kafka.

You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard

Bootstrap SaaS startup using Open Source Tools

Embracing Database Diversity with Kafka and Debezium

Kafka 탄생과 생태계

Devops Days, 2019 - Charlotte

Agile Data Integration: How is it possible?

Confluent Operations Training for Apache Kafka

Utilizing Kafka Connect to Integrate Classic Monoliths into Modern Microservi...

Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...

Introduction to Apache Kafka and Confluent... and why they matter

Using Kafka as a Database For Real-Time Transaction Processing | Chad Preisle...

Column and hadoop

EventHub for kafka ecosystems kafka meetup

En vedette

Talk on Parallel Computing at IGWA

Dishant Ailawadi

что делать если ребенок боится врачей

virtualtaganrog

у нас прошёл месячник пожарной безопасности !

virtualtaganrog

Press Release - YFC

Magdalene Tan

1 день в детском саду

Ksenya Petrunina

Ce mois-ci, notre « Rendez-vous de l’économie FTI Consulting en partenariat avec Odoxa - Les Echos et Radio Classique » porte sur la loi santé. Les résultats de ce sondage ont été publiés ce matin dans Les Echos et diffusés sur Radio Classique. On y apprend notamment que : • Sept Français sur dix soutiennent la généralisation du tiers-payant, disposition phare du PLS de Marisol Touraine • Pourtant, les Français comprennent aussi l’opposition des médecins à cette généralisation du tiers payant • Santé publique : l’assouplissement de la loi Evin et l’instauration des paquets neutres divisent les Français, suscitant un assez fort clivage gauche-droite

Rendez-vous de l’économie FTI Consulting en partenariat avec Odoxa - Les Echo...

FTI Consulting FR

Hadoop Fundamentals

its_skm

Calendario Diciembre 2016

moonmentum

For developers new to MongoDB and Node.js, however, some the common design patterns are very different than those of a RDBMS and traditional synchronous languages. Developers learning these technologies together may find it a bit bewildering. In reality, however, these tools fit perfectly together and enable I high degree of developer productivity and application performance. This webinar will walk developers through common MongoDB development patterns in Node.js, such as efficiently loading data into MongoDB using MongoDB's bulk API, iterating through query results, and managing simultaneous asynchronous MongoDB queries to provide the best possible application performance. Working Node.js and MongoDB examples will be used throughout the presentation.

Getting Started with MongoDB and NodeJS

MongoDB

La ingeniería en la edad media

David Jimenez

MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector

MongoDB

Comparto con ustedes una presentación que elaboré para un seminario sobre móviles, donde explico que involucra usar la tecnología móvil en celulares y tabletas dentro de un ecosistema lleno de necesidades en los diferentes contexto de uso. Demostrando que, no solo el tamaño del dispositivo y la forma de interactuar con este son suficientes para resolver una adecuada experiencia usuaria, si no, que el contexto de uso y las consideraciones a tomar a partir de este en el comportamiento de nuestros usuarios (motivaciones y/o frenos) son temas relevantes al momento de conceptualizar nuestro producto y definir sus objetivos rentables. Temas en la presentación: - ¿Qué ha remplazado el móvil? - Contexto Móvil =/ Contexto Tablet =/ Contexto PC - Marcos mentales - Contextos de uso - Consideraciones contextuales para movilizar - Algunos estudios a nivel mundial sobre el uso de celulares inteligentes Fuente: www.blog.pucp.edu.pe/ux

Movilidad y su contexto de uso

Percy Negrete

Webinar: Simplifying the Database Experience with MongoDB Atlas

MongoDB

Consejo 6 octubre 2016

CEPTENERIFESUR

El color en la navegación puede ayudar al usuario a encontrar lo que busca con mayor rapidez agrupando contenidos. Aquí revelamos el poder del color y su influencia en los usuarios. En esta presentación podrá encontrar: - Introducción. - ¿Cómo podemos usar los colores adecuadamente?. - La sobrecarga cognitiva. - Problemas con los colores. - Interpretaciones de los colores para personas. - Efecto Stroop. Fuente: www.blog.pucp.edu.pe/ux

Importancia del color en la experiencia de uso

Percy Negrete

Magic quadrant for data warehouse database management systems

divjeev

Otras realidades, otros impactos, otras métricas: la nueva bibliometría 1. La medida de la ciencia 2. Hitos históricos evaluación bibliométrica De la Bibliometrics: la evaluación de unos pocos, por unos pocos y para unos pocos A la Webmetrics y a la Altmetrics: La popularización y democratización de la evaluación científica La evaluación de todos, por todos, para todos, de todo, a todas horas y en todos los lugares 3.La nueva bibliometría: 3.1 Otras realidades - Nuevos medios de comunicación Los sitios web - Nuevos medios de comunicación Blogs - Nuevos medios de comunicación Twitter - Nuevos medios de comunicación Presentaciones - Nuevos almacenes de información bibliográfica: los repositorios - Nuevos almacenes de información bibliográfica : los gestores de referencias bibliográficas - Las redes sociales científicas 3.2 Otros impactos Otras métricas - Midiendo el impacto de los sitios web - Midiendo el impacto de un Blog - Midiendo el impacto en Twitter - Midiendo el impacto de las presentaciones - Midiendo el impacto de los documentos indizados en los repositorios - Midiendo el impacto de los documentos indizados en los nuevos almacenes de información bibliográfica : los gestores de referencias bibliográficas - Midiendo en las redes sociales científicas 3.3 Otras herramientas - Construyendo rankings web. Nivel macro, Nivel micro (Google Analytics) - Google Scholar: la nueva "casa de citas" - LOS DERIVADOS BIBLIOMÉTRICOS DE GOOGLE SCHOLAR Google Scholar Metrics, Google Scholar Citations 3.4 ¿Qué futuro aguarda a los nuevas métricas?, ¿Cuál es el futuro de los nuevos medios de comunicación?, ¿Cuántos documentos posee las nuevas métricas? - ¿Qué sabemos de las nuevas métricas? El sentido común Evidencias empíricas - ¿Para qué los nuevos indicadores? - ¿Qué impacto miden? Científico Profesional Educativo Social - ¿Qué sabemos de Google Scholar como fuente de evaluación científica? 4. 
Los riesgos de la nueva bibliometría - Problemas: La FUGACIDAD - The Googledependency Problemas: La dependencia tecnológica - El gran peligro: La MANIPULACIÓN - ¿Se convertirá la métrica en un fin en sí mismo? ¿la medida alterará el fin mismo de la ciencia? ¿Un inquietante futuro?

Otras realidades, otros impactos, otras métricas: la nueva bibliometría

Emilio Delgado Lopez-Cozar, Universidad de Granada

Big Data Paris - Air France: Stratégie BigData et Use Cases

MongoDB

El neurodiseño es un nuevo proceso de trabajo y de investigación en el diseño UX donde el usuario es visto como un todo integrado, es decir, que hay que considerar al usuario no solo como un sujeto físico y mental a nivel de sensaciones y emociones, sino, también considerar su subconsciente que es el que realmente toma las decisiones de comportamiento. Por eso, es importante que el diseñador de experiencia sea un poco psicólogo ya que conociendo no solo cuestiones de ergonomía del diseño sino también los principios cognitivos en los que se basa esta disciplina, como funciona el cerebro humano y como siguiendo los principios del neurodiseño podemos crear productos mas persuasible y poder influir en el comportamiento. Hay que aclarar es que estos estudios evalúan las reacciones fisiológicas y biológicas del cerebro ante los estímulos planteados y son lo más cercano a evaluar el inconsciente. ------------ Esta presentación fué elaborada por Sandra Vilchez y presentada en famoso Encuentro Americano de Diseño 2012 celebrado en Palermo, Buenos Aires, Argentina Más información sobre esta presentación en: http://blog.pucp.edu.pe/item/165446/neuro-design

Neurodiseño, una tendencia en el diseño de experiencia

Percy Negrete

Design principles of scalable, distributed systems

Tinniam V Ganesh (TV)

En vedette (20)

Talk on Parallel Computing at IGWA

что делать если ребенок боится врачей

у нас прошёл месячник пожарной безопасности !

Press Release - YFC

1 день в детском саду

Rendez-vous de l’économie FTI Consulting en partenariat avec Odoxa - Les Echo...

Hadoop Fundamentals

Calendario Diciembre 2016

Getting Started with MongoDB and NodeJS

La ingeniería en la edad media

MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector

Movilidad y su contexto de uso

Webinar: Simplifying the Database Experience with MongoDB Atlas

Consejo 6 octubre 2016

Importancia del color en la experiencia de uso

Magic quadrant for data warehouse database management systems

Otras realidades, otros impactos, otras métricas: la nueva bibliometría

Big Data Paris - Air France: Stratégie BigData et Use Cases

Neurodiseño, una tendencia en el diseño de experiencia

Design principles of scalable, distributed systems

Similaire à Kafka. seattle data science and data engineering meetup

A Short Presentation on Kafka

Mostafa Jubayer Khan

Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and Kafka Apache NiFi, Apache Flink, Apache Kafka Timothy Spann Principal Developer Advocate Cloudera Data in Motion https://budapestdata.hu/2023/en/speakers/timothy-spann/ Timothy Spann Principal Developer Advocate Cloudera (US) LinkedIn · GitHub · datainmotion.dev June 8 · Online · English talk Building Modern Data Streaming Apps with NiFi, Flink and Kafka In my session, I will show you some best practices I have discovered over the last 7 years in building data streaming applications including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize the best features of all. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Kafka. From there we build streaming ETL with Apache Flink SQL. We will stream data into Apache Iceberg. We use the best streaming tools for the current applications with FLaNK. flankstack.dev BIO Tim Spann is a Principal Developer Advocate in Data In Motion for Cloudera. He works with Apache NiFi, Apache Pulsar, Apache Kafka, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...

Timothy Spann

Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...

Timothy Spann

Distributed messaging through Kafka

Dileep Kalidindi

Abstract:- Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing. In this talk you will learn more about: A quick introduction to Kafka Core, Kafka Connect and Kafka Streams through code examples, key concepts and key features. A reference architecture for building such Kafka-based streaming data applications. A demo of an end-to-end Kafka-based streaming data application.

Building streaming data applications using Kafka*[Connect + Core + Streams] b...

Data Con LA

As a data professional, you are the glue that makes cross-platform integrations possible. With the increase in adoption of hybrid cloud architectures, Kafka is an increasingly relevant tool for building data pipelines between platforms and accelerating delivery on cloud projects. Early exposure to Kafka on Azure capabilities gives you an edge to build better mousetraps at the design phase. Customers already running Kafka on premises and are looking to extend Kafka systems to Azure can get started quickly with Confluent Cloud. Additionally, DevOps for self-managed options can be easily scalable with Ansible for Virtual Machines or containers via Azure Kubernetes Services or Azure Container Instances. This session is presented from the Microsoft Solution Architect perspective by Israel Ekpo, Microsoft Cloud Solution Architect and Alicia Moniz, Microsoft MVP. They will cover use cases and scenarios, along with key Azure integration points and architecture patterns.

Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...

HostedbyConfluent

Introduction to Kafka and Zookeeper

Rahul Jain

Apache Kafka evolved from an enterprise messaging system to a fully distributed streaming data platform for building real-time streaming data pipelines and streaming data applications without the need for other tools/clusters for data ingestion, storage and stream processing. In this talk you will learn more about: 1. A quick introduction to Kafka Core, Kafka Connect and Kafka Streams: What is and why? 2. Code and step-by-step instructions to build an end-to-end streaming data application using Apache Kafka

Building Streaming Data Applications Using Apache Kafka

Slim Baltagi

Apache Arrow: Open Source Standard Becomes an Enterprise Necessity

Wes McKinney

AWS API Framework Overview

API Talent

OSSNA Building Modern Data Streaming Apps https://ossna2023.sched.com/event/1Jt05/virtual-building-modern-data-streaming-apps-with-open-source-timothy-spann-streamnative Timothy Spann Cloudera Principal Developer Advocate Data in Motion In my session, I will show you some best practices I have discovered over the last seven years in building data streaming applications, including IoT, CDC, Logs, and more. In my modern approach, we utilize several open-source frameworks to maximize all the best features. We often start with Apache NiFi as the orchestrator of streams flowing into Apache Pulsar. From there, we build streaming ETL with Apache Spark and enhance events with Pulsar Functions for ML and enrichment. We make continuous queries against our topics with Flink SQL. We will stream data into various open-source data stores, including Apache Iceberg, Apache Pinot, and others. We use the best streaming tools for the current applications with the open source stack - FLiPN. https://www.flipn.app/ Updates: This will be in-person with live coding based on feedback from the crowd. This will also include new data stores, new sources, and data relevant to and from the Vancouver area. This will also include updates to the platforms and inclusion of Apache Iceberg, Apache Pinot and some other new tech. https://github.com/tspannhw/SpeakerProfile Tim Spann is a Principal Developer Advocate for Cloudera. He works with Apache Kafka, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. 
He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science. Timothy J Spann Cloudera Principal Developer Advocate Hightstown, NJ Websitehttps://datainmotion.dev/

OSSNA Building Modern Data Streaming Apps

Timothy Spann

Open Marketing Meeting 03/27/2013

OpenStack

IBM Message Hub service in Bluemix - Apache Kafka in a public cloud

Andrew Schofield

Music city data Hail Hydrate! from stream to lake

Timothy Spann

AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running. With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app. In this session, we dive deep into AWS Lambda to learn about capabilities, features and benefits. Learning Objectives: • Dive deep into AWS Lambda • Learn about the capabilities, features and benefits of AWS Lambda • Learn about the different use cases • Learn how to get started using AWS Lambda

Deep Dive on AWS Lambda - January 2017 AWS Online Tech Talks

Amazon Web Services

Set your Data in Motion with Confluent & Apache Kafka Tech Talk Series LME

confluent

Logging in Scala

John Nestor

Watch this webcast here: https://www.confluent.io/online-talks/whats-new-in-confluent-platform-55/ Join the Confluent Product Marketing team as we provide an overview of Confluent Platform 5.5, which makes Apache Kafka and event streaming more broadly accessible to developers with enhancements to data compatibility, multi-language development, and ksqlDB. Building an event-driven architecture with Apache Kafka allows you to transition from traditional silos and monolithic applications to modern microservices and event streaming applications. With these benefits has come an increased demand for Kafka developers from a wide range of industries. The Dice Tech Salary Report recently ranked Kafka as the highest-paid technological skill of 2019, a year removed from ranking it second. With Confluent Platform 5.5, we are making it even simpler for developers to connect to Kafka and start building event streaming applications, regardless of their preferred programming languages or the underlying data formats used in their applications. This session will cover the key features of this latest release, including: -Support for Protobuf and JSON schemas in Confluent Schema Registry and throughout our entire platform -Exactly once semantics for non-Java clients -Admin functions in REST Proxy (preview) -ksqlDB 0.7 and ksqlDB Flow View in Confluent Control Center

What's New in Confluent Platform 5.5

confluent

apidays LIVE Hong Kong 2021 - Multi-Protocol APIs at Scale in Adidas by Jesus...

apidays

Building distributed systems is challenging. Luckily, Apache Kafka provides a powerful toolkit for putting together big services as a set of scalable, decoupled components. In this talk, I'll describe some of the design tradeoffs when building microservices, and how Kafka's powerful abstractions can help. I'll also talk a little bit about what the community has been up to with Kafka Streams, Kafka Connect, and exactly-once semantics. Presentation by Colin McCabe, Confluent, Big Data Day LA

Building Microservices with Apache Kafka

confluent

Kafka: Seattle Data Science and Data Engineering Meetup

1. Seattle Data Science And Data Engineering Meetup Abhishek Goswami. 12/14/2016 abgoswam@gmail.com https://www.linkedin.com/in/abgoswam

2. Table of Contents: Introduction (Motivation; What is Kafka; Characteristics; APIs; Demos). Internals (Logs; Logs in Distributed Systems; Design Fundamentals; ZooKeeper Dependency; Replication; Source Code). Summary, Q&A.

3. Introduction (Motivation; What is Kafka?; Characteristics; APIs; Demos). Internals. Summary, Q&A.

4. Introduction: Motivation. Data integration.

5. Introduction: What is Kafka? A distributed, partitioned, replicated commit-log service. Provides the functionality of a messaging system, but with a unique design. Competitive landscape: AWS Kinesis, Azure Event Hubs. Use cases: messaging, website activity tracking, logging, stream processing.

6. Introduction: Characteristics. Scalability of a filesystem: high throughput, many TB per server. Guarantees of a database: messages strictly ordered (within a partition), all data persistent. Distributed by default: replication, partitioning.

7. Introduction: APIs. Four core APIs. Producer API: allows applications to send streams of data to topics in the Kafka cluster. Consumer API: allows applications to read streams of data from topics in the Kafka cluster. Connect API: allows implementing connectors that continually pull from some source system or application into Kafka, or push from Kafka into some sink system or application. Streams API: a generalization of batch processing to a real-time environment with low-latency requirements.

8. Introduction: Demos

9. Introduction. Internals (Log; Logs in Distributed Systems; Design Fundamentals; ZooKeeper Dependency; Replication; Source Code). Summary, Q&A.

10. Internals: Log

11. Internals: Logs in Distributed Systems

12. Internals: Logs in Distributed Systems

13. Internals: Design Fundamentals

14. Internals: ZooKeeper Dependency. Kafka requires ZooKeeper. Kafka uses ZooKeeper for things like: cluster membership; electing a controller; topic configuration (which topics exist, who the leader is, etc.).

15. Internals: Replication

16. Internals: Source Code. GitHub repo: https://github.com/apache/kafka

17. Introduction. Internals. Summary, Q&A.

18. Summary. Kafka solves data integration needs. It is a distributed, partitioned, replicated commit-log service.

19. Q&A. References:
1. Simplifying data pipelines with Apache Kafka
2. Learning Apache Kafka, 2nd Edition
3. https://www.tutorialspoint.com/apache_kafka/index.htm
4. https://www.infoq.com/articles/apache-kafka
5. https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
abgoswam@gmail.com https://www.linkedin.com/in/abgoswam

Editor's notes

Two main challenges:
- Large volume of data
- Different sources and destinations

(Beyond collecting the data, the second challenge is to analyze it. Overcoming these challenges requires a messaging system.)

Kafka is designed for distributed, high-throughput systems and tends to work very well as a replacement for a more traditional message broker. Compared with other messaging systems, Kafka has better throughput, built-in partitioning, replication, and inherent fault tolerance, which makes it a good fit for large-scale message processing applications.

What is a Messaging System?

A messaging system is responsible for transferring data from one application to another, so that the applications can focus on the data rather than on how to share it. Distributed messaging is based on the concept of reliable message queuing: messages are queued asynchronously between client applications and the messaging system. Two messaging patterns are available: point-to-point and publish-subscribe (pub-sub). Most messaging systems follow the pub-sub pattern.

In a point-to-point system, messages are persisted in a queue. One or more consumers can read from the queue, but any given message is consumed by at most one consumer. Once a consumer reads a message, it disappears from the queue.

In a publish-subscribe system, messages are persisted in a topic. Unlike in a point-to-point system, consumers can subscribe to one or more topics and consume all the messages in those topics. Message producers are called publishers and message consumers are called subscribers.

Apache Kafka is a distributed publish-subscribe messaging system and a robust queue that can handle a high volume of data and pass messages from one endpoint to another. Kafka is suitable for both offline and online message consumption. Kafka messages are persisted on disk and replicated within the cluster to prevent data loss. Kafka is built on top of the ZooKeeper synchronization service. It integrates very well with Apache Storm and Spark for real-time streaming data analysis.

Benefits

- Reliability: Kafka is distributed, partitioned, replicated, and fault tolerant.
- Scalability: the Kafka messaging system scales easily without downtime.
- Durability: Kafka uses a distributed commit log, which means messages are persisted on disk as fast as possible.
- Performance: Kafka has high throughput for both publishing and subscribing, and maintains stable performance even when many TB of messages are stored. With appropriate replication and acknowledgement settings, it is designed to avoid downtime and data loss.

Use Cases

- Metrics: Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.
- Log aggregation: Kafka can be used across an organization to collect logs from multiple services and make them available in a standard format to multiple consumers.
- Stream processing: popular frameworks such as Storm and Spark Streaming read data from a topic, process it, and write the processed data to a new topic, where it becomes available to users and applications. Kafka's strong durability is also very useful in the context of stream processing.

Need for Kafka

Kafka is a unified platform for handling all real-time data feeds. It supports low-latency message delivery and guarantees fault tolerance in the presence of machine failures. It can handle a large number of diverse consumers. Kafka is very fast: benchmarks have reported around 2 million writes/sec. Kafka persists all data to disk, which in practice means that writes go first to the OS page cache (RAM). This makes it very efficient to transfer data from the page cache to a network socket.

-----------
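The difference between the two messaging patterns described in these notes can be sketched in a few lines of Python. This is a toy in-memory model, not Kafka code: the class names `PointToPointQueue` and `PubSubTopic` and the subscriber names are hypothetical, chosen only for illustration.

```python
from collections import deque


class PointToPointQueue:
    """Toy queue: each message is delivered to at most one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Reading removes the message, so no other consumer can see it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Toy topic: every subscriber sees every message."""

    def __init__(self):
        self._messages = []
        self._positions = {}  # subscriber name -> index of next message to read

    def subscribe(self, subscriber):
        # New subscribers start reading from the beginning of the topic.
        self._positions[subscriber] = 0

    def publish(self, message):
        self._messages.append(message)

    def poll(self, subscriber):
        # Each subscriber advances its own position; messages stay in the topic.
        start = self._positions[subscriber]
        self._positions[subscriber] = len(self._messages)
        return self._messages[start:]


queue = PointToPointQueue()
for m in ["a", "b"]:
    queue.send(m)
queue_reads = [queue.receive(), queue.receive(), queue.receive()]

topic = PubSubTopic()
topic.subscribe("billing")
topic.subscribe("audit")
for m in ["a", "b"]:
    topic.publish(m)
billing_reads = topic.poll("billing")
audit_reads = topic.poll("audit")
```

In the queue, the third `receive()` finds nothing left, whereas both subscribers of the topic read the full sequence independently.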
Kafka includes four core APIs:
- The Producer API allows applications to send streams of data to topics in the Kafka cluster.
- The Consumer API allows applications to read streams of data from topics in the Kafka cluster.
- The Streams API allows transforming streams of data from input topics to output topics.
- The Connect API allows implementing connectors that continually pull from some source system or application into Kafka, or push from Kafka into some sink system or application.

Kafka exposes all its functionality over a language-independent protocol, with clients available in many programming languages. However, only the Java clients are maintained as part of the main Kafka project; the others are available as independent open source projects. A list of non-Java clients is linked from the Kafka documentation.

2.1 Producer API

The Producer API allows applications to send streams of data to topics in the Kafka cluster. Examples showing how to use the producer are given in the javadocs. To use the producer, you can use the following Maven dependency:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>0.10.1.0</version>
    </dependency>

2.2 Consumer API

The Consumer API allows applications to read streams of data from topics in the Kafka cluster. Examples showing how to use the consumer are given in the javadocs. To use the consumer, you can use the following Maven dependency:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>0.10.1.0</version>
    </dependency>

2.3 Streams API

The Streams API allows transforming streams of data from input topics to output topics. Examples showing how to use this library are given in the javadocs. Additional documentation on using the Streams API is available in the Kafka documentation. To use Kafka Streams, you can use the following Maven dependency:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>0.10.1.0</version>
    </dependency>

2.4 Connect API

The Connect API allows implementing connectors that continually pull from some source data system into Kafka, or push from Kafka into some sink data system. Many users of Connect won't need to use this API directly, though; they can use pre-built connectors without writing any code. Additional information on using Connect is available in the Kafka documentation. Those who want to implement custom connectors can see the javadoc.
Kafka design fundamentals

Kafka is neither a pure queuing platform, where each message is received by a single consumer out of the consumer pool, nor a pure publisher-subscriber platform, where messages are published to all consumers. In its most basic structure, a producer publishes messages to a Kafka topic (comparable to a "messaging queue"). A topic can also be thought of as a message category or feed name to which messages are published. Kafka topics are created on a Kafka broker, which acts as a Kafka server. Kafka brokers also store the messages. Consumers then subscribe to one or more Kafka topics to get the messages. Brokers and consumers use ZooKeeper to get state information and to track message offsets, respectively.

The accompanying diagram (not reproduced in these notes) shows a single-node, single-broker architecture with a topic that has four partitions. It shows all five components of a Kafka cluster: ZooKeeper, broker, topic, producer, and consumer.

In Kafka topics, every partition is mapped to a logical log file that is represented as a set of segment files of roughly equal size. Every partition is an ordered, immutable sequence of messages; each time a message is published to a partition, the broker appends the message to the last segment file. These segment files are flushed to disk after a configurable number of messages have been published or after a certain amount of time has elapsed. Once the segment file is flushed, its messages are made available to consumers. Each message in a partition is assigned a unique sequential number called the offset, which identifies the message within the partition.

Each partition is optionally replicated across a configurable number of servers for fault tolerance. For each partition, one server acts as the leader and zero or more servers act as followers.
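The idea of a partition as an append-only log, stored as a set of segments and addressed by sequential offsets, can be sketched as follows. This is a toy in-memory model under stated assumptions, not Kafka's storage code: `PartitionLog` is a hypothetical name, segments are sized by message count rather than bytes, and flushing to disk is not modelled.

```python
class PartitionLog:
    """Toy append-only partition log split into fixed-size segments."""

    def __init__(self, segment_size=3):
        self.segment_size = segment_size  # messages per segment (assumed unit)
        self.segments = [[]]              # list of segments, each a list of messages
        self.next_offset = 0

    def append(self, message):
        # Roll over to a new segment when the last one is full.
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])
        self.segments[-1].append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read(self, offset):
        # The offset alone is enough to locate the segment and the position in it.
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        segment_index, position = divmod(offset, self.segment_size)
        return self.segments[segment_index][position]


log = PartitionLog(segment_size=3)
offsets = [log.append(f"msg-{i}") for i in range(7)]
```

Because messages are only ever appended, a reader needs no index from message IDs to locations: arithmetic on the offset finds the right segment directly.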
Here, the leader handles all read and write requests for the partition, while the followers asynchronously replicate data from the leader. Kafka dynamically maintains a set of in-sync replicas (ISR) that are caught up to the leader, and always persists the latest ISR set to ZooKeeper. If the leader fails, one of the followers (in-sync replicas) automatically becomes the new leader. In a Kafka cluster, each server plays a dual role: it acts as a leader for some of its partitions and as a follower for others. This balances load within the Kafka cluster.

Kafka builds on what was learned from both traditional platforms (queuing and publish-subscribe) through the concept of consumer groups. Each consumer is a process, and these processes are organized into groups called consumer groups. A message within a topic is consumed by a single process (consumer) within a consumer group; if a message must be consumed by multiple consumers, those consumers need to be placed in different consumer groups.

Consumers always consume messages from a particular partition sequentially and acknowledge the message offset. This acknowledgement implies that the consumer has consumed all prior messages. Consumers issue an asynchronous pull request to the broker containing the offset of the message to be consumed, and get back a buffer of bytes.

In line with Kafka's design, brokers are stateless: the state of any consumed message is maintained by the message consumer, and the Kafka broker keeps no record of what has been consumed by whom. If this is poorly implemented, a consumer can end up reading the same message multiple times. Because the broker does not know whether a message has been consumed, deciding when to delete it is difficult; Kafka resolves this by defining a time-based SLA (service level agreement) as the message retention policy.
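The consumer-group behaviour described above (one consumer per message within a group, every group sees every message) can be illustrated with a small sketch. It assumes each partition is owned by exactly one member of a group; the round-robin rule in `assign_partitions`, the function names, and the group member names are hypothetical, not Kafka's actual assignment protocol.

```python
def assign_partitions(num_partitions, members):
    """Hypothetical round-robin assignment of partitions to group members."""
    assignment = {member: [] for member in members}
    for p in range(num_partitions):
        assignment[members[p % len(members)]].append(p)
    return assignment


def consume_all(partitions, members):
    """Return {member: [messages]} for one consumer group."""
    assignment = assign_partitions(len(partitions), members)
    return {
        member: [msg for p in owned for msg in partitions[p]]
        for member, owned in assignment.items()
    }


# A topic with four partitions, each an ordered list of messages.
partitions = [["a0", "a1"], ["b0"], ["c0", "c1"], ["d0"]]

# Two groups read the same topic independently.
analytics = consume_all(partitions, ["analytics-1", "analytics-2"])
archiver = consume_all(partitions, ["archiver-1"])
```

Within the `analytics` group the messages are split between the two members with no overlap, while the single-member `archiver` group receives every message. This is how one topic can behave like a queue inside a group and like publish-subscribe across groups.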
In line with this policy, a message is automatically deleted once it has been retained in the broker longer than the defined SLA period. The retention policy also lets consumers deliberately rewind to an old offset and re-consume data, although this violates the queuing contract that traditional messaging systems have with consumers.

Next, consider the message delivery semantics Kafka provides between producer and consumer. There are several possible delivery guarantees:
- Messages are never redelivered but may be lost (at most once)
- Messages may be redelivered but are never lost (at least once)
- Messages are delivered once and only once (exactly once)

When publishing, a message is committed to the log. If a producer experiences a network error while publishing, it cannot be sure whether the error happened before or after the message was committed. Once committed, the message will not be lost as long as at least one of the brokers that replicate the partition to which it was written remains available. For guaranteed message publishing, the producer can be configured to wait for acknowledgements and to set how long it waits for a message to be committed.

From the consumer's point of view, all replicas have exactly the same log with the same offsets, and the consumer controls its position in this log. For consumers, Kafka guarantees at-least-once delivery when the consumer reads the messages, processes them, and only then saves its position. If the consumer process crashes after processing messages but before saving its position, another consumer process takes over the topic partition and may receive the first few messages again, even though they were already processed.

-------------------
Kafka Storage

Kafka has a very simple storage layout. Each partition of a topic corresponds to a logical log. Physically, a log is implemented as a set of segment files of roughly equal size.
Every time a producer publishes a message to a partition, the broker simply appends the message to the last segment file. A segment file is flushed to disk after a configurable number of messages has been published or after a certain amount of time has elapsed. A message is exposed to consumers only after it has been flushed.

Unlike in traditional messaging systems, a message stored in Kafka has no explicit message id. Messages are addressed by their logical offset in the log. This avoids the overhead of maintaining auxiliary, seek-intensive random-access index structures that map message ids to actual message locations. Message ids are increasing but not consecutive: the id of the next message is computed by adding the length of the current message to its logical offset.

A consumer always consumes messages from a particular partition sequentially, and when it acknowledges a particular message offset, this implies that it has consumed all prior messages. The consumer issues asynchronous pull requests to the broker so that a buffer of bytes is ready to consume. Each pull request contains the offset of the message to consume. Kafka exploits the sendfile API to deliver bytes efficiently from a log segment file on the broker to a consumer.

----------------------
Kafka Broker
Unlike in other messaging systems, Kafka brokers are stateless. This means that the consumer has to keep track of how much it has consumed; the broker does nothing on its behalf. This design is both tricky and innovative. It is tricky because the broker cannot easily delete a message when it does not know whether the consumer has consumed it. Kafka solves this problem with a simple time-based SLA for the retention policy: a message is automatically deleted if it has been retained in the broker longer than a certain period. This design has a big benefit, as a consumer can deliberately rewind to an old offset and re-consume data.
This violates the common contract of a queue, but proves to be an essential feature for many consumers.
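
The storage and broker notes above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Kafka's implementation: the names `PartitionLog`, `Consumer`, `fetch`, and `enforce_retention` are hypothetical, time is passed in explicitly as `now`, and segment files, flushing, and sendfile are left out. It shows offsets advancing by message length, the consumer (not the log) tracking its position, time-based retention, and a deliberate rewind.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Toy in-memory partition log; not Kafka's actual on-disk format."""
    retention_seconds: float
    # Each entry is (logical_offset, append_time, payload).
    entries: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, payload: bytes, now: float) -> int:
        """Append a message and return the logical offset it was stored at."""
        offset = self.next_offset
        self.entries.append((offset, now, payload))
        # As in the notes: next id = current offset + current message length.
        self.next_offset = offset + len(payload)
        return offset

    def fetch(self, offset: int, max_messages: int = 10) -> list:
        """Return up to max_messages (offset, payload) pairs starting at offset."""
        batch = [(o, p) for o, _, p in self.entries if o >= offset]
        return batch[:max_messages]

    def enforce_retention(self, now: float) -> None:
        """Drop messages older than the retention period, consumed or not."""
        self.entries = [e for e in self.entries
                        if now - e[1] <= self.retention_seconds]


class Consumer:
    """Tracks its own position; the log keeps no per-consumer state."""

    def __init__(self, log: PartitionLog):
        self.log = log
        self.position = 0  # offset of the next message to read

    def poll(self) -> list:
        batch = self.log.fetch(self.position)
        if batch:
            last_offset, last_payload = batch[-1]
            # Saving this position implicitly acknowledges all earlier messages.
            self.position = last_offset + len(last_payload)
        return [p for _, p in batch]

    def rewind(self, offset: int) -> None:
        """Deliberately move back to an older offset to re-consume data."""
        self.position = offset


log = PartitionLog(retention_seconds=60.0)
offsets = [log.append(m, now=t) for t, m in enumerate([b"alpha", b"be", b"gamma"])]
consumer = Consumer(log)
first_read = consumer.poll()       # reads all three messages
consumer.rewind(offsets[1])        # go back to the second message
second_read = consumer.poll()      # re-consumes from that offset onward
log.enforce_retention(now=61.5)    # messages appended at t=0 and t=1 expire
remaining = [p for _, p in log.fetch(0)]
```

Because the log never records who has read what, `rewind` needs no cooperation from it, and `enforce_retention` can delete purely by age. A consumer that crashes before updating `position` would simply re-read the same messages, which is the at-least-once behaviour described earlier in these notes.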
Role of ZooKeeper.
A critical dependency of Apache Kafka is Apache ZooKeeper, a distributed configuration and synchronization service. ZooKeeper serves as the coordination interface between the Kafka brokers and consumers. The Kafka servers share information via a ZooKeeper cluster. Kafka stores basic metadata in ZooKeeper, such as information about topics, brokers, and (in older releases) consumer offsets, that is, the positions of queue readers. Since all the critical information is stored in ZooKeeper, which normally replicates this data across its ensemble, the failure of a single Kafka broker or ZooKeeper node does not affect the state of the Kafka cluster. Kafka restores its state once ZooKeeper restarts, which keeps Kafka downtime to a minimum. When a leader fails, leader election among the Kafka brokers is also carried out through ZooKeeper.

-------------
ZooKeeper:
ZooKeeper serves as the coordination interface between the Kafka broker and consumers. The ZooKeeper overview given on the Hadoop Wiki site is as follows (http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription): "ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers (we call these registers znodes), much like a file system."

The main differences between ZooKeeper and standard filesystems are that every znode can have data associated with it and that znodes are limited in the amount of data they can hold. ZooKeeper was designed to store coordination data: status information, configuration, location information, and so on.

-------------
ZooKeeper and Kafka
Consider a distributed system with multiple servers, each of which is responsible for holding data and performing operations on that data. Examples include a distributed search engine, a distributed build system, or a well-known system like Apache Hadoop. One problem common to all these distributed systems is how to determine which servers are alive and operating at any given point in time.
Most importantly, how would you do these things reliably in the face of the difficulties of distributed computing, such as network failures, bandwidth limitations, variable-latency connections, security concerns, and anything else that can go wrong in a networked environment, perhaps even across multiple data centers? These questions are the focus of Apache ZooKeeper, which is a fast, highly available, fault-tolerant, distributed coordination service. Using ZooKeeper you can build reliable, distributed data structures for group membership, leader election, coordinated workflow, and configuration services, as well as generalized distributed data structures like locks, queues, barriers, and latches.

Many well-known and successful projects already rely on ZooKeeper. Just a few of them are HBase, Hadoop 2.0, Solr Cloud, Neo4J, Apache Blur (incubating), and Accumulo.

ZooKeeper is a distributed, hierarchical file system that facilitates loose coupling between clients and provides an eventually consistent view of its znodes, which are like files and directories in a traditional file system. It provides basic operations such as creating, deleting, and checking the existence of znodes. It also provides an event-driven model in which clients can watch for changes to specific znodes, for example when a new child is added to an existing znode. ZooKeeper achieves high availability by running multiple ZooKeeper servers, called an ensemble, with each server holding an in-memory copy of the distributed file system to service client read requests.

Figure 4 above shows a typical ZooKeeper ensemble in which one server acts as the leader while the rest are followers. When the ensemble starts, a leader is elected first and all followers then synchronize their state with the leader. All write requests are routed through the leader, and changes are broadcast to all followers. This change broadcast is called atomic broadcast.
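
The znode operations and watch model described above can be sketched with a toy, single-process namespace. This is not the ZooKeeper client API: `ZNodeTree`, its method names, the `/brokers` paths, and the 1 MiB data cap are all assumptions made for illustration, and there is no ensemble, leader, or atomic broadcast here. Watches are modeled as one-time triggers, which is how ZooKeeper watches behave.

```python
class ZNodeTree:
    """Toy model of a znode namespace; not the real ZooKeeper client API."""

    MAX_DATA_BYTES = 1024 * 1024  # assumed cap illustrating the znode data limit

    def __init__(self):
        self.nodes = {"/": b""}    # path -> data stored at that znode
        self.child_watches = {}    # parent path -> callbacks waiting for a change

    @staticmethod
    def _parent(path: str) -> str:
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path: str, data: bytes = b"") -> None:
        if path in self.nodes:
            raise KeyError(f"node exists: {path}")
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"missing parent: {parent}")
        if len(data) > self.MAX_DATA_BYTES:
            raise ValueError("znode data too large")
        self.nodes[path] = data
        self._fire(parent)

    def delete(self, path: str) -> None:
        if self.children(path):
            raise ValueError(f"node has children: {path}")
        del self.nodes[path]
        self._fire(self._parent(path))

    def exists(self, path: str) -> bool:
        return path in self.nodes

    def children(self, path: str) -> list:
        prefix = path.rstrip("/") + "/"
        return sorted(p for p in self.nodes
                      if p.startswith(prefix)
                      and len(p) > len(prefix)
                      and "/" not in p[len(prefix):])

    def watch_children(self, path: str, callback) -> None:
        """Register a one-time callback for the next child change under path."""
        self.child_watches.setdefault(path, []).append(callback)

    def _fire(self, path: str) -> None:
        # Pop the callbacks so each watch triggers at most once.
        for callback in self.child_watches.pop(path, []):
            callback(path, self.children(path))


tree = ZNodeTree()
events = []
tree.create("/brokers")
tree.watch_children("/brokers", lambda path, kids: events.append((path, kids)))
tree.create("/brokers/1", b"host-a:9092")   # triggers the watch
tree.create("/brokers/2", b"host-b:9092")   # watch already consumed, no event
```

A client that wants continuous notifications would re-register its watch inside the callback; that step is omitted here to keep the sketch short.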
Usage of ZooKeeper in Kafka:
Kafka uses ZooKeeper for the same reason other distributed systems do: coordination. ZooKeeper is used to manage and coordinate the Kafka brokers, and each Kafka broker coordinates with the other brokers through ZooKeeper. The ZooKeeper service notifies producers and consumers when a new broker joins the Kafka system or when a broker fails. Based on the notification received from ZooKeeper about the presence or failure of a broker, producers and consumers decide how to proceed and start coordinating their work with another broker. The overall Kafka system architecture is shown in Figure 5 below.
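
The notification flow described in these notes can be sketched as a simple publish-and-react loop. Everything here is hypothetical: `BrokerRegistry` stands in for the ZooKeeper-backed broker list, `Client` for a producer or consumer, and the "lowest-numbered live broker" fallback is an assumed policy chosen only to make the example deterministic.

```python
class BrokerRegistry:
    """Stand-in for the ZooKeeper-backed list of live brokers (illustrative)."""

    def __init__(self):
        self.live_brokers = set()
        self.listeners = []

    def subscribe(self, listener) -> None:
        self.listeners.append(listener)

    def broker_joined(self, broker_id: int) -> None:
        self.live_brokers.add(broker_id)
        self._notify()

    def broker_failed(self, broker_id: int) -> None:
        self.live_brokers.discard(broker_id)
        self._notify()

    def _notify(self) -> None:
        # Every subscriber receives the current sorted list of live brokers.
        snapshot = sorted(self.live_brokers)
        for listener in self.listeners:
            listener(snapshot)


class Client:
    """A producer or consumer that re-selects a broker when notified."""

    def __init__(self, name: str, registry: BrokerRegistry):
        self.name = name
        self.current_broker = None
        registry.subscribe(self.on_brokers_changed)

    def on_brokers_changed(self, live_brokers: list) -> None:
        if self.current_broker in live_brokers:
            return  # current broker is still alive; nothing to do
        # Assumed policy: fall back to the lowest-numbered live broker.
        self.current_broker = live_brokers[0] if live_brokers else None


registry = BrokerRegistry()
producer = Client("producer", registry)
registry.broker_joined(1)
registry.broker_joined(2)
before_failure = producer.current_broker   # still on broker 1
registry.broker_failed(1)
after_failure = producer.current_broker    # moved to broker 2
```

The point of the sketch is the direction of the dependency: clients do not probe brokers themselves but react to membership changes pushed by the coordination service.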

Editor's notes