SlideShare une entreprise Scribd logo
1  sur  37
Télécharger pour lire hors ligne
Getting Pulsar Spinning
Our story with adopting and deploying Apache Pulsar
About Me
SW Architect at Instructure
github.com/addisonj
twitter.com/addisonjh
Today’s Agenda
● How we built consensus around Pulsar
● How we built our Pulsar clusters
● How we continue to foster adoption
● A deep dive into one primary use case
Winning hearts and
minds
About
Instructure
● Company behind Canvas, a
Learning Management System
used by millions of students in
K-12 and Higher-Ed
● 2+ million concurrent active
users, Alexa top 100 site
● AWS shop with a wide variety
of languages/technologies
used
● Lots and lots of features
The problems
● Sprawl: As we began to use micro-services, we ended up with multiple
messaging (Kinesis and SQS) systems being used inconsistently
● Feature gaps: Kinesis doesn’t provide long term retention and is difficult to get
ordering to scale. SQS doesn’t provide replay and ordering is difficult
● Long delivery time: Kinesis really only worked for Lambda architecture, which
takes a long time to build. We want Kappa as much as possible
● Cost: We expected usage of messaging to increase drastically. Kinesis was
getting expensive
● Difficult to use: Outside of AWS Lambda, Kinesis is very difficult to use and not
ideal for most of our use cases
Why Pulsar?
● Most teams don’t need “log” capabilities (yet). They just want to send and
receive messages with a nice API (pub/sub)
● However, some use cases require order and high throughput, but we didn’t
want two systems
● Pulsar’s unified messaging model is not just marketing, it really provides us the
best of both worlds
● We want minimal management of any solution. No need to re-balance after
scaling up was very attractive
● We wanted low cost and long-term retention, tiered storage makes that
possible
● Multi-tenancy is built in, minimize the amount of tooling we need
The pitch
Three main talking points:
Capabilities
Unified pub/sub and log, infinite retention, multi-tenancy, functions,geo-replication, and schema
management, and integrations made engineers excited
Cost
Native k8s support to speed-up implementation, no rebalancing and tiered storage making it easy and low
cost to run and operate were compelling to management
Ecosystem
Pulsar is the backbone of an ecosystem of tools that make adoption easier and quickly enable new
capabilities helped establish a vision that everyone believed in
The plan
● Work iteratively
○ MVP release in 2 months with most major features but no SLAs
○ Near daily releases to focus on biggest immediate pains/biggest feature gaps
● Be transparent
○ We focused on speed as opposed to stability, repeatedly communicating that helped
○ We very clearly communicated risks and then showed regular progress on the most risky
aspects
● Focus on adoption/ecosystem
○ We continually communicated with teams to find the things that prevented them moving
forward (kinesis integration, easier login, etc)
○ We prioritized building the ecosystem over solving all operational challenges
The results
Timeline
● End of May 2019: Project approved
● July 3rd 2019: MVP release (pulsar + POC auth)
● July 17th 2019: Beta v1 release (improved tooling, ready for internal usage)
● Nov 20th 2019: First prod release to 8 AWS regions
Stats
● 4 applications in production, 4-5 more in development
● Heavy use of Pulsar IO (sources)
● 50k msgs/sec for largest cluster
● Team is 2 FT engineers + 2 split engineers
How we did it
Our Approach
● Pulsar is very full featured out
of the box, we wanted to build
as little as possible around it
● We wanted to use off-the-shelf
tools as much as possible and
focus on high-value “glue” to
integrate into our ecosystem
● Iterate, iterate, iterate
K8S
● We didn’t yet use K8S
● We made a bet that Pulsar on
K8S (EKS) would be quicker
than raw EC2
● It has mostly proved correct
and helped us make the most
of pulsar functions
● EKS Clusters provisioned with terraform and a few high value services
○ https://github.com/cloudposse/terraform-aws-eks-cluster
○ Core K8S services: EKS built-ins, datadog, fluentd, kiam, cert-manager, external-dns, k8s-
snapshots
● Started with Pulsar provided Helm charts
○ We migrated to Kustomize
● EBS PIOPS volumes for bookkeeper journal, GP2 for ledger
● Dedicated pool of compute (by using node selectors and taints) just for Pulsar
● A few bash cron jobs to keep things in sync with our auth service
● Make use of K8S services with NLB annotations + external DNS
K8S details
Auth
● Use built-in token auth with
minimal API for managing
tenants and role associations
● Core concept: Make an
association between a AWS
IAM principal and pulsar role.
Drop off credentials in a shared
location in Hashicorp Vault
● Accompanying CLI tool
(mb_util) to help teams manage
associations and fetch creds
● Implemented using gRPC
Workflows
Adding a new tenant:
● Jira ticket with an okta group
● mb_util superapi --server prod-iad create-tenant --tenant new_team --
groups new_team_eng
Logging in for management:
● mb_util pulsar-api-login --server prod-iad --tenant new_team
● Pops a browser window to login via okta
● Copy/Paste a JWT from the web app
● Generates a pulsar config file with creds and gives an env var to export for use with pulsar-admin
Workflows (cont.)
Associating an IAM principal
● mb_util api --server prod-iad associate-role --tenant my_team --iam-role
my_app_role --pulsar-role my_app
● The API generates a JWT with the given role + prefixed with tenant name, stores it in a shared
location (based on IAM principal) in hashicorp vault
● K8S cron job periodically updates the token
Fetching creds for clients:
● mb_util is a simple golang binary, easy to distribute and use in app startup script
● PULSAR_AUTH_TOKEN=$(mb_util pulsar-vault-login --tenant my_team -pulsar-
role my_app --vault-server https://vault_url)
● Grabs IAM creds, auth against vault, determines shared location, grabs credential
● Future version will support running in background and keeping a file up to date
Networking
● For internal communication, we
use EKS networking + K8S
services
● Clusters are not exposed over
public internet nor do we share
the VPC to apps. NLBs + VPC
endpoints expose a single
endpoint into other VPCs
● For geo-replication, we use
NLBs, VPC peering, route53
private zones, and external-dns
MessageBus Architecture
Global MessageBus
NOTE:
We have some issues
with how we run
global zookeeper, not
currently
recommended!
● Clusters are very easy to provision and scale up
● Pulsar has great performance even with small JVMs
● EBS volumes (via K8S PersistentVolumes) have worked great
● Managing certs/secrets is simple and easy to implement with a few lines of
bash
● We can experiment much quicker
● Just terraform and K8S (no config management!)
● Pulsar functions on K8S + cluster autoscaling is magical
What worked
What needed (and still needs) work
● Pulsar (and dependencies) aren’t perfect at dealing with dynamic names
○ When a bookie gets rescheduled, it get a new IP, Pulsar can cache that
■ FIX: tweak TCP keepalive /sbin/sysctl -w net.ipv4.tcp_keepalive_time=1
net.ipv4.tcp_keepalive_intvl=11 net.ipv4.tcp_keepalive_probes=3
○ When a broker gets rescheduled, a proxy can get out of sync. Clients can get confused as well
■ FIX: Don’t use zookeeper discovery, use a API discovery
■ FIX: Don’t use a deployment for brokers, use a statefulset
○ Pulsar function worker in broker can cause extra broker restarts
■ FIX: run separate function worker deployment
● Manual restarts sometimes needed
○ Use probes. if you want to optimize write availability, can use health-check
● Pulsar Proxy was not as “battle hardened”
● Large K8S deployments are still rare vs bare metal
● Zookeeper (especially global zookeeper) is a unique challenge
● Many random bugs in admin commands
The good news
Since we have deployed on K8S, lots of improvements have landed:
● Default helm templates are much better
● Lots of issues and bugs fixed that made running on K8S less stable
● Improvements in error-handling/refactoring are decreasing random bugs in
admin commands (more on the way)
● Docs are getting better
● Best practices are getting established
● A virtuous cycle is developing: Adoption -> Improvements from growing
community -> Easier to run -> Makes adoption easier
Continuing to build
and grow
Adoption
● Adoption within our company is
our primary KPI
● We focus on adding new
integrations and improving
tooling to make on-boarding
easy
● Pulsar is the core, but the
ecosystem built around it is the
real value
Pulsar Functions
● Pulsar Functions on K8S + cluster autoscaler really is a great capability that we
didn’t expect to be as popular
● Teams really like the model and gravitate to the feature
○ “Lambda like” functionality that is easier to use and more cost effective
● Sources/Sinks allow for much easier migration and on-boarding
● Per function K8S runtime options (https://pulsar.apache.org/docs/en/functions-runtime/#kubernetes-
customruntimeoptions) allow for us to isolate compute and give teams visibility into the
runtime but without having to manage any infrastructure
Kinesis Source/Sink
● We can easily help teams migrate from Kinesis
● Two options:
○ Change producer to write to Pulsar, Kinesis sink to existing Kinesis stream, slowly migrate
consumers (lambda functions)
○ Kinesis Source replicates to Pulsar, migrate consumers and then producers when finished
● Same concept works for SQS/SNS
Change Data Capture
● The CDC connector (https://pulsar.apache.org/docs/en/io-cdc/) is a really great way to start
moving towards more “evented” architectures
● We have an internal version (with plans to contribute) that we will use to get
CDC feeds for 60+ tables from 150+ Postgres clusters
● A very convenient way to implement the “outbox” pattern for consistent
application events
● Dynamodb streams connector also works well for CDC
Flink
● With K8S and zookeeper in place, running HA Flink (via
https://github.com/lyft/flinkk8soperator) is not much effort
● Flink + Pulsar is an extremely powerful stream processing tool that can replace
ETLs, Lambda architectures, and ( 🤞) entire applications
Bringing it all
together
Near real-time
Datalake
● Building a datalake for Canvas
that is latent only by a few
minutes is our largest, most
ambitious use case
● Pulsar offers a unique and
compelling set of of features to
make that all possible
Current data processing
● Like many companies, most of our data processes are currently batch with
some “speed layers” for lower latency
● Building speed layers and batch layers is very difficult and time consuming,
many jobs just have batch only, which means latency is high
● In many cases, building “incremental batch” jobs to only processes deltas is
very challenging, meaning many of our batch jobs rebuild the world which is
very expensive
● Extracting snapshots from the DBs is expensive and slow, need to store
multiple copies
● We allow customers to get extracts of their data every 24 hours with a rebuild-
the-world batch job, which is expensive and doesn’t meet their needs
Where we are headed
● CDC via a pulsar source allows teams to easily get their data into Pulsar with
low latency and no additional complexity in app
● Pulsar functions on K8S allows us to run 150+ of these sources with a very
clean API for teams that doesn’t require them to worry about compute or know
K8S
● Tiered storage, offloading, and compaction allow us to keep a complete view
of a table in Pulsar for a very low cost and low operational overhead
● Flink (especially with forthcoming batch reads!) allows for efficient “view”
building
● The same system that helps us do unified batch and streaming is also great for
those same teams to start adopting it for other messaging needs
Takeaways
Pulsar is
compelling
● Pulsar offers a suite of features
that is unique among its
competitors
● It may be less mature, but it is
improving rapidly
● It allows for rapid iteration and
deployment (especially via K8S)
● It truly can lower TCO while
offering more features and
better DX
Pulsar is getting
better
● Pulsar continues to add new
features and make
improvements each release
● Adoption among companies
continues
● The community is excellent and
growing each day
Questions?

Contenu connexe

Tendances

Scaling customer engagement with apache pulsar
Scaling customer engagement with apache pulsarScaling customer engagement with apache pulsar
Scaling customer engagement with apache pulsarStreamNative
 
Large scale log pipeline using Apache Pulsar_Nozomi
Large scale log pipeline using Apache Pulsar_NozomiLarge scale log pipeline using Apache Pulsar_Nozomi
Large scale log pipeline using Apache Pulsar_NozomiStreamNative
 
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...StreamNative
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...JinfengHuang3
 
Transaction preview of Apache Pulsar
Transaction preview of Apache PulsarTransaction preview of Apache Pulsar
Transaction preview of Apache PulsarStreamNative
 
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...StreamNative
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...StreamNative
 
Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)StreamNative
 
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...StreamNative
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemStreamNative
 
Stream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStreamlio
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...StreamNative
 
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarKarthik Ramasamy
 
Building a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre Zemb
Building a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre ZembBuilding a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre Zemb
Building a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre ZembStreamNative
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streamingdatamantra
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionStreamNative
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructuremattlieber
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
 

Tendances (20)

Scaling customer engagement with apache pulsar
Scaling customer engagement with apache pulsarScaling customer engagement with apache pulsar
Scaling customer engagement with apache pulsar
 
Large scale log pipeline using Apache Pulsar_Nozomi
Large scale log pipeline using Apache Pulsar_NozomiLarge scale log pipeline using Apache Pulsar_Nozomi
Large scale log pipeline using Apache Pulsar_Nozomi
 
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
Unify Storage Backend for Batch and Streaming Computation with Apache Pulsar_...
 
How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...How Orange Financial combat financial frauds over 50M transactions a day usin...
How Orange Financial combat financial frauds over 50M transactions a day usin...
 
Transaction preview of Apache Pulsar
Transaction preview of Apache PulsarTransaction preview of Apache Pulsar
Transaction preview of Apache Pulsar
 
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
Exactly-Once Made Easy: Transactional Messaging in Apache Pulsar - Pulsar Sum...
 
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
Introducing HerdDB - a distributed JVM embeddable database built upon Apache ...
 
Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)Lessons from managing a Pulsar cluster (Nutanix)
Lessons from managing a Pulsar cluster (Nutanix)
 
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
Introducing Kafka-on-Pulsar: bring native Kafka protocol support to Apache Pu...
 
Integrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data EcosystemIntegrating Apache Pulsar with Big Data Ecosystem
Integrating Apache Pulsar with Big Data Ecosystem
 
Stream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar FunctionsStream-Native Processing with Pulsar Functions
Stream-Native Processing with Pulsar Functions
 
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
Interactive Analytics on Pulsar with Pulsar SQL - Pulsar Virtual Summit Europ...
 
Kafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internalsKafka blr-meetup-presentation - Kafka internals
Kafka blr-meetup-presentation - Kafka internals
 
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache PulsarUnifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
Unifying Messaging, Queueing & Light Weight Compute Using Apache Pulsar
 
Building a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre Zemb
Building a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre ZembBuilding a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre Zemb
Building a Messaging Solutions for OVHcloud with Apache Pulsar_Pierre Zemb
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Pulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless EvolutionPulsar Storage on BookKeeper _Seamless Evolution
Pulsar Storage on BookKeeper _Seamless Evolution
 
Kafka aws
Kafka awsKafka aws
Kafka aws
 
Architecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructureArchitecture of a Kafka camus infrastructure
Architecture of a Kafka camus infrastructure
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
 

Similaire à Getting Pulsar Spinning_Addison Higham

Successful DevOps implementation for small teams a true story
Successful DevOps implementation for small teams  a true storySuccessful DevOps implementation for small teams  a true story
Successful DevOps implementation for small teams a true storyJakub Paweł Głazik
 
Openstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platformOpenstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platformNagaraj Shenoy
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015aspyker
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for BeginnersDigitalOcean
 
Database as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on KubernetesDatabase as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on KubernetesObjectRocket
 
Deploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on KubernetesDeploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on KubernetesAll Things Open
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusJakob Karalus
 
MRA AMA Part 10: Kubernetes and the Microservices Reference Architecture
MRA AMA Part 10: Kubernetes and the Microservices Reference ArchitectureMRA AMA Part 10: Kubernetes and the Microservices Reference Architecture
MRA AMA Part 10: Kubernetes and the Microservices Reference ArchitectureNGINX, Inc.
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesVishal Biyani
 
Container management with docker & kubernetes
Container management with docker & kubernetesContainer management with docker & kubernetes
Container management with docker & kubernetesKasun Rajapakse
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanAmbassador Labs
 
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_NETWAYS
 
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka Hua Chu
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetesRishabh Indoria
 
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsKubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsRightScale
 
Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?Mathieu Herbert
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Dave Holland
 
Microservices Architecture with AWS @ AnyMind Group
Microservices Architecture with AWS @ AnyMind GroupMicroservices Architecture with AWS @ AnyMind Group
Microservices Architecture with AWS @ AnyMind GroupGiang Tran
 

Similaire à Getting Pulsar Spinning_Addison Higham (20)

Successful DevOps implementation for small teams a true story
Successful DevOps implementation for small teams  a true storySuccessful DevOps implementation for small teams  a true story
Successful DevOps implementation for small teams a true story
 
Openstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platformOpenstack - Enterprise cloud management platform
Openstack - Enterprise cloud management platform
 
Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015Triangle Devops Meetup 10/2015
Triangle Devops Meetup 10/2015
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
 
Database as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on KubernetesDatabase as a Service (DBaaS) on Kubernetes
Database as a Service (DBaaS) on Kubernetes
 
Introduction to Kubernetes
Introduction to KubernetesIntroduction to Kubernetes
Introduction to Kubernetes
 
Deploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on KubernetesDeploying Anything as a Service (XaaS) Using Operators on Kubernetes
Deploying Anything as a Service (XaaS) Using Operators on Kubernetes
 
Kubernetes intro
Kubernetes introKubernetes intro
Kubernetes intro
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
MRA AMA Part 10: Kubernetes and the Microservices Reference Architecture
MRA AMA Part 10: Kubernetes and the Microservices Reference ArchitectureMRA AMA Part 10: Kubernetes and the Microservices Reference Architecture
MRA AMA Part 10: Kubernetes and the Microservices Reference Architecture
 
Container Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher KubernetesContainer Conf 2017: Rancher Kubernetes
Container Conf 2017: Rancher Kubernetes
 
Container management with docker & kubernetes
Container management with docker & kubernetesContainer management with docker & kubernetes
Container management with docker & kubernetes
 
Dark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill MonkmanDark launching with Consul at Hootsuite - Bill Monkman
Dark launching with Consul at Hootsuite - Bill Monkman
 
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
OSDC 2018 | From Monolith to Microservices by Paul Puschmann_
 
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
 
Introduction to kubernetes
Introduction to kubernetesIntroduction to kubernetes
Introduction to kubernetes
 
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOpsKubernetes and Terraform in the Cloud: How RightScale Does DevOps
Kubernetes and Terraform in the Cloud: How RightScale Does DevOps
 
Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?Kubernetes: Managed or Not Managed?
Kubernetes: Managed or Not Managed?
 
Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017Sanger OpenStack presentation March 2017
Sanger OpenStack presentation March 2017
 
Microservices Architecture with AWS @ AnyMind Group
Microservices Architecture with AWS @ AnyMind GroupMicroservices Architecture with AWS @ AnyMind Group
Microservices Architecture with AWS @ AnyMind Group
 

Plus de StreamNative

Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...StreamNative
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...StreamNative
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022StreamNative
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...StreamNative
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...StreamNative
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022StreamNative
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...StreamNative
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022StreamNative
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...StreamNative
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022StreamNative
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022StreamNative
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022StreamNative
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021StreamNative
 

Plus de StreamNative (20)

Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
Blue-green deploys with Pulsar & Envoy in an event-driven microservice ecosys...
 
Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...Distributed Database Design Decisions to Support High Performance Event Strea...
Distributed Database Design Decisions to Support High Performance Event Strea...
 
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
Validating Apache Pulsar’s Behavior under Failure Conditions - Pulsar Summit ...
 
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac...
 
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
Message Redelivery: An Unexpected Journey - Pulsar Summit SF 2022
 
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
Unlocking the Power of Lakehouse Architectures with Apache Pulsar and Apache ...
 
Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022Understanding Broker Load Balancing - Pulsar Summit SF 2022
Understanding Broker Load Balancing - Pulsar Summit SF 2022
 
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
Building an Asynchronous Application Framework with Python and Pulsar - Pulsa...
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
Pulsar @ Scale. 200M RPM and 1K instances - Pulsar Summit SF 2022
 
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022Welcome and Opening Remarks - Pulsar Summit SF 2022
Welcome and Opening Remarks - Pulsar Summit SF 2022
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021Improvements Made in KoP 2.9.0  - Pulsar Summit Asia 2021
Improvements Made in KoP 2.9.0 - Pulsar Summit Asia 2021
 

Dernier

The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptaigil2
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.JasonViviers2
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)Data & Analytics Magazin
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 

Dernier (17)

The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 

Getting Pulsar Spinning_Addison Higham

  • 1. Getting Pulsar Spinning Our story with adopting and deploying Apache Pulsar
  • 2. About Me SW Architect at Instructure github.com/addisonj twitter.com/addisonjh
  • 3. Today’s Agenda ● How we built consensus around Pulsar ● How we built our Pulsar clusters ● How we continue to foster adoption ● A deep dive into one primary use case
  • 5. About Instructure ● Company behind Canvas, a Learning Management System used by millions of students in K-12 and Higher-Ed ● 2+ million concurrent active users, Alexa top 100 site ● AWS shop with a wide variety of languages/technologies used ● Lots and lots of features
  • 6. The problems ● Sprawl: As we began to use micro-services, we ended up with multiple messaging (Kinesis and SQS) systems being used inconsistently ● Feature gaps: Kinesis doesn’t provide long term retention and is difficult to get ordering to scale. SQS doesn’t provide replay and ordering is difficult ● Long delivery time: Kinesis really only worked for Lambda architecture, which takes a long time to build. We want Kappa as much as possible ● Cost: We expected usage of messaging to increase drastically. Kinesis was getting expensive ● Difficult to use: Outside of AWS Lambda, Kinesis is very difficult to use and not ideal for most of our use cases
  • 7. Why Pulsar? ● Most teams don’t need “log” capabilities (yet). They just want to send and receive messages with a nice API (pub/sub) ● However, some use cases require order and high throughput, but we didn’t want two systems ● Pulsar’s unified messaging model is not just marketing, it really provides us the best of both worlds ● We want minimal management of any solution. No need to re-balance after scaling up was very attractive ● We wanted low cost and long-term retention, tiered storage makes that possible ● Multi-tenancy is built in, minimize the amount of tooling we need
  • 8. The pitch Three main talking points: Capabilities Unified pub/sub and log, infinite retention, multi-tenancy, functions,geo-replication, and schema management, and integrations made engineers excited Cost Native k8s support to speed-up implementation, no rebalancing and tiered storage making it easy and low cost to run and operate were compelling to management Ecosystem Pulsar is the backbone of an ecosystem of tools that make adoption easier and quickly enable new capabilities helped establish a vision that everyone believed in
  • 9. The plan ● Work iteratively ○ MVP release in 2 months with most major features but no SLAs ○ Near daily releases to focus on biggest immediate pains/biggest feature gaps ● Be transparent ○ We focused on speed as opposed to stability, repeatedly communicating that helped ○ We very clearly communicated risks and then showed regular progress on the most risky aspects ● Focus on adoption/ecosystem ○ We continually communicated with teams to find the things that prevented them moving forward (kinesis integration, easier login, etc) ○ We prioritized building the ecosystem over solving all operational challenges
  • 10. The results Timeline ● End of May 2019: Project approved ● July 3rd 2019: MVP release (pulsar + POC auth) ● July 17th 2019: Beta v1 release (improved tooling, ready for internal usage) ● Nov 20th 2019: First prod release to 8 AWS regions Stats ● 4 applications in production, 4-5 more in development ● Heavy use of Pulsar IO (sources) ● 50k msgs/sec for largest cluster ● Team is 2 FT engineers + 2 split engineers
  • 12. Our Approach ● Pulsar is very full featured out of the box, we wanted to build as little as possible around it ● We wanted to use off-the-shelf tools as much as possible and focus on high-value “glue” to integrate into our ecosystem ● Iterate, iterate, iterate
  • 13. K8S ● We didn’t yet use K8S ● We made a bet that Pulsar on K8S (EKS) would be quicker than raw EC2 ● It has mostly proved correct and helped us make the most of pulsar functions
  • 14. ● EKS Clusters provisioned with terraform and a few high value services ○ https://github.com/cloudposse/terraform-aws-eks-cluster ○ Core K8S services: EKS built-ins, datadog, fluentd, kiam, cert-manager, external-dns, k8s- snapshots ● Started with Pulsar provided Helm charts ○ We migrated to Kustomize ● EBS PIOPS volumes for bookkeeper journal, GP2 for ledger ● Dedicated pool of compute (by using node selectors and taints) just for Pulsar ● A few bash cron jobs to keep things in sync with our auth service ● Make use of K8S services with NLB annotations + external DNS K8S details
  • 15. Auth ● Use built-in token auth with minimal API for managing tenants and role associations ● Core concept: Make an association between a AWS IAM principal and pulsar role. Drop off credentials in a shared location in Hashicorp Vault ● Accompanying CLI tool (mb_util) to help teams manage associations and fetch creds ● Implemented using gRPC
  • 16. Workflows Adding a new tenant: ● Jira ticket with an okta group ● mb_util superapi --server prod-iad create-tenant --tenant new_team -- groups new_team_eng Logging in for management: ● mb_util pulsar-api-login --server prod-iad --tenant new_team ● Pops a browser window to login via okta ● Copy/Paste a JWT from the web app ● Generates a pulsar config file with creds and gives an env var to export for use with pulsar-admin
  • 17. Workflows (cont.) Associating an IAM principal ● mb_util api --server prod-iad associate-role --tenant my_team --iam-role my_app_role --pulsar-role my_app ● The API generates a JWT with the given role + prefixed with tenant name, stores it in a shared location (based on IAM principal) in hashicorp vault ● K8S cron job periodically updates the token Fetching creds for clients: ● mb_util is a simple golang binary, easy to distribute and use in app startup script ● PULSAR_AUTH_TOKEN=$(mb_util pulsar-vault-login --tenant my_team -pulsar- role my_app --vault-server https://vault_url) ● Grabs IAM creds, auth against vault, determines shared location, grabs credential ● Future version will support running in background and keeping a file up to date
  • 18. Networking ● For internal communication, we use EKS networking + K8S services ● Clusters are not exposed over public internet nor do we share the VPC to apps. NLBs + VPC endpoints expose a single endpoint into other VPCs ● For geo-replication, we use NLBs, VPC peering, route53 private zones, and external-dns
  • 20. Global MessageBus NOTE: We have some issues with how we run global zookeeper, not currently recommended!
  • 21. ● Clusters are very easy to provision and scale up ● Pulsar has great performance even with small JVMs ● EBS volumes (via K8S PersistentVolumes) have worked great ● Managing certs/secrets is simple and easy to implement with a few lines of bash ● We can experiment much quicker ● Just terraform and K8S (no config management!) ● Pulsar functions on K8S + cluster autoscaling is magical What worked
  • 22. What needed (and still needs) work ● Pulsar (and dependencies) aren’t perfect at dealing with dynamic names ○ When a bookie gets rescheduled, it get a new IP, Pulsar can cache that ■ FIX: tweak TCP keepalive /sbin/sysctl -w net.ipv4.tcp_keepalive_time=1 net.ipv4.tcp_keepalive_intvl=11 net.ipv4.tcp_keepalive_probes=3 ○ When a broker gets rescheduled, a proxy can get out of sync. Clients can get confused as well ■ FIX: Don’t use zookeeper discovery, use a API discovery ■ FIX: Don’t use a deployment for brokers, use a statefulset ○ Pulsar function worker in broker can cause extra broker restarts ■ FIX: run separate function worker deployment ● Manual restarts sometimes needed ○ Use probes. if you want to optimize write availability, can use health-check ● Pulsar Proxy was not as “battle hardened” ● Large K8S deployments are still rare vs bare metal ● Zookeeper (especially global zookeeper) is a unique challenge ● Many random bugs in admin commands
  • 23. The good news Since we have deployed on K8S, lots of improvements have landed: ● Default helm templates are much better ● Lots of issues and bugs fixed that made running on K8S less stable ● Improvements in error-handling/refactoring are decreasing random bugs in admin commands (more on the way) ● Docs are getting better ● Best practices are getting established ● A virtuous cycle is developing: Adoption -> Improvements from growing community -> Easier to run -> Makes adoption easier
  • 25. Adoption ● Adoption within our company is our primary KPI ● We focus on adding new integrations and improving tooling to make on-boarding easy ● Pulsar is the core, but the ecosystem built around it is the real value
  • 26. Pulsar Functions ● Pulsar Functions on K8S + cluster autoscaler really is a great capability that we didn’t expect to be as popular ● Teams really like the model and gravitate to the feature ○ “Lambda like” functionality that is easier to use and more cost effective ● Sources/Sinks allow for much easier migration and on-boarding ● Per function K8S runtime options (https://pulsar.apache.org/docs/en/functions-runtime/#kubernetes- customruntimeoptions) allow for us to isolate compute and give teams visibility into the runtime but without having to manage any infrastructure
  • 27. Kinesis Source/Sink ● We can easily help teams migrate from Kinesis ● Two options: ○ Change producer to write to Pulsar, Kinesis sink to existing Kinesis stream, slowly migrate consumers (lambda functions) ○ Kinesis Source replicates to Pulsar, migrate consumers and then producers when finished ● Same concept works for SQS/SNS
  • 28. Change Data Capture ● The CDC connector (https://pulsar.apache.org/docs/en/io-cdc/) is a really great way to start moving towards more “evented” architectures ● We have an internal version (with plans to contribute) that we will use to get CDC feeds for 60+ tables from 150+ Postgres clusters ● A very convenient way to implement the “outbox” pattern for consistent application events ● Dynamodb streams connector also works well for CDC
  • 29. Flink ● With K8S and zookeeper in place, running HA Flink (via https://github.com/lyft/flinkk8soperator) is not much effort ● Flink + Pulsar is an extremely powerful stream processing tool that can replace ETLs, Lambda architectures, and ( 🤞) entire applications
  • 31. Near real-time Datalake ● Building a datalake for Canvas that is latent only by a few minutes is our largest, most ambitious use case ● Pulsar offers a unique and compelling set of of features to make that all possible
  • 32. Current data processing ● Like many companies, most of our data processes are currently batch with some “speed layers” for lower latency ● Building speed layers and batch layers is very difficult and time consuming, many jobs just have batch only, which means latency is high ● In many cases, building “incremental batch” jobs to only processes deltas is very challenging, meaning many of our batch jobs rebuild the world which is very expensive ● Extracting snapshots from the DBs is expensive and slow, need to store multiple copies ● We allow customers to get extracts of their data every 24 hours with a rebuild- the-world batch job, which is expensive and doesn’t meet their needs
  • 33. Where we are headed ● CDC via a pulsar source allows teams to easily get their data into Pulsar with low latency and no additional complexity in app ● Pulsar functions on K8S allows us to run 150+ of these sources with a very clean API for teams that doesn’t require them to worry about compute or know K8S ● Tiered storage, offloading, and compaction allow us to keep a complete view of a table in Pulsar for a very low cost and low operational overhead ● Flink (especially with forthcoming batch reads!) allows for efficient “view” building ● The same system that helps us do unified batch and streaming is also great for those same teams to start adopting it for other messaging needs
  • 35. Pulsar is compelling ● Pulsar offers a suite of features that is unique among its competitors ● It may be less mature, but it is improving rapidly ● It allows for rapid iteration and deployment (especially via K8S) ● It truly can lower TCO while offering more features and better DX
  • 36. Pulsar is getting better ● Pulsar continues to add new features and make improvements each release ● Adoption among companies continues ● The community is excellent and growing each day