Geode is Not a Cache,
it's an Analytics Engine!
By Evan Benoit (evan.benoit@resonate.com)
and Sharif Ghazzawi (sharif.ghazzawi@resonate.com)
Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Who is Resonate?
Marketing and Advertising Technology Company
Located in Reston, VA
Give our clients insights into their customers' values and motivations
Hiring Spring and Big Data Engineers!
http://www.resonate.com
What Kind of Data Do We Have?
We model predictions for thousands of different attributes
• Likes, Dislikes, Motivations, Behavior, Sentiments
[Figure: 1000's of attributes by 200 million cookies, 1.7 trillion total predictions; 1000's of sites, 21.9 billion total site hits]
What Do We Do With the Data?
Our SaaS platform computes thousands of insights for our clients' sites
• Example: "How many cookies hit my homepage yesterday that we've modeled as female Democrats, and how does that compare to the general population?"
[Figure: the "Women Dems" set shown against the "Home page" set]
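The example question above reduces to one set intersection and a ratio. Here is a minimal sketch using exact Python sets; the cookie IDs, the set contents, and the name `index` are all made up for illustration and are not taken from the Resonate platform.

```python
# Hypothetical toy data: cookie IDs are small integers here.
all_cookies = set(range(1, 21))                    # general population (20 cookies)
women_dems = {2, 4, 6, 8, 10, 12}                  # cookies modeled as female Democrats
homepage_yesterday = {1, 2, 3, 4, 6, 15, 16, 20}   # cookies that hit the homepage

def share(segment, audience):
    """Fraction of `audience` that also belongs to `segment`."""
    return len(segment & audience) / len(audience)

site_share = share(women_dems, homepage_yesterday)   # 3 of 8 homepage visitors
baseline_share = share(women_dems, all_cookies)      # 6 of 20 cookies overall
index = site_share / baseline_share                  # above 1 means over-represented

print(f"homepage: {site_share:.3f}, population: {baseline_share:.3f}, index: {index:.2f}")
```

With exact sets this is trivial at toy scale. The rest of the deck is about doing the same intersections when the sets hold hundreds of millions of cookies.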
Each Insights Report requires
thousands of set operations
to be performed ad hoc,
within seconds!
Key Take-aways
Geode can be used as more than a simple Key-Value cache; it can run functions on
data in-memory.
Probabilistic Data Structures can be used in many industries to perform set
operations at scale.
A Spring/Geode architecture can improve performance and scalability.
What Didn't Work? HBase
Brute force approach: HBase co-processors sequentially scanning bitmaps
Completely inappropriate use of HBase!
40-node cluster, 30 second queries
Essentially using HBase as an in-memory database
[Figure: sequential scans over 1000's of attributes, 200 million cookies, and 1000's of sites]
Probabilistic Data Structures for Estimating
Cardinality of a Set
We have a counting problem. You probably do, too.
Our users don't require exact precision. We're not a bank!
Probabilistic data structures can estimate the cardinality of a set
They do so in a fixed amount of time and space
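To make the "fixed amount of space" claim concrete, here is a minimal K-minimum-values (KMV) estimator in plain Python. It is one well-known probabilistic counting technique, chosen here for illustration only; the class name, the use of SHA-1, and the value of `k` are assumptions, not details from the talk.

```python
import hashlib
import heapq

def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) via SHA-1 (an arbitrary choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class KMVSketch:
    """K-minimum-values sketch: remembers only the k smallest hash values seen."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []        # max-heap (negated values) of the k smallest hashes
        self._members = set()  # the same values, used to ignore duplicates

    def add(self, item):
        h = hash01(item)
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # New hash is smaller than the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))   # fewer than k distinct items: exact count
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest

sketch = KMVSketch(k=1024)
for cookie_id in range(100_000):
    sketch.add(cookie_id)
    sketch.add(cookie_id)   # duplicates do not change the estimate

print(round(sketch.estimate()))   # close to 100000, not exact
```

The sketch never holds more than `k` values no matter how many items are added, which is the trade being described: bounded memory in exchange for a small, predictable error.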
Yahoo Theta Sketch
Yahoo's Theta Sketches give you estimated counts in a fixed amount of space...
... and they also support set operations!
Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
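The linked example uses the DataSketches Java library. As a library-free illustration of the same idea, the following Python sketch implements a simplified theta-style structure with union and intersection. It is not the DataSketches API: the class `ThetaSketch`, its method names, the SHA-1 hash, and `k=4096` are all assumptions made for this example, and a real implementation builds the sketch in streaming fashion rather than sorting every hash up front.

```python
import hashlib

def hash01(item):
    """Map an item to a pseudo-uniform float in [0, 1) via SHA-1 (an arbitrary choice)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class ThetaSketch:
    """Simplified theta sketch: a threshold `theta` plus all retained hashes below it."""

    def __init__(self, theta=1.0, hashes=()):
        self.theta = theta
        self.hashes = frozenset(h for h in hashes if h < theta)

    @classmethod
    def from_items(cls, items, k=4096):
        hashes = sorted({hash01(x) for x in items})
        if len(hashes) <= k:
            return cls(1.0, hashes)         # small set: kept exactly
        return cls(hashes[k], hashes[:k])   # theta is the (k+1)-th smallest hash

    def estimate(self):
        # Retained hashes are a uniform sample at rate theta.
        return len(self.hashes) / self.theta

    def union(self, other):
        theta = min(self.theta, other.theta)
        return ThetaSketch(theta, self.hashes | other.hashes)

    def intersection(self, other):
        theta = min(self.theta, other.theta)
        return ThetaSketch(theta, self.hashes & other.hashes)

# Hypothetical data: two overlapping ranges of cookie IDs.
women_dems = ThetaSketch.from_items(range(0, 60_000))
homepage = ThetaSketch.from_items(range(40_000, 100_000))

both = women_dems.intersection(homepage)   # true size: 20,000
either = women_dems.union(homepage)        # true size: 100,000
print(round(both.estimate()), round(either.estimate()))
```

Each operation returns another sketch, so results can feed further unions or intersections before a count is ever taken.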
Sketches Begin Multiplying Like Rabbits
Sketches can't contain any additional metadata
We need a sketch for each attribute, for each tag
Next thing we know, we have 150 Million sketches, 2 Terabytes total
We need a place to store all these sketches
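A quick back-of-the-envelope check of those figures shows what an average sketch weighs. The calculation assumes decimal terabytes and, for the second number, 8-byte hash entries; both are assumptions, not figures from the slides.

```python
TOTAL_BYTES = 2 * 10**12      # "2 Terabytes", assuming decimal terabytes
SKETCH_COUNT = 150 * 10**6    # "150 Million sketches"
BYTES_PER_ENTRY = 8           # assumption: each retained hash is a 64-bit value

avg_bytes = TOTAL_BYTES / SKETCH_COUNT
avg_entries = avg_bytes / BYTES_PER_ENTRY

print(f"average sketch size: {avg_bytes / 1000:.1f} kB")        # 13.3 kB
print(f"roughly {avg_entries:.0f} entries per sketch on average")  # 1667
```

Individually the sketches are small. The storage problem comes from their sheer number, which is why the next step is a distributed store rather than a bigger sketch.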
Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
[Figure: one sketch per combination of 1000's of attributes and 1000's of sites]
We Need a Distributed In-memory Database...
System Architecture
System Characteristics
Data Locality
• We register Java methods built with the Theta Sketch library into Geode
• These set operations run close to the data. No need to shuffle data between nodes. The sketches never leave Geode; Geode just returns the final count.
Performance
• Computing the cardinality of a set is now an O(1) lookup instead of an O(n) full table scan
• Output of a set operation is a sketch rather than a number, allowing multiple set operations to be chained together efficiently
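The data-locality pattern described above can be sketched in a few lines of Python. This is a toy model, not Geode's function-execution API: the `Server` class, its `register` and `execute` methods, and the key names are hypothetical, and plain frozensets stand in for sketches.

```python
class Server:
    """Stand-in for one data node: holds entries in memory and runs registered functions."""

    def __init__(self):
        self.region = {}      # key -> sketch (here a plain frozenset as a stand-in)
        self.functions = {}   # function name -> callable

    def register(self, name, fn):
        self.functions[name] = fn

    def execute(self, name, *keys):
        # The function runs where the data lives; only its return value leaves the node.
        return self.functions[name](self.region, *keys)

def intersect_count(region, *keys):
    """Chain intersections across all requested keys and return only the final count."""
    result = region[keys[0]]
    for key in keys[1:]:
        result = result & region[key]   # each step yields another set, so steps chain
    return len(result)

server = Server()
server.region["attr:female"] = frozenset({1, 2, 3, 4, 5, 6})
server.region["attr:democrat"] = frozenset({2, 4, 6, 8})
server.region["site:homepage"] = frozenset({1, 2, 4, 9})
server.register("intersect_count", intersect_count)

count = server.execute("intersect_count", "attr:female", "attr:democrat", "site:homepage")
print(count)   # prints 2
```

The caller sends three keys and receives one integer. The intermediate results stay on the node, which is the property the slide credits for avoiding data shuffles.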
System Characteristics
Fault Tolerance/Resiliency
• Geode Locators and Servers can be added/removed with zero downtime
• AWS Elastic Load Balancer (ELB) detects when a Geode ECS node is unhealthy, kills the Docker container, and spawns a new one
• Nodes are distributed across multiple AWS Availability Zones
Scalability
• Just add more servers and rebalance
Geode Regions
Geode gives a lot of options for how to persist and replicate your data
Original design called for persistent, replicated, partitioned Geode regions
• But persistence and replication made it difficult to swap out bad Geode nodes
• Geode checks the filesystem to ensure that no data was lost: slow!
• Data is shuffled to honor the replication config: slow!
Solution: We use AWS S3 as our persistent, replicated layer, not Geode
• Geode reads through from S3 whenever it doesn't have the data
• We read through "parcels" containing thousands of sketches instead of one sketch at a time
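The parcel read-through idea can be illustrated with a small in-memory model. Everything here is an assumption for the sake of the example: the class name, the `site/attribute` key scheme, grouping parcels by site, and the dictionary that stands in for S3.

```python
class ParcelReadThroughCache:
    """On a miss, load the whole parcel that contains the key, not just the one entry."""

    def __init__(self, parcel_store, parcel_of):
        self.parcel_store = parcel_store   # stand-in for S3: parcel id -> {key: sketch bytes}
        self.parcel_of = parcel_of         # function mapping a key to its parcel id
        self.memory = {}                   # stand-in for the in-memory region
        self.fetches = 0                   # number of round trips to the backing store

    def get(self, key):
        if key not in self.memory:
            parcel_id = self.parcel_of(key)
            self.fetches += 1
            self.memory.update(self.parcel_store[parcel_id])   # one fetch, many sketches
        return self.memory[key]

# Hypothetical layout: one parcel per site, holding that site's sketches for every attribute.
store = {
    "site1": {f"site1/attr{i}": b"sketch-bytes" for i in range(1000)},
    "site2": {f"site2/attr{i}": b"sketch-bytes" for i in range(1000)},
}
cache = ParcelReadThroughCache(store, parcel_of=lambda key: key.split("/")[0])

cache.get("site1/attr7")     # miss: pulls in all 1000 site1 sketches
cache.get("site1/attr42")    # hit: already in memory
cache.get("site2/attr0")     # miss: pulls in the site2 parcel
print(cache.fetches)         # prints 2
```

A report that touches many attributes of one site then pays for one fetch instead of thousands, at the cost of loading some sketches that are never used.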
Geode and Docker
Geode doesn't easily support Docker/ECS
• The recommended way of starting locators and servers is via Gfsh
• Gfsh starts the locator/server in the background, then exits
• A Docker container exits once no process is running in the foreground
Solution: We added a dummy foreground process to keep the Docker container up
Geode and ECS
Geode Locators keep state on local disk, which is transient in AWS ECS
• Don't assume existence of a local disk
• Makes it difficult to honor "12 factor app" principles
Solution: We deploy Locator Docker instances and associate them with EC2 nodes that have storage
Geode and Spring Boot
Spring-data-geode didn't fit our production architecture
• Initially we tried embedding Geode in Spring Boot
• No lifecycle hooks for Spring apps to tap into for health checks
• Makes designing fault tolerance/resiliency and scalability difficult
Solution: We run Geode as a standalone process, not embedded in Spring
Geode and Configuration Data
Improved configuration management flexibility
• Geode comes with a tightly integrated configuration management sub-system
• Configs are uploaded to locators, distributed to servers
Many organizations already have a configuration management system
• e.g. Consul, ZooKeeper, spring-cloud-config
We'd love to see Geode's configuration system be pluggable/swappable
Testing Distributed Systems
As with any distributed system, make sure you understand the consistency,
availability and partition-tolerance guarantees provided by your tools, and
ultimately your system
• Identify what parts of your system will provide redundancy
• How does your system respond to various failure scenarios?
• Test, Test, Test those scenarios
Summary
We Deployed a Spring and Geode Architecture
Containing Yahoo Theta Sketches
Significantly improved our main report's performance
Reduced operating costs by 95% over our previous HBase implementation
Increased scalability
Simplified operations
Increased resiliency
Resonate is HIRING in RESTON!
Spring Engineers
Big Data Engineers (Spark, Geode, Hadoop, Kafka)
Dev Ops Engineers (AWS)
UX Engineers (Ember.js)
https://www.resonate.com/about/careers/
#springone@s1p

Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
ย 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
ย 

Geode is Not a Cache, it's an Analytics Engine

  • 1. Geode is Not a Cache, it's an Analytics Engine! By Evan Benoit (evan.benoit@resonate.com) and Sharif Ghazzawi (sharif.ghazzawi@resonate.com)
  • 2. Who is Resonate? Marketing and advertising technology company located in Reston, VA. We give our clients insights into their customers' values and motivations. Hiring Spring and Big Data Engineers! http://www.resonate.com (Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/)
  • 3. What Kind of Data Do We Have? We model predictions for thousands of different attributes: likes, dislikes, motivations, behavior, sentiments. Scale: thousands of attributes and 200 million cookies, 1.7 trillion total predictions; thousands of sites, 21.9 billion total site hits.
  • 4. What Do We Do With the Data? Our SaaS platform computes thousands of insights for our clients' sites. Example: "How many cookies hit my homepage yesterday that we've modeled as female Democrats, and how does that compare to the general population?" [Diagram labels: Women, Dems, Home page]
  • 5. Each Insights Report requires thousands of set operations to be performed ad hoc, within seconds!
  • 6. Key Take-aways. Geode can be used as more than a simple key-value cache; it can run functions on data in memory. Probabilistic data structures can be used in many industries to perform set operations at scale. A Spring/Geode architecture can improve performance and scalability.
  • 7. What Didn't Work? HBase. Brute-force approach: HBase co-processors sequentially scanning bitmaps. Completely inappropriate use of HBase! 40-node cluster, 30-second queries. We were essentially using HBase as an in-memory database. [Diagram: sequential scans across thousands of attributes, 200 million cookies, and thousands of sites]
  • 8. Probabilistic Data Structures for Estimating Cardinality of a Set. We have a counting problem. You probably do, too. Our users don't require exact precision. We're not a bank! Probabilistic data structures can estimate the cardinality of a set using a fixed amount of time and space.
  • 9. Yahoo Theta Sketch. Yahoo's Theta Sketches give you estimated counts in a fixed amount of space… and they also support set operations! Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
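To make slide 9's two claims concrete (fixed-space estimates, plus set operations whose result is itself a sketch), here is a minimal sketch in plain Java. It is a simplified theta-style structure written for illustration only; it is not the DataSketches library API, and the class name `ThetaLikeSketch`, the hash function, and the parameter `k` are assumptions made for this example.

```java
import java.util.TreeSet;

/** Simplified theta-style sketch for illustration; NOT the DataSketches library API. */
final class ThetaLikeSketch {
    private final int k;                                    // max hashes retained (fixed space)
    private long theta = Long.MAX_VALUE;                    // only hashes below theta are kept
    private final TreeSet<Long> entries = new TreeSet<>();  // retained hashes, all < theta

    ThetaLikeSketch(int k) {
        if (k < 1) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    /** SplitMix64 finalizer, masked to a non-negative 63-bit value. */
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return (x ^ (x >>> 31)) & Long.MAX_VALUE;
    }

    void update(long item) { offer(hash(item)); }

    private void offer(long h) {
        if (h >= theta) return;          // outside the sampled region of hash space
        entries.add(h);                  // duplicates collapse: same item, same hash
        if (entries.size() > k) {
            theta = entries.pollLast();  // evict the largest hash and lower theta to it
        }
    }

    /** Retained count divided by the fraction of hash space being sampled. */
    double estimate() {
        double sampledFraction = theta / (double) Long.MAX_VALUE;
        return entries.size() / sampledFraction;
    }

    /** Union: sample at the smaller theta and merge the retained hashes. */
    static ThetaLikeSketch union(ThetaLikeSketch a, ThetaLikeSketch b) {
        ThetaLikeSketch r = new ThetaLikeSketch(Math.min(a.k, b.k));
        r.theta = Math.min(a.theta, b.theta);
        for (long h : a.entries) r.offer(h);
        for (long h : b.entries) r.offer(h);
        return r;
    }

    /** Intersection: keep hashes retained by both inputs, at the smaller theta. */
    static ThetaLikeSketch intersect(ThetaLikeSketch a, ThetaLikeSketch b) {
        ThetaLikeSketch r = new ThetaLikeSketch(Math.min(a.k, b.k));
        r.theta = Math.min(a.theta, b.theta);
        for (long h : a.entries) {
            if (h < r.theta && b.entries.contains(h)) r.entries.add(h);
        }
        return r;
    }

    public static void main(String[] args) {
        ThetaLikeSketch a = new ThetaLikeSketch(4096);
        ThetaLikeSketch b = new ThetaLikeSketch(4096);
        for (long i = 0; i < 100_000; i++) a.update(i);        // true size 100,000
        for (long i = 50_000; i < 150_000; i++) b.update(i);   // overlaps a in 50,000 items
        System.out.printf("union ~ %.0f, intersection ~ %.0f%n",
                union(a, b).estimate(), intersect(a, b).estimate());
    }
}
```

Each sketch holds at most `k` hashes regardless of how many items it has seen, and both `union` and `intersect` return another sketch, so their results can feed further set operations. A larger `k` trades memory for a tighter estimate.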
  • 10. Sketches Begin Multiplying Like Rabbits. Sketches can't contain any additional metadata. We need a sketch for each attribute, for each tag. Next thing we know, we have 150 million sketches, 2 terabytes total. We need a place to store all these sketches. Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html [Diagram: thousands of attributes by thousands of sites]
  • 11. We Need a Distributed In-memory Database…
  • 12. System Architecture
  • 13. System Characteristics. Data Locality: We register Java methods built with the Theta Sketch library into Geode. These set operations run close to the data, with no need to shuffle data between nodes. The sketches never leave Geode; Geode just returns the final count. Performance: Computing the cardinality of a set is now an O(1) lookup instead of an O(n) full table scan. The output of a set operation is a sketch rather than a number, allowing multiple set operations to be chained together efficiently.
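The data-locality pattern on slide 13 (run the set operations where the data lives and send back only a count) can be sketched in plain Java as follows. This does not use the Geode function API: the class `SketchNode`, its methods, and the "site/attribute" key scheme are hypothetical, and exact `HashSet`s stand in for Theta Sketches to keep the example self-contained.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Stand-in for one data node: holds the sets locally and answers with counts only. */
final class SketchNode {
    // Hypothetical key scheme "site/attribute"; exact sets stand in for Theta Sketches.
    private final Map<String, Set<Long>> region = new HashMap<>();

    void put(String key, Set<Long> cookies) {
        region.put(key, new HashSet<>(cookies));
    }

    /**
     * Intersects the named sets on the node. Every intermediate result is itself
     * a set, so any number of operations can be chained before counting.
     */
    long countIntersection(List<String> keys) {
        Set<Long> acc = null;
        for (String key : keys) {
            Set<Long> next = region.getOrDefault(key, Collections.<Long>emptySet());
            if (acc == null) {
                acc = new HashSet<>(next);   // copy so stored data is never mutated
            } else {
                acc.retainAll(next);
            }
        }
        return acc == null ? 0 : acc.size(); // only this number leaves the node
    }
}
```

A caller asking the slide 4 style question would pass keys such as `"home/visited"`, `"home/women"`, and `"home/dems"` and receive a single `long`; the sets themselves stay on the node.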
  • 14. System Characteristics. Fault Tolerance/Resiliency: Geode Locators and Servers can be added or removed with zero downtime. The AWS Elastic Load Balancer (ELB) detects when a Geode ECS node is unhealthy, kills the Docker container, and spawns a new one. Nodes are distributed across multiple AWS Availability Zones. Scalability: Just add more servers and rebalance.
  • 15. Geode Regions. Geode gives a lot of options for how to persist and replicate your data. Our original design called for persistent, replicated, partitioned Geode regions, but persistence and replication made it difficult to swap out bad Geode nodes: Geode checks the filesystem to ensure that no data was lost (slow!), and data is shuffled to honor the replication config (slow!). Solution: We use AWS S3 as our persistent, replicated layer, not Geode. Geode reads through from S3 whenever it doesn't have the data. We read through "parcels" containing thousands of sketches instead of fetching sketches one at a time.
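The parcel read-through idea on slide 15 can be illustrated with a small plain-Java cache. This is a sketch under stated assumptions rather than the authors' implementation: it does not use the Geode or AWS APIs, the class `ParcelReadThroughCache` and the "parcelId/sketchId" key layout are hypothetical, and the backing store is abstracted as a function that returns every entry of one parcel.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Read-through cache that loads a whole parcel of entries on a miss. Not thread-safe. */
final class ParcelReadThroughCache {
    private final Map<String, byte[]> cache = new HashMap<>();
    // Hypothetical loader: given a parcel id, returns all serialized sketches in it,
    // for example from a single object fetched from the persistent store.
    private final Function<String, Map<String, byte[]>> parcelLoader;
    private int parcelLoads = 0;

    ParcelReadThroughCache(Function<String, Map<String, byte[]>> parcelLoader) {
        this.parcelLoader = parcelLoader;
    }

    /** Assumed key layout "parcelId/sketchId"; a key without '/' is its own parcel. */
    static String parcelOf(String key) {
        int slash = key.indexOf('/');
        return slash < 0 ? key : key.substring(0, slash);
    }

    byte[] get(String key) {
        byte[] hit = cache.get(key);
        if (hit != null) return hit;

        // Miss: fetch the whole parcel once so that neighbouring keys become hits.
        Map<String, byte[]> parcel = parcelLoader.apply(parcelOf(key));
        parcelLoads++;
        cache.putAll(parcel);
        return cache.get(key);   // null if the parcel does not contain the key
    }

    int parcelLoads() { return parcelLoads; }
}
```

With thousands of sketches per parcel, one slow fetch from the persistent layer warms the cache for every sketch stored alongside the requested one. A production version would also need eviction and concurrency control, both omitted here.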
  • 16. Geode and Docker. Geode doesn't easily support Docker/ECS. The recommended way of starting locators and servers is via Gfsh. Gfsh starts the locator/server in the background, then exits. A Docker container exits once there is no process running in the foreground. Solution: We added a dummy foreground process to keep the Docker container up.
  • 17. Geode and ECS. Geode Locators keep state on local disk, which is transient in AWS ECS. Don't assume the existence of a local disk. This makes it difficult to honor "12 factor app" principles. Solution: We deploy and associate Locator Docker instances to EC2 nodes with storage.
  • 18. Geode and Spring Boot. Spring-data-geode didn't fit our production architecture. Initially we tried embedding Geode in Spring Boot, but there were no lifecycle hooks for Spring apps to tap into for health checks, which makes designing for fault tolerance/resiliency and scalability difficult. Solution: We run Geode as a standalone process, not embedded in Spring.
  • 19. Geode and Configuration Data. Improved configuration management flexibility: Geode comes with a tightly integrated configuration management sub-system. Configs are uploaded to locators and distributed to servers. Many organizations already have a configuration management system, e.g. Consul, ZooKeeper, spring-cloud-config. We'd love to see Geode's configuration system be pluggable/swappable.
  • 20. Testing Distributed Systems. As with any distributed system, make sure you understand the consistency, availability and partition-tolerance guarantees provided by your tools, and ultimately your system. Identify what parts of your system will provide redundancy. How does your system respond to various failure scenarios? Test, test, test those scenarios.
  • 21. Summary. We deployed a Spring and Geode architecture containing Yahoo Theta Sketches. Significantly improved our main report's performance. Reduced operating costs by 95% over our previous HBase implementation. Increased scalability. Simplified operations. Increased resiliency.
  • 22. Resonate is HIRING in RESTON! Spring Engineers; Big Data Engineers (Spark, Geode, Hadoop, Kafka); DevOps Engineers (AWS); UX Engineers (Ember.js). https://www.resonate.com/about/careers/ #springone@s1p