Geode is Not a Cache, it's an Analytics Engine
1. Geode is Not a Cache, it's an Analytics Engine!
By Evan Benoit (evan.benoit@resonate.com)
and Sharif Ghazzawi (sharif.ghazzawi@resonate.com)
2. Unless otherwise indicated, these slides are © 2013-2018 Pivotal Software, Inc. and licensed under a Creative Commons
Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/
Who is Resonate?
Marketing and Advertising Technology Company
Located in Reston, VA
Give our clients insights into their customers’ values and motivations
Hiring Spring and Big Data Engineers!
http://www.resonate.com
3. What Kind of Data Do We Have?
We model predictions for thousands of different attributes
• Likes, Dislikes, Motivations, Behavior, Sentiments
[Figure: 1000’s of attributes, 200 million cookies, 1.7 trillion total predictions; 1000’s of sites, 21.9 billion total site hits]
4. What Do We Do With the Data?
Our SaaS platform computes thousands of insights for our clients’ sites
• Example: “How many cookies hit my homepage yesterday that we’ve modeled
as female democrats, and how does that compare to the general population?”
[Figure: “Women Dems” and “Home page”]
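With exact sets, the example question reduces to one intersection and two ratios. The snippet below is a minimal illustration only: the cookie IDs, set names, and the "index" ratio are made up for the example and are not taken from Resonate's platform.

```python
# Hypothetical toy data: cookie IDs are small integers.
all_cookies = set(range(1, 21))            # general population (20 cookies)
women_dems = {2, 4, 6, 8, 10, 12}          # cookies modeled as female democrats
homepage_yesterday = {1, 2, 3, 4, 6, 9}    # cookies that hit the homepage

# Cookies that are both on the homepage and in the modeled segment.
overlap = women_dems & homepage_yesterday

# Share of homepage visitors in the segment vs. share of everyone in it.
share_on_page = len(overlap) / len(homepage_yesterday)
share_overall = len(women_dems) / len(all_cookies)

# A ratio above 1 means the segment is over-represented on the homepage.
index = share_on_page / share_overall
print(len(overlap), round(share_on_page, 3), round(share_overall, 3), round(index, 3))
```

At 200 million cookies and thousands of attributes, materializing exact sets like these for every report is what becomes expensive.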
5. Each Insights Report requires thousands of set operations to be performed ad hoc, within seconds!
6. Key Take-aways
Geode can be used as more than a simple Key-Value cache; it can run functions on
data in-memory.
Probabilistic Data Structures can be used in many industries to perform set
operations at scale.
A Spring/Geode architecture can improve performance and scalability.
7. What Didn’t Work? HBase
Brute force approach: HBase co-processors sequentially scanning bitmaps
Completely inappropriate use of HBase!
40-node cluster, 30-second queries
Essentially using HBase as an in-memory database
[Figure: sequential scans over 1000’s of attributes, 200 million cookies, and 1000’s of sites]
8. Probabilistic Data Structures for Estimating Cardinality of a Set
We have a counting problem. You probably do, too.
Our users don’t require exact precision. We’re not a bank!
Probabilistic data structures can estimate the cardinality of a set
They use a fixed amount of time and space
9. Yahoo Theta Sketch
Yahoo’s Theta Sketches give you estimated counts in a fixed amount of space…
… and they also support set operations!!
Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
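To make the idea concrete without depending on the DataSketches library, here is a toy theta-style sketch in pure Python. It is a simplified illustration written for this transcript, not the library's implementation or API: the class name `ToyThetaSketch`, the SHA-1 hashing, and the default `k` are all assumptions. It keeps at most `k` hash values below a threshold `theta`, estimates the distinct count as `retained / theta`, and returns a new sketch from union and intersection.

```python
import hashlib


def hash01(item):
    """Map an item to a pseudo-random float in [0, 1) via SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class ToyThetaSketch:
    """Keeps at most k distinct hash values, all below the threshold theta."""

    def __init__(self, k=2048, theta=1.0, hashes=()):
        self.k = k
        self.theta = theta
        self.hashes = set(hashes)

    def update(self, item):
        h = hash01(item)
        if h >= self.theta or h in self.hashes:
            return
        self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Evict the largest hash; it becomes the new threshold.
            self.theta = max(self.hashes)
            self.hashes.remove(self.theta)

    def estimate(self):
        # Retained hashes are roughly a uniform sample of fraction theta.
        return len(self.hashes) / self.theta

    @staticmethod
    def _build(k, theta, hashes):
        # Keep only hashes below theta, then shrink to at most k entries.
        kept = sorted(h for h in hashes if h < theta)
        if len(kept) > k:
            theta, kept = kept[k], kept[:k]
        return ToyThetaSketch(k, theta, kept)

    def union(self, other):
        theta = min(self.theta, other.theta)
        return self._build(self.k, theta, self.hashes | other.hashes)

    def intersect(self, other):
        theta = min(self.theta, other.theta)
        return self._build(self.k, theta, self.hashes & other.hashes)


a = ToyThetaSketch()
b = ToyThetaSketch()
for cookie in range(0, 60000):
    a.update(cookie)
for cookie in range(40000, 100000):
    b.update(cookie)

either = a.union(b)      # true size: 100,000 distinct cookies
both = a.intersect(b)    # true size: 20,000 distinct cookies
print(round(either.estimate()), round(both.estimate()))
```

Memory stays bounded by `k` no matter how many cookies are fed in, and the estimates are approximate: the error shrinks as `k` grows, and intersections are noisier than unions because fewer retained hashes survive.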
10. Sketches Begin Multiplying Like Rabbits
Sketches can’t contain any additional metadata
We need a sketch for each attribute, for each tag
Next thing we know, we have 150 million sketches, 2 terabytes total
We need a place to store all these sketches
Example from https://datasketches.github.io/docs/Theta/ThetaJavaExample.html
[Figure: 1000’s of attributes, 1000’s of sites]
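A quick back-of-the-envelope check on those totals gives the average sketch size. The snippet assumes decimal terabytes; the second figure additionally assumes 8 bytes per retained hash and ignores any header, which is an illustrative assumption rather than something stated on the slides.

```python
total_bytes = 2 * 10**12        # 2 terabytes, assuming decimal units
sketch_count = 150 * 10**6      # 150 million sketches

# Average storage per sketch, in bytes.
avg_bytes = total_bytes / sketch_count

# Assumption for illustration: 8 bytes per retained hash, no header.
entries_if_8_bytes = int(avg_bytes // 8)

print(round(avg_bytes), entries_if_8_bytes)
```

That works out to roughly 13 KB per sketch on average, small enough individually but large enough in aggregate to need a distributed store.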
11. We Need a Distributed In-memory Database…
12. System Architecture
13. System Characteristics
Data Locality
• We register Java methods built with the Theta Sketch library into Geode
• These set operations run close to the data. No need to shuffle data between
nodes. The sketches never leave Geode; Geode just returns the final count.
Performance
• Computing the cardinality of a set is now an O(1) lookup instead of an O(n) full
table scan
• Output of a set operation is a sketch rather than a number, allowing multiple
set operations to be chained together efficiently
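The "ship the function to the data" pattern described above can be mimicked in a few lines. This is a toy stand-in, not Geode's API: `ToyServer`, `register`, `execute`, and the key names are hypothetical, and plain Python sets stand in for sketches. The registered function chains intersections on the server side and only the final count crosses the boundary.

```python
from functools import reduce


class ToyServer:
    """Hypothetical stand-in for a server that holds data and runs functions on it."""

    def __init__(self):
        self.region = {}      # key -> set of cookie IDs (stand-in for a sketch)
        self.functions = {}   # name -> callable(region, keys)

    def put(self, key, value):
        self.region[key] = value

    def register(self, name, fn):
        self.functions[name] = fn

    def execute(self, name, keys):
        # The function runs where the data lives; only its result is returned.
        return self.functions[name](self.region, keys)


def intersection_count(region, keys):
    """Chain intersections over the stored sets and return only the final count."""
    first, rest = keys[0], keys[1:]
    result = reduce(lambda acc, key: acc & region[key], rest, region[first])
    return len(result)


server = ToyServer()
server.put("attr:female", {1, 2, 3, 4, 5})
server.put("attr:democrat", {2, 3, 5, 8})
server.put("site:homepage", {3, 5, 8, 9})
server.register("intersection_count", intersection_count)

count = server.execute(
    "intersection_count", ["attr:female", "attr:democrat", "site:homepage"]
)
print(count)
```

Because each intermediate result is itself a set (a sketch, in the real system), any number of operations can be chained before a single number is returned to the caller.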
14. System Characteristics
Fault Tolerance/Resiliency
• Geode Locators and Servers can be added/removed with zero downtime
• AWS Elastic Load Balancer (ELB) detects when a Geode ECS node is
unhealthy, kills the Docker container, and spawns a new one
• Nodes are distributed across multiple AWS Availability Zones
Scalability
• Just add more servers and rebalance
15. Geode Regions
Geode gives a lot of options for how to persist and replicate your data
Original design called for persistent, replicated, partitioned Geode regions
• But persistence and replication made it difficult to swap out bad Geode nodes
• Geode checks the filesystem to ensure that no data was lost - Slow!
• Data is shuffled to honor the replication config - Slow!
Solution: We use AWS S3 as our persistent, replicated layer, not Geode
• Geode reads through from S3 whenever it doesn’t have the data
• We read through "parcels" containing thousands of sketches instead of
reading them one at a time
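The parcel-based read-through can be sketched as a small cache that, on a miss, loads the whole parcel containing the requested key. Everything here is hypothetical: the class, the `site/attribute` key format, the one-parcel-per-site grouping, and the in-memory dictionary standing in for S3. The slides do not say how parcels are keyed or how the loader is wired into Geode.

```python
class ParcelReadThroughCache:
    """On a miss, load the entire parcel that contains the requested key."""

    def __init__(self, load_parcel, parcel_of):
        self.load_parcel = load_parcel   # parcel_id -> dict of key -> sketch bytes
        self.parcel_of = parcel_of       # key -> parcel_id
        self.entries = {}
        self.loads = 0                   # number of backing-store reads

    def get(self, key):
        if key not in self.entries:
            parcel = self.load_parcel(self.parcel_of(key))
            self.entries.update(parcel)
            self.loads += 1
        return self.entries[key]


# Hypothetical backing store standing in for S3: one parcel per site.
fake_store = {
    "site1": {"site1/attr1": b"sketch-a", "site1/attr2": b"sketch-b"},
    "site2": {"site2/attr1": b"sketch-c"},
}

cache = ParcelReadThroughCache(
    load_parcel=lambda parcel_id: dict(fake_store[parcel_id]),
    parcel_of=lambda key: key.split("/")[0],
)

first = cache.get("site1/attr1")    # miss: loads the whole site1 parcel
second = cache.get("site1/attr2")   # hit: arrived with the same parcel
third = cache.get("site2/attr1")    # miss: loads the site2 parcel
print(cache.loads)
```

Batching this way trades a larger first read for far fewer round trips, which matters when a single report touches thousands of sketches.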
16. Geode and Docker
Geode doesn’t easily support Docker/ECS
• Recommended way of starting locators and servers is via Gfsh
• Gfsh starts locator/server in the background then exits
• Docker container exits/dies once there is no process running in the foreground
Solution: We added a dummy foreground process to keep Docker container up
17. Geode and ECS
Geode Locators keep state on local disk, which is transient in AWS ECS
• Don't assume existence of a local disk
• Makes it difficult to honor "12 factor app" principles
Solution: We deploy Locator Docker instances to EC2 nodes with storage and keep them
associated with those nodes
18. Geode and Spring Boot
Spring-data-geode didn’t fit our production architecture
• Initially we tried embedding Geode in Spring Boot
• No lifecycle hooks for Spring apps to tap into for health checks
• Makes designing fault tolerance/resiliency and scalability difficult
Solution: We run Geode as a standalone process, not embedded in Spring
19. Geode and Configuration Data
Improved configuration management flexibility
• Geode comes with a tightly integrated configuration management sub-system
• Configs are uploaded to locators, distributed to servers
Many organizations already have a configuration management system
• e.g. Consul, ZooKeeper, spring-cloud-config
We’d love to see Geode’s configuration system be pluggable/swappable
20. Testing Distributed Systems
As with any distributed system, make sure you understand the consistency,
availability and partition-tolerance guarantees provided by your tools, and
ultimately your system
• Identify what parts of your system will provide redundancy
• How does your system respond to various failure scenarios?
• Test, Test, Test those scenarios
21. Summary
We Deployed a Spring and Geode Architecture
Containing Yahoo Theta Sketches
Significantly improved our main reportโs performance
Reduced operating costs by 95% over our previous HBase implementation
Increased scalability
Simplified operations
Increased resiliency
22. Resonate is HIRING in RESTON!
Spring Engineers
Big Data Engineers (Spark, Geode, Hadoop, Kafka)
Dev Ops Engineers (AWS)
UX Engineers (Ember.js)
https://www.resonate.com/about/careers/
#springone@s1p