• Sr. Consultant at Inmeta Consulting
• Current project: Skatteetaten Grid POC
• Previous projects involving grid technologies:
• Mattilsynet food authority system.
• FrameSolution BPM framework used in Lovisa National Court Authority (Norway), Mattilsynet Food Authority
• Other noteworthy projects
• Coca Cola Basis ERP system – Coca Cola Bottler factories
• mPower (Mobilitec): 300 million subscribers worldwide, delivering over 500,000 pieces of content every day.
• Big data: databases are slow, memory is FAST!
• Provides huge computing power.
• Tax calculation 
• Financial organizations
• Government organizations use it for communication and data sharing between the different departments.
• Scientific computations
• MMORPG games
• General terminology relevant to Distributed Caching
• Challenges related to introducing distributed caching to an existing system
• Metrics and tuning
• Cache: JSR 107
• Java Data Grid: JSR 347
• In-memory data grid
• Cluster
• Distribution
• Node – a member of a cluster
• Transaction awareness
• Colocation
• Map / Reduce
• Consistency
• Transaction scope
• Locking/deadlocking
• Flushing policies
• Mixing the technology stack.
• Performance
• Wow we did it!
• Our custom cache is super fast, but its cache hit ratio is rather low.
• Our custom cache tends to get dirty because updates to the shared data cannot be propagated. At the same time, the data regions are not fully separated.
• Marshalling is a rather slow and heavy process.
• We are facing a technological cocktail and we need to maintain integrity.
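The cache hit ratio mentioned above is hits divided by total lookups. A minimal sketch of how it could be measured follows; the `CountingCache` class and its `loader` parameter are hypothetical names for illustration, not part of any cache discussed in the deck.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper that counts hits and misses around a plain map.
class CountingCache<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private long hits;
    private long misses;

    // Returns the cached value, or loads and caches it on a miss.
    V get(K key, Function<K, V> loader) {
        V cached = entries.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        V loaded = loader.apply(key);
        entries.put(key, loaded);
        return loaded;
    }

    // hits / (hits + misses); 0.0 before the first lookup.
    double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A ratio that stays low under production-like traffic suggests the cached keys are rarely re-read, or that entries are evicted before they are reused.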
• Write through
• Write Behind
• Replication Queue
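The difference between the first two flushing policies can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: `BackingStore`, `WriteThroughCache`, and `WriteBehindCache` are hypothetical classes, the explicit `flush()` stands in for the background writer a real grid would use, and the replication queue is not modelled.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory stand-in for a database or other backing store.
class BackingStore {
    final Map<String, String> rows = new HashMap<>();

    void write(String key, String value) {
        rows.put(key, value);
    }
}

// Write-through: every put reaches the store before put() returns.
class WriteThroughCache {
    private final Map<String, String> memory = new HashMap<>();
    private final BackingStore store;

    WriteThroughCache(BackingStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        store.write(key, value);
        memory.put(key, value);
    }

    String get(String key) {
        return memory.get(key);
    }
}

// Write-behind: puts only touch memory; the store is updated later by flush().
class WriteBehindCache {
    private final Map<String, String> memory = new HashMap<>();
    private final Queue<String> dirtyKeys = new ArrayDeque<>();
    private final BackingStore store;

    WriteBehindCache(BackingStore store) {
        this.store = store;
    }

    void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    String get(String key) {
        return memory.get(key);
    }

    // Writes all queued keys to the store and returns how many writes were issued.
    int flush() {
        int written = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            store.write(key, memory.get(key));
            written++;
        }
        return written;
    }
}
```

Write-through keeps the store consistent at the cost of slower writes; write-behind makes writes fast but leaves a window in which the store is stale.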
• Eviction
• Least Recently Used
• First In First Out
• LIRS
• Custom
• Expiration
• Invalidation
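Two of the eviction policies listed above, Least Recently Used and First In First Out, can be sketched with the JDK's `LinkedHashMap`. This is a plain-Java illustration, not how any particular data grid implements eviction; `BoundedCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Bounded map that evicts its eldest entry once maxEntries is exceeded.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    // accessOrder = true orders entries by last access (LRU eviction);
    // accessOrder = false orders them by insertion (FIFO eviction).
    BoundedCache(int maxEntries, boolean accessOrder) {
        super(16, 0.75f, accessOrder);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

With a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b` under LRU but `a` under FIFO.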
• Ref. data vs. transactional data
• Reference data: Good. Max 30000 reads/sec, 1 KB size
• Transactional data: Good. Max 25000 writes/sec, 1 KB size
• Reference data: Good. 30000 reads/sec per server. Grows linearly by adding servers.
• Transactional data: Not so good. Max 20000 writes/sec. Drops to 2500 if you add a 3rd server.
• Ref. data vs. transactional data
• Reference data: Good. Max 30000 reads/sec, 1 KB size
• Transactional data: Good. Max 25000 writes/sec, 1 KB size
• Reference data (1 KB): Good. 30000 reads/sec per server. Grows linearly by adding servers.
• Transactional data (1 KB): Good. 20000 writes/sec per server. Grows linearly by adding servers.
• What is the size of our cluster? Reads vs. Writes
• Communication inside our grid
• UDP, TCP
• Synchronous vs. asynchronous
• What about the transaction isolation?
• Repeatable Reads vs. Read Committed
• What is the nature of our application?
• Read intensive data
• CMS systems
• Write Intensive Data
• Document Management System
• Level 1 cache is supported only in distribution mode
• Level 1 cache might have a performance impact in certain systems
• Passivation
• Activation
• Hibernate
• Long running transactions need to be avoided.
• What is a long running transaction? How long is actually long?
• Read Committed vs Repeatable Reads
• Deadlock example (timeline diagram): TX1 wants to update A, B, C; TX2 wants to update C, B, A
• A is locked by TX1
• C is locked by TX2
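The deadlock above arises because TX1 and TX2 take their locks in opposite orders. One common way to avoid it is to acquire locks in a single global order. The sketch below shows that technique in plain Java; it is not taken from the deck and is not a description of Infinispan's locking, and `OrderedLocker` and `DeadlockDemo` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Acquires per-key locks in sorted key order, so two callers can never
// each hold a lock the other is waiting for.
class OrderedLocker {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void updateAll(String[] keys, Runnable update) {
        String[] ordered = keys.clone();
        Arrays.sort(ordered); // the global lock order
        for (String key : ordered) {
            locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
        }
        try {
            update.run();
        } finally {
            // Release in reverse order of acquisition.
            for (int i = ordered.length - 1; i >= 0; i--) {
                locks.get(ordered[i]).unlock();
            }
        }
    }
}

class DeadlockDemo {
    // Runs TX1 (A, B, C) and TX2 (C, B, A) concurrently and returns the update count.
    static int runBoth() {
        OrderedLocker locker = new OrderedLocker();
        int[] counter = {0}; // guarded by the locks taken in updateAll
        Thread tx1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                locker.updateAll(new String[] {"A", "B", "C"}, () -> counter[0]++);
            }
        });
        Thread tx2 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                locker.updateAll(new String[] {"C", "B", "A"}, () -> counter[0]++);
            }
        });
        tx1.start();
        tx2.start();
        try {
            tx1.join();
            tx2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter[0];
    }
}
```

Because both transactions end up locking A first, whichever one gets A proceeds through B and C while the other waits, and both complete.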
• TX1: begin, get(k), -, -, get(k)
• TX2: begin, get(k), put(k, v2), commit
• What is returned by TX1's second get(k)?
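What TX1's second get(k) returns depends on the isolation level: under Read Committed it sees the value TX2 committed, while under Repeatable Read it sees the value it read the first time. The sketch below models only that difference. `IsolationSketch` and `Tx` are hypothetical classes, the initial value `v1` is an assumption (the slide names only `v2`), and real grids handle this with versioning and locking rather than this simplified per-transaction read cache.

```java
import java.util.HashMap;
import java.util.Map;

class IsolationSketch {
    enum Isolation { READ_COMMITTED, REPEATABLE_READ }

    // Minimal transaction context over a shared map of committed values.
    static final class Tx {
        private final Map<String, String> committed;
        private final Isolation isolation;
        private final Map<String, String> firstReads = new HashMap<>();
        private final Map<String, String> pendingWrites = new HashMap<>();

        Tx(Map<String, String> committed, Isolation isolation) {
            this.committed = committed;
            this.isolation = isolation;
        }

        String get(String key) {
            if (pendingWrites.containsKey(key)) {
                return pendingWrites.get(key); // a transaction sees its own writes
            }
            if (isolation == Isolation.REPEATABLE_READ) {
                // Remember the first value read and keep returning it.
                if (!firstReads.containsKey(key)) {
                    firstReads.put(key, committed.get(key));
                }
                return firstReads.get(key);
            }
            return committed.get(key); // READ_COMMITTED: latest committed value
        }

        void put(String key, String value) {
            pendingWrites.put(key, value);
        }

        void commit() {
            committed.putAll(pendingWrites);
            pendingWrites.clear();
        }
    }

    // Replays the timeline from the slide and returns TX1's second read.
    static String secondRead(Isolation tx1Isolation) {
        Map<String, String> store = new HashMap<>();
        store.put("k", "v1"); // assumed initial value
        Tx tx1 = new Tx(store, tx1Isolation);
        Tx tx2 = new Tx(store, Isolation.READ_COMMITTED);
        tx1.get("k");
        tx2.get("k");
        tx2.put("k", "v2");
        tx2.commit();
        return tx1.get("k");
    }
}
```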
• Java serialization
• Java externalization
• Impact on performance
• Generic domain.
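The two marshalling options named above differ in who controls the wire format: with `java.io.Serializable` the JVM writes the fields reflectively, while with `java.io.Externalizable` the class writes and reads its own fields. A minimal sketch using only JDK classes follows; the point classes and `MarshalDemo` helper are hypothetical, and no performance numbers are implied.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Default Java serialization: fields are discovered and written by the JVM.
class SerializablePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    int x;
    int y;

    SerializablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// Externalization: the class controls exactly what is written and read.
class ExternalizablePoint implements Externalizable {
    int x;
    int y;

    // Externalizable requires a public no-argument constructor.
    public ExternalizablePoint() {
    }

    ExternalizablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt();
        y = in.readInt();
    }
}

class MarshalDemo {
    static byte[] toBytes(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Measuring `MarshalDemo.toBytes(...).length` and the time per round trip for your own domain classes is one way to quantify the performance impact before choosing between the two.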
• Transaction scope
• Locking/deadlocking
• Flushing policies
• Mixing the technology stack.
• Performance
• Wow we did it!
• Thank you for your attention
http://www.alachisoft.com/ncache/caching-topology.html
http://www.infoq.com/news/2011/10/java-data-grid
https://github.com/datagrids/spec/wiki
http://www.jboss.org/infinispan/documentation
http://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking

Infinispan, transactional key value data grid and nosql database

  • 1.
  • 2. • Sr. Consultant at Inmeta Consulting • Current project: Skattetaten Grid POC • Previous projects involving grid technologies: • Mattilsynet food authority system. • FrameSolution BPM framework used in Lovisa National Court Authority(Norway), Mattilsynet Food Authority • Other noteworthy projects • Coca Cola Basis ERP system – Coca Cola Bottler factories • mPower Mobilitec 300 million subscribers worldwide, and delivers over 500,000 pieces of content every day.
  • 3. • Big data, Databases are slow. Memory is FAST! • Provides huge computing power. • Tax calculation  • Financial organizations • Government organizations use it for communication and data sharing between the different departments. • Scientific computations • MMORPG games
  • 4. • General terminology relevant to Distributed Caching • Challenges related to introducing distributed caching to existing system • Metrics and tuning
  • 5. • Cache JSR – 107 • Java Data Grid JSR - 347 • In memory Data Grid • Cluster • Distribution • Node – a member of a cluster • Transaction awareness • Colocation • Map / Reduce • Consistency
  • 6.
  • 7.
  • 8. • Transaction scope • Lockingdeadlocking • Flushing policies • Mixing the technology stack. • Performance
  • 9.
  • 10. • Wow we did it!
  • 11. • Our Custom cache is super fast, but its cache hit ratio is rather low. • Our custom cache has a tendency of getting dirty as the updates to the shared data can not be propagated. At the same time the separation of the data regions is not full. • Marshaling is a rather slow and heavy process. • We are facing a technological cocktail and we need to keep integrity.
  • 12. • Write through • Write Behind • Replication Queue
  • 13.
  • 14.
  • 15. • Eviction • Least Recently Used • First In First Out • LIRS • Custom • Expiration • Invalidation
  • 16. • Ref. Data vs Transactional • Reference data: Good. Max 30000 reads/sec 1k size • Transactional data: Good. Max 25000 writes/sec 1k size
  • 17. • Reference data: Good. 30000 reads/sec per server. Grows linearly by adding servers. • Transactional data: Not so good. Max 20000 writes/sec. Drops to 2500 if you add a 3rd server.
  • 18. • Ref. Data vs Transactional • Reference data: Good. Max 30000 reads/sec 1k size • Transactional data: Good. Max 25000 writes/sec 1k size
  • 19. • Reference data (1 KB): Good. 30000 reads/sec per server. Grows linearly by adding servers. • Transactional data (1 KB): Good. 20000 writes/sec per server. Grows linearly by adding servers.
  • 20. • What is the size of our cluster? Reads vs. Writes • Communication inside our grid • UDP, TCP • Synchronous vs. Asynchronous. • What about the transaction isolation? • Repeatable Reads vs. Read Committed • What is the nature of our application? • Read intensive data • CMS systems • Write intensive data • Document Management System
  • 21. • Level 1 cache is supported only for Distribution mode • Level 1 cache might have a performance impact in certain systems
  • 23. • Long running transactions need to be avoided. • What is a long running transaction? How long is actually long? • Read Committed vs Repeatable Reads
  • 24. [Diagram: deadlock between two transactions] TX1 (wants to update A, B, C) has locked A; TX2 (wants to update C, B, A) has locked C. Each transaction waits for the lock held by the other.
  • 25. [Diagram: transaction timeline] TX1: begin, get(k), ..., get(k). TX2: begin, get(k), put(k, v2), commit. What is returned by the last get(k) in TX1?
  • 26.
  • 27.
  • 28.
  • 29. • Java serialization • Java externalization • Impact on performance • Generic domain.
  • 30.
  • 31. • Transaction scope • Locking/deadlocking • Flushing policies • Mixing the technology stack. • Performance
  • 32. • Wow we did it!
  • 33. • Thank you for your attention

Editor's notes

  1. 10 years of experience as an architect or developer. Participation in various large-scale enterprise products for world-leading companies such as Coca Cola Computer Service and Mobilitec. For the last 5 years, focus on distributed systems in the context of Business Process Management.
  2. A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data.
  3. Data grids, or IMDGs (in-memory data grids), are defined by Gartner as follows: IMDGs implement the notion of a "distributed, in-memory virtual data store" (typically called the data grid, but at times called the "cache" or "space" for historical reasons) by clustering the central memory (RAM) of multiple computers over a network. This allows applications to deal with very large (up to multiple petabytes in size, in some user experiences) in-memory data stores, and leverage fast and scalable access to data. IMDGs provide the mechanisms and APIs that presents to applications the memory of the clustered computers as a uniform, integrated data store. Applications don't need to know in which computer's RAM a given data object is stored to retrieve it. The IMDG runtime retrieves the required object across the data grid in a location-transparent way, while managing such issues as security, data integrity, availability and recovery, in case of system crashes.

     In this context there are many definitions of what a grid and a cache are, some of them more business oriented, some more technical. In an attempt to answer the question "What is a grid?", I opened the not yet approved JSR 347 specification and its glossary in order to define a minimal set of fundamental qualities that a grid should have. Here is the full list:
     • Cache: A temporary in-memory store of data exhibiting high performance, threadsafe access. JSR 107 (Temporary caching for the Java Platform) covers this concept in more detail.
     • Distributed Cache: Often, data grids are used as distributed, cluster-aware in-memory caches, usually placed in front of a more expensive, slower data store such as a relational database. Standalone caches don't work in this regard if the application tier is clustered, as caches could serve stale data. JSR 107 covers distributed caches as well to some degree.
     • Cluster: A set of servers connected via a network, usually a LAN.
     • Distribution: The concept of a cluster spreading data across its various constituent nodes in a manner transparent to any client attempting to locate or use such data.
     • Node: A member in a cluster. A node may be a separate physical machine, a separate virtual machine on the same physical host, or a separate JVM on the same physical or virtual machine. Typically, each node would have its own network address, such as an IP address and port, on which other nodes could connect to it.
     • Transaction: An atomic unit of work. Transactions may be JTA and XA compliant.
     • Colocation: The concept of ensuring data entries that are used together in the same transaction are stored on the same nodes in a cluster.
     • Map/Reduce: Based on Google's seminal paper from 2004, Map/Reduce allows computations on the entire data set to be broken down into tasks that run on each node and then aggregate results. It is a divide and conquer technique for dealing with large data sets.
     • Eventual Consistency: Based on Eric Brewer's CAP theorem, which outlines desirable characteristics of distributed systems, Eventual Consistency is the result of attempting to provide high availability even during network partitions. See Coda Hale's excellent blog on the subject.
  4. This model of a real-world application is not meant as a blueprint. It is actually an application with many good sides and many flaws, just like any average application. We have a group of several application servers forming a cluster in the internal network, another set of Tomcat or web servers in the demilitarized zone, a document management system and an ESB server. What we can say about this system is that it represents a classical medium to large scale application (more on the medium side). The data is marshaled once at the first firewall and a second time at the second firewall. Although the app servers form a cluster, there is no distributed caching within this system.
  5. Our backend demonstrates a very simple approach to preserving the consistency of the data within the cluster. In order to avoid collisions on heavily updated data, the architects of the system decided to separate the data into regions, with each server pointing to its own region. Such an approach might fix some of the issues in the short term, but it is not a long-term solution. Heavily updated data will produce more and more optimistic locking exceptions (if there is optimistic locking at all). A clear separation of the data is mission impossible: there is always a small amount of shared data that will cause trouble. As our system expands with time, those flaws become more and more visible. At some point a distributed caching solution is offered to the client in an attempt to fix these data integrity issues.

     The integration framework is a sort of anticorruption layer that keeps integrity and unifies the approach to entities of different origin. For example, some POJOs are just cached from the Ephorte document management system, while other entities in the cache originate from the DB layer. We have a system with many different sources of data that is kept in the cache.
  6. On this slide we can observe some of the technological challenges that the system presents. We are facing a technological cocktail because of the 10-year development history of this application. Many teams have worked on this system during these 10 years. The agile approach, which is more feature oriented, has clearly taken its toll on this application: different modules were written at different times using different technologies. Here the anticorruption layer makes everything work together transparently.

     We can identify several problematic areas in terms of our future migration to distributed caching:
     • Transaction scope: when we should use transactions, when we should split them into several, and when we should not use transactions at all.
     • Locking and potential deadlocking situations.
     • Once we remove the legacy cache, our anticorruption layer will stop functioning and we will be exposed to the underlying technology.
     • Performance: we should be prepared for the distributed cache to appear slower than our legacy cache. The old cache holds the hydrated value of the object, so access to this value is instant, whereas the distributed cache holds its binary form, which needs to be unmarshaled first. At the same time, our old cache is more prone to overflowing and cache misses, so with a good test it is quite easy to demonstrate that the old cache actually performs poorly under certain conditions.
     • The mixture of the technology stack might present additional challenges. Mixing JPA code and JDBC, for example, presents a challenge in terms of flushing policy. In order for them to coexist, the Entity Manager needs to be flushed every single time before an SQL query is executed. If you open the JPA specification and look at the flush modes, you may observe that this kind of behavior is the default when a JPA query is executed, so this flushing will not present any performance issues, as it stays within the framework of the regular behavior.

     This is just a single example of the low-level challenges that need to be solved. There are many others.
  7. Our legacy cache might have many different interested parties: DTOs coming from EJB 2.1 entity beans (EJB 2.1 entity beans are not serializable), JPA entities, custom collections, non-persistent POJOs from different web services, and so on.
  8. Our end goal is to remove the old legacy cache. At the same time, we want the rest of our application to behave in exactly the same way as before. Unfortunately, by removing the cache we expose the integration framework to the underlying technology, so the majority of our code dealing with the coexistence of the different technologies will reside exactly there. Our goal is to mimic the behaviour of the old system and, where we are not able to do that, to minimize the required changes as much as possible.
  9. Replication can be synchronous or asynchronous (write-through or write-behind). Synchronous replication blocks the caller (e.g. on a put()) until the modifications have been replicated successfully to all nodes in a cluster. Asynchronous replication performs replication in the background (the put() returns immediately). Infinispan offers a replication queue, where modifications are replicated periodically (i.e. interval-based), or when the queue size exceeds a number of elements, or a combination thereof. A replication queue can therefore offer much higher performance, as the actual replication is performed by a background thread.

     Asynchronous replication is faster (no caller blocking), because synchronous replication requires acknowledgments from all nodes in a cluster that they received and applied the modification successfully (round-trip time). However, when a synchronous replication returns successfully, the caller knows for sure that all modifications have been applied to all cache instances, whereas this is not the case with asynchronous replication. With asynchronous replication, errors are simply written to a log. Even when using transactions, a transaction may succeed but replication may not succeed on all cache instances.
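
The replication queue from note 9 can be illustrated with a small, self-contained Java sketch. It is not Infinispan's implementation: the class `ReplicationQueueSketch`, its methods and the `sender` callback are hypothetical names chosen for illustration, and the network layer is replaced by a plain callback. It only shows why queued (asynchronous) replication is cheap for the caller: a put costs an in-memory append, and the batch is shipped when the size threshold is hit or when a periodic task calls `flush()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy replication queue: buffers modifications and ships them in batches. */
class ReplicationQueueSketch {
    private final int maxElements;                // flush once this many entries are queued
    private final Consumer<List<String>> sender;  // hypothetical stand-in for the network layer
    private final List<String> pending = new ArrayList<>();

    ReplicationQueueSketch(int maxElements, Consumer<List<String>> sender) {
        this.maxElements = maxElements;
        this.sender = sender;
    }

    /** Asynchronous path: the caller only pays for an in-memory append. */
    synchronized void enqueue(String modification) {
        pending.add(modification);
        if (pending.size() >= maxElements) {
            flush();
        }
    }

    /** Triggered by the size check above or by a periodic background task. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sender.accept(new ArrayList<>(pending)); // one batch instead of one message per put
        pending.clear();
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Note that in this sketch, as in the note above, the caller never learns whether the batch reached the other nodes; that is the price of not blocking.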
  10. Invalidation is a clustered mode that does not actually share any data at all, but simply aims to remove data that may be stale from remote caches. This cache mode only makes sense if you have another, permanent store for your data, such as a database, and are only using Infinispan as an optimization in a read-heavy system, to prevent hitting the database every time you need some state. If a cache is configured for invalidation rather than replication, every time data is changed in a cache, other caches in the cluster receive a message informing them that their data is now stale and should be evicted from memory.

     Invalidation too can be synchronous or asynchronous. Just as in the case of replication, synchronous invalidation blocks until all caches in the cluster have received invalidation messages and have evicted stale data, while asynchronous invalidation works in a 'fire-and-forget' mode, where invalidation messages are broadcast but the caller does not block and wait for responses.
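
A minimal sketch of the invalidation idea from note 10, under stated assumptions: `InvalidationSketch`, `Node`, `peers` and the shared `store` map are hypothetical names, the "network" is a direct method call, and the permanent store is a plain map standing in for a database. The point it illustrates is that a write sends only the key to the other nodes, which drop their copy and reload it lazily from the store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class InvalidationSketch {
    /** The permanent store (for example a database) shared by all nodes. */
    static final Map<String, String> store = new HashMap<>();

    static class Node {
        final Map<String, String> local = new HashMap<>();
        final List<Node> peers = new ArrayList<>();

        String get(String key) {
            // On a local miss, fall back to the permanent store and cache the result.
            return local.computeIfAbsent(key, store::get);
        }

        void put(String key, String value) {
            store.put(key, value);
            local.put(key, value);
            // Only the key travels to the peers, never the value.
            for (Node peer : peers) {
                peer.local.remove(key);
            }
        }
    }
}
```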
  11. Distribution is a powerful clustering mode which allows Infinispan to scale linearly as more servers are added to the cluster. The hashing algorithm is configured with the number of copies of each cache entry that should be maintained cluster-wide. More copies mean lower performance. Regardless of how many copies are maintained, distribution still scales linearly.
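
To make "number of copies per entry" from note 11 concrete, here is a deliberately simplified sketch. It assumes a plain modulo hash over a fixed node list, which is not the consistent hashing a real grid uses; `DistributionSketch`, `ownersOf` and `numOwners` are illustrative names. It shows that each key is stored on a fixed number of nodes regardless of cluster size, which is why adding nodes adds capacity.

```java
import java.util.ArrayList;
import java.util.List;

class DistributionSketch {
    /**
     * Picks numOwners distinct nodes for a key by hashing to a start index
     * and walking the node list from there.
     */
    static List<String> ownersOf(String key, List<String> nodes, int numOwners) {
        int copies = Math.min(numOwners, nodes.size());
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add(nodes.get((start + i) % nodes.size()));
        }
        return owners;
    }
}
```

A write with two owners touches two nodes whether the cluster has 4 nodes or 40; raising the number of owners raises the per-write cost but not the dependence on cluster size.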
  12. Eviction refers to the process by which old, relatively unused, or excessively big data can be dropped from the cache, allowing the cache to remain within a memory budget. When a segment gets full, the eviction thread is able to dispose of its content. This is why eviction usually happens before the maximum number of entries specified for a cache region is reached.

     We can define the lifespan of an entity. Once this lifespan is reached, the entity expires.

     Invalidation occurs when an entity is deleted from a cache region. This might occur, for example, if an entity is updated and its consistency needs to be preserved across the different nodes.
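
The difference between eviction (a size budget) and expiration (a lifespan) in note 12 can be sketched with the JDK's `LinkedHashMap` in access order. This is an illustration, not how Infinispan implements either feature; `LruExpiringCache`, `Timed` and the explicit `nowMillis` parameter (used instead of the system clock so the behaviour is easy to follow) are assumptions of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruExpiringCache<K, V> {
    /** A value together with the time at which it expires. */
    private static final class Timed<T> {
        final T value;
        final long expiresAtMillis;

        Timed(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long lifespanMillis;
    private final LinkedHashMap<K, Timed<V>> map;

    LruExpiringCache(final int maxEntries, long lifespanMillis) {
        this.lifespanMillis = lifespanMillis;
        // accessOrder = true keeps the least recently used entry at the head.
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxEntries; // eviction: stay within the entry budget
            }
        };
    }

    synchronized void put(K key, V value, long nowMillis) {
        map.put(key, new Timed<>(value, nowMillis + lifespanMillis));
    }

    synchronized V get(K key, long nowMillis) {
        Timed<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            map.remove(key);            // expiration: the lifespan has been reached
            return null;
        }
        return entry.value;
    }

    synchronized int entryCount() {
        return map.size();
    }
}
```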
  13. Reference data use means you cache something once and read it over and over again, so there are a lot more reads than writes. Transactional data use, on the other hand, means that you update the data as frequently as you read it (or fairly close to it).

     A Mirrored Cache is a 2-server active/passive cache cluster. All the clients connect only to the active cache server and do their read and write operations against it. For all updates done to the cache (add, insert, and remove), the same updates are also made to the passive server, but in the background and as bulk operations. This means that the clients don't have to wait for the updates to be done to the passive server. As soon as the active server is updated, control returns to the client, and then the passive server is updated by a background thread.

     This gives Mirrored Cache a significant performance boost over a Replicated Cache of the same cluster size. A Mirrored Cache is almost as fast as a stand-alone Local Cache, which has no clustering cost. At the same time, a Mirrored Cache provides reliability through replication in case the active cache server goes down.

     If the active server ever goes down, the passive server automatically becomes active and all clients automatically connect to this new active server. All of this happens without any interruption to your application. When we bring the previously active server back up, it joins the cluster and becomes passive, since there is now another server that is already active.

     Mirrored Cache suits situations where you only have one dedicated cache server and the mirror server is shared with other apps. If, however, you need 3 or more cache servers, then Partition-Replica Cache is the best choice for transactional use.
  14. A Replicated Cache consists of two or more cache servers in a cluster. Each cache server contains the entire cache, and any updates to the cache on any server are applied synchronously to all the other servers in the cluster. Replicated Cache ensures that all updates to the cache are made as atomic operations, meaning either all cache servers are updated or none are. The benefit of Replicated Cache is its extremely fast GET performance. Whichever server a client is connected to always has the entire cache. As a result, all GET operations find the data locally on that cache server, and this boosts the GET speed. However, the cost of an update operation grows with every server you add, so a Replicated Cache does not scale well for updates.
  15. A Partitioned Cache is intended for larger cache clusters, as it is a very scalable caching topology. The cost of a GET or UPDATE operation remains fairly constant regardless of how big your cache cluster is. There are two reasons for this. First, the cache partitioning is based on a hash map algorithm (similar to a Hashtable), and a distribution map is created and sent to all the clients, telling them which partition has, or should have, the data. This allows a client to go directly to the cache server that has the data it is looking for. Second, all updates are made to only one server, and therefore no sequencing logic is required. Obtaining a sequence adds an extra network round trip in most cases.

     So not only are GET operations as fast as in a Replicated Cache, but UPDATE operations are much faster and remain fast regardless of how large the cache cluster gets. This constant cost makes Partitioned Cache a highly scalable topology.

     However, please note that there is no replication in Partitioned Cache. If any cache server goes down, you lose that part of the cache. This may be okay in many object caching situations, but it is not okay when you're using the cache as your main data repository without the data existing in any master data source. A good example of this is ASP.NET Session State storage or JSP Servlet session storage in the cache.
  16. Partitioned-Replica Cache is a combination of Partitioned Cache and Replicated Cache. It gives you the best of both worlds: reliability through replication and scalability through partitioning. Instead of replicating the cache over and over again if you have more than 2 servers in the cluster, you replicate the cache only once (meaning only two copies of the cache exist), regardless of how big the cache cluster is. This allows you to scale out through partitioning.
  17. Now is the moment when we should take a calculator and start collecting the metrics of our system. Such metrics are:
     • Size of the cluster
     • Average size of the marshaled data
     • Size of the replication queue (if used)
     • Do we have different locations?
     • Average lock duration
     • Do we use a persistent store (Hibernate)?
     • and more...

     Is our system read intensive or write intensive? If it is write intensive, we should probably think of a partitioned strategy. We can mix more than one topology within our system: user session data and reference data can use replication, while update-heavy data uses partitioning. For small clusters we will use TCP, for large ones UDP. Why? UDP creates a smaller amount of network traffic. Probably one of the most important questions is how many servers we are going to use in the grid.
  18. The Level 1 (L1) cache is a region that may reside in every node. When a node is asked for a value that it does not hold, the call is propagated to another node. When the result is returned and the L1 cache is enabled, the result is placed in this region for a user-defined time, so that repeated calls hit the L1 cache instead of doing a remote call every time. When an entry in the cache is updated, it needs to be invalidated across the whole cluster and in all L1 caches, so if no repeated calls occur within the system, we might pay a performance penalty for enabling this cache.
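
The trade-off in note 18 can be seen in a few lines. The sketch below is an assumption-laden toy: `L1CacheSketch`, `remoteOwner` and `remoteCalls` are hypothetical names, the remote node is a local map, and the user-defined lifespan of L1 entries is left out for brevity. Repeated reads are served locally, while every update pays for an invalidation.

```java
import java.util.HashMap;
import java.util.Map;

class L1CacheSketch {
    final Map<String, String> remoteOwner = new HashMap<>(); // data owned by another node
    final Map<String, String> l1 = new HashMap<>();          // local L1 region
    int remoteCalls = 0;                                     // simulated network round trips

    String get(String key) {
        String cached = l1.get(key);
        if (cached != null) {
            return cached;               // repeated call: served from L1
        }
        remoteCalls++;                   // first call: go to the owning node
        String value = remoteOwner.get(key);
        if (value != null) {
            l1.put(key, value);
        }
        return value;
    }

    void update(String key, String value) {
        remoteOwner.put(key, value);
        l1.remove(key);                  // invalidation cost paid on every write
    }
}
```

If a workload reads each key only once between updates, `remoteCalls` is the same with or without L1, and the invalidation work is pure overhead.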
  19. Invalidation, when used with a shared cache loader, causes remote caches to refer to the shared cache loader to retrieve modified data. The benefit of this is twofold: network traffic is minimized, as invalidation messages are very small compared to replicating updated data, and other caches in the cluster look up modified data in a lazy manner, only when needed.

     Within Infinispan we have different CacheLoaders and CacheStores; a CacheLoader that is persistent can also be called a CacheStore. Through a CacheLoader, the loading of a particular value that does not exist in memory can be delegated to a third party. The third party may be a persistent store such as an RDBMS or a NoSQL database, it may be another cluster, or something completely different. If passivation is enabled for a cache store, an entity can exist either within the store the loader is pointing to or in memory, but not in both. So we have an XOR condition between them.
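
The XOR condition that note 19 describes for passivation can be sketched as follows. The names `PassivationSketch`, `passivate`, `memory` and `store` are illustrative assumptions, and the store is a plain map standing in for whatever the loader points to; this is not the Infinispan API.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of passivation: an entry lives in memory or in the store, never in both. */
class PassivationSketch {
    final Map<String, String> memory = new HashMap<>();
    final Map<String, String> store = new HashMap<>(); // stand-in for a database or file store

    void put(String key, String value) {
        memory.put(key, value);
        store.remove(key);               // a fresh in-memory value supersedes a passivated copy
    }

    /** Eviction with passivation: the entry moves from memory to the store. */
    void passivate(String key) {
        String value = memory.remove(key);
        if (value != null) {
            store.put(key, value);
        }
    }

    /** A miss in memory activates the entry: it moves back from the store. */
    String get(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = store.remove(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }
}
```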
  20. At the moment (Infinispan 5.0) two locking schemes are supported:
     • pessimistic, in which locks are acquired remotely on each transactional write. This is named eager locking.
     • a hybrid optimistic-pessimistic locking approach, in which local locks are acquired as the transaction progresses and remote locks are acquired at prepare time. This is the default locking scheme.

     The source document describes a replacement of the hybrid locking scheme with an optimistic scheme. The rule of thumb is that all read operations are lock free and all write operations acquire a lock. This lock can be a remote (cluster-wide) lock or a local lock for the duration of the transaction.

     Repeatable read is a higher isolation level that, in addition to the guarantees of the read committed level, also guarantees that any data read cannot change: if the transaction reads the same data again, it will find the previously read data in place, unchanged, and available to read.
  21. A classical deadlock example in Infinispan prior to version 5.1, when optimistic locking was implemented, is the example from the slide. We have 2 transactions and READ_COMMITTED transaction isolation. There is one set of 3 values and two transactions updating the values in a different order: TX1 updates A, B, C; TX2 updates C, B, A. Within the transactions, locks are obtained on A and C, and then each thread is holding a lock on a value the other thread wants to use. No thread is able to advance, and so a deadlock occurs. Again, in Infinispan 5.1 there is optimistic locking. But because we are talking about a legacy system, most probably we will not be able to use the latest version.
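
The deadlock in note 21 arises only because the two transactions take their locks in opposite orders. One general remedy, independent of Infinispan and not something the slides prescribe, is to always acquire locks in a canonical order. The sketch below uses plain JDK locks; `OrderedLockingSketch`, `lockOrder` and `updateAll` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class OrderedLockingSketch {
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Canonical order: sorted and de-duplicated, whatever order the caller asked for. */
    static List<String> lockOrder(Collection<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    /** Locks every key in canonical order, runs the update, then unlocks in reverse order. */
    void updateAll(Collection<String> keys, Runnable update) {
        Deque<ReentrantLock> held = new ArrayDeque<>();
        try {
            for (String key : lockOrder(keys)) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                held.push(lock);
            }
            update.run();
        } finally {
            while (!held.isEmpty()) {
                held.pop().unlock();
            }
        }
    }

    boolean isLocked(String key) {
        ReentrantLock lock = locks.get(key);
        return lock != null && lock.isLocked();
    }
}
```

With this scheme both TX1 (A, B, C) and TX2 (C, B, A) try to lock A first, so one of them simply waits instead of each holding a lock the other needs.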
  22. One way to escape from the deadlock condition on the previous slide would be to simply elevate the transaction isolation level from READ_COMMITTED to REPEATABLE_READ. This elevation might cause a slight performance decrease, but in the general case it should be so small that for most systems it will be negligible. When we elevate the isolation level, each thread has its own view of the modified values, which is fixed for the span of the given transaction. So no matter whether another transaction updates a value, the original transaction's view of that value will always be the same.

     On this slide, the last get(k) call in the first transaction will return the same value as the first get(k), although another transaction has updated the value between the first and the last call. If the isolation level were READ_COMMITTED, the last get(k) would return the updated value.

     Legacy applications with an RDBMS store most probably use some kind of entity framework, Hibernate for example. They already behave in a way similar to REPEATABLE_READ isolation because of their Level 1 cache. The Hibernate Level 1 cache acts per transaction, so once an entry is inserted there, every repeated call will hit that value, no matter whether another thread has already updated it. So REPEATABLE_READ is 100% compatible with a Hibernate application; I would even say it is recommended.
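
The behaviour of the two isolation levels in note 22 can be reproduced with a per-transaction view. This is a sketch of the semantics only, not of how Infinispan implements isolation; `IsolationSketch`, `Tx` and `view` are hypothetical names, and the committed data is a plain map. For simplicity, keys that were absent at first read are not pinned.

```java
import java.util.HashMap;
import java.util.Map;

class IsolationSketch {
    static class Tx {
        private final Map<String, String> committed;              // shared, already committed data
        private final boolean repeatableRead;
        private final Map<String, String> view = new HashMap<>(); // values this transaction has read

        Tx(Map<String, String> committed, boolean repeatableRead) {
            this.committed = committed;
            this.repeatableRead = repeatableRead;
        }

        String get(String key) {
            if (!repeatableRead) {
                return committed.get(key);  // READ_COMMITTED: always the latest committed value
            }
            // REPEATABLE_READ: the first read pins the value for the rest of the transaction.
            return view.computeIfAbsent(key, committed::get);
        }
    }
}
```

Replaying the slide: TX1 reads k, TX2 commits put(k, v2), TX1 reads k again. Under REPEATABLE_READ the second read returns the original value; under READ_COMMITTED it returns v2.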
  23. Here we have jconsole, which comes with JDK 5, 6 and 7, a very useful tool for monitoring the registered JMX beans as well as the threads and the garbage collector. We can use it to monitor the Infinispan statistics. Important metrics:
     • Average read time
     • Number of evictions
     • Cache hits
     • Cache misses
     • Read to write ratio: based on this we can decide whether a particular region needs a replication or a partitioning strategy
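
Two of the metrics in note 23 are derived values rather than raw counters. A small sketch of the arithmetic, with hypothetical names (`CacheMetricsSketch`, `hitRatio`, `readWriteRatio`) and no assumption about which threshold should trigger a change of topology:

```java
class CacheMetricsSketch {
    /** Fraction of lookups served from the cache: hits / (hits + misses). */
    static double hitRatio(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** Reads per write; a high value suggests reference data, a value near 1 transactional data. */
    static double readWriteRatio(long reads, long writes) {
        return writes == 0 ? Double.POSITIVE_INFINITY : (double) reads / writes;
    }
}
```

For example, 900 hits and 100 misses give a hit ratio of 0.9, and 30000 reads against 1000 writes give a read to write ratio of 30.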
  24. Average replication time. There is a timeout set in the configuration; if it is exceeded, the transaction fails. In that case we can either increase the timeout, or think about custom serialization in an attempt to make the entity more lightweight, or simply think about how to minimize the amount of data sent over the network.
  25. The number of concurrent updates that might happen. The ConcurrentHashMap is divided into segments based on the concurrency level. The best thing to do here is just to read the Javadoc of ConcurrentHashMap.
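
For reference, the concurrency level mentioned in note 25 is the third argument of a standard `ConcurrentHashMap` constructor. The sketch below only shows where the value goes; `ConcurrencyLevelSketch`, `newCounterMap` and `expectedWriters` are illustrative names. In the JDK 5 to 7 implementations the value determines the number of segments; from Java 8 onwards the class no longer uses segments and treats the argument only as a sizing hint.

```java
import java.util.concurrent.ConcurrentHashMap;

class ConcurrencyLevelSketch {
    /** Builds a map sized for the estimated number of concurrently updating threads. */
    static ConcurrentHashMap<String, Integer> newCounterMap(int expectedWriters) {
        // Arguments: initialCapacity, loadFactor, concurrencyLevel.
        return new ConcurrentHashMap<>(64, 0.75f, expectedWriters);
    }

    /** Atomic per-key increment; no external locking is needed. */
    static void increment(ConcurrentHashMap<String, Integer> counters, String key) {
        counters.merge(key, 1, Integer::sum);
    }
}
```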
  26. The performance of standard Java serialization is very low. On the chart we can observe different serialization frameworks and their performance. It depends on the test scenario, but the size of the marshaled data can be reduced 4 times if externalization is used instead of standard Java serialization.
  27. Keep in mind that our integration framework still exists, and the majority of our code dealing with the coexistence of the different technologies will reside exactly there.
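
The size difference mentioned in note 26 can be measured directly with the JDK. The sketch below serializes the same three fields once through default `Serializable` and once through `Externalizable`; the classes `PlainPerson` and `CompactPerson` and the helper `serializedSize` are made up for the illustration, and the actual saving depends on the class, so no particular factor is claimed here.

```java
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SerializationSizeSketch {
    /** Default serialization: the stream also describes every field by name and type. */
    static class PlainPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        long id;
        String name;
        int age;

        PlainPerson(long id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    /** Externalization: the class writes only the raw field values itself. */
    public static class CompactPerson implements Externalizable {
        long id;
        String name;
        int age;

        public CompactPerson() {
            // A public no-arg constructor is required for deserialization.
        }

        CompactPerson(long id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(name);
            out.writeInt(age);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            id = in.readLong();
            name = in.readUTF();
            age = in.readInt();
        }
    }

    /** Returns the number of bytes produced by writing the object to an ObjectOutputStream. */
    static int serializedSize(Object object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing `serializedSize(new PlainPerson(1, "Ada", 36))` with `serializedSize(new CompactPerson(1, "Ada", 36))` shows the externalized form to be smaller, because the per-field descriptors are not written.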