This document provides an overview of Oracle Coherence, an in-memory data grid. It discusses what a data grid is and how Coherence works, including clustering, caching, querying, and aggregating data. It also provides examples of how Coherence can be used and customer use cases, such as for user session management across brands.
6. “A Data Grid is a system composed of multiple servers that work together to manage information and related operations - such as computations - in a distributed environment.” Cameron Purdy, VP of Development, Oracle
18. Data Grid Uses
Caching: Applications request data from the Data Grid rather than backend data sources.
Analytics: Applications ask the Data Grid questions, from simple queries to advanced scenario modeling.
Transactions: The Data Grid acts as a transactional System of Record, hosting data and business logic.
Events: Automated processing based on event
So why can’t we use database technology to bring high-performance transaction processing to Java applications? The problem is the classic mismatch between the object and relational models, and the huge performance penalty of translating back and forth between those two representations of the data. First, the object data must be loaded into mid-tier memory from several relational database tables. Then the transaction (object method) is performed. Finally, the data is written back to the relational database to commit the transaction and save session state. If another transaction (method call) is performed with the same object, this same process is repeated from beginning to end. This performance problem is compounded in modern Event Driven Architectures, where one object method call can spawn a whole succession of others.
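The cost of repeating the load step for the same object can be illustrated with a minimal read-through cache sketch. This is plain Java, not the Coherence API: `ReadThroughCache`, `loadFromDatabase`, and the `databaseLoads` counter are hypothetical names introduced only for illustration, and the "database" is a stub that counts how often it is hit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ReadThroughSketch {

    /** Counts how many times the (simulated) database is asked to load an object. */
    public static final AtomicInteger databaseLoads = new AtomicInteger();

    /** Hypothetical stand-in for the expensive relational load and object mapping. */
    public static String loadFromDatabase(String key) {
        databaseLoads.incrementAndGet();
        return "order-state-for-" + key;
    }

    /** Minimal read-through cache: the loader runs only when the key is not cached yet. */
    public static class ReadThroughCache<K, V> {
        private final Map<K, V> entries = new ConcurrentHashMap<>();
        private final Function<K, V> loader;

        public ReadThroughCache(Function<K, V> loader) {
            this.loader = loader;
        }

        public V get(K key) {
            return entries.computeIfAbsent(key, loader);
        }
    }

    public static void main(String[] args) {
        ReadThroughCache<String, String> cache =
                new ReadThroughCache<>(ReadThroughSketch::loadFromDatabase);

        // Three "transactions" touch the same object; only the first one loads it.
        for (int i = 0; i < 3; i++) {
            cache.get("42");
        }
        System.out.println("database loads: " + databaseLoads.get()); // prints "database loads: 1"
    }
}
```

Without the cache in front, each of the three calls would repeat the full load, which is the round trip described above.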
It is a development library: in Java it is JARs, DLLs, etc. We also ship other components, such as JARs to support Spring and Groovy. HTTP session management can be used with WLS and OAS. A large online retailer has a unified shopping cart across multiple application servers (WAS, .NET). The WebInstaller replaces the default replication.
Serialization Options
Because serialization is often the most expensive part of clustered data management, Coherence provides the following options for serializing/deserializing data:
java.io.Serializable – The simplest, but slowest option.
com.tangosol.io.pof.PofSerializer – The Portable Object Format (also referred to as POF) is a language-agnostic binary format. POF was designed to be incredibly efficient in both space and time and has become the recommended serialization option in Coherence.
java.io.Externalizable – This requires developers to implement serialization manually, but can provide significant performance benefits. Compared to java.io.Serializable, this can cut serialized data size by a factor of two or more (especially helpful with Distributed caches, as they generally cache data in serialized form). Most importantly, CPU usage is dramatically reduced.
com.tangosol.io.ExternalizableLite – This is very similar to java.io.Externalizable, but offers better performance and less memory usage by using a more efficient I/O stream implementation.
com.tangosol.run.xml.XmlBean – A default implementation of ExternalizableLite.
(c) Copyright 2007. Oracle Corporation
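The difference between the first and third options can be seen with a small comparison that uses only the JDK. This is a sketch under stated assumptions: the `SerializableTrade` and `ExternalizableTrade` classes are hypothetical, the Coherence-specific options (POF, ExternalizableLite) are not shown, and the exact byte counts depend on class and field names, so none are quoted here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializationSizeDemo {

    /** Default serialization: the stream also carries a description of every field. */
    public static class SerializableTrade implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String symbol;
        public final int quantity;
        public final double price;

        public SerializableTrade(String symbol, int quantity, double price) {
            this.symbol = symbol;
            this.quantity = quantity;
            this.price = price;
        }
    }

    /** Manual serialization: only the field values are written, in a fixed order. */
    public static class ExternalizableTrade implements Externalizable {
        public String symbol;
        public int quantity;
        public double price;

        /** Externalizable requires a public no-argument constructor for deserialization. */
        public ExternalizableTrade() { }

        public ExternalizableTrade(String symbol, int quantity, double price) {
            this.symbol = symbol;
            this.quantity = quantity;
            this.price = price;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(symbol);
            out.writeInt(quantity);
            out.writeDouble(price);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Fields must be read back in exactly the order they were written.
            symbol = in.readUTF();
            quantity = in.readInt();
            price = in.readDouble();
        }
    }

    /** Serializes any object with standard Java object streams. */
    public static byte[] toBytes(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Reads back an object produced by toBytes. */
    public static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] plain = toBytes(new SerializableTrade("ORCL", 100, 21.5));
        byte[] manual = toBytes(new ExternalizableTrade("ORCL", 100, 21.5));
        System.out.println("java.io.Serializable:   " + plain.length + " bytes");
        System.out.println("java.io.Externalizable: " + manual.length + " bytes");
    }
}
```

The saving comes from skipping the per-field metadata. The trade-off is that the read and write methods must be kept in sync by hand whenever the class changes.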
Coherence provides several cache implementations:
Local Cache – Local on-heap caching for non-clustered caching.
Replicated Cache Service – Perfect for small, read-heavy caches.
Partitioned Cache Service – True linear scalability for both read and write access. Data is automatically, dynamically and transparently partitioned across nodes. The distribution algorithm minimizes network traffic and avoids service pauses by incrementally shifting data.
Near Cache – Provides the performance of local caching with the scalability of distributed caching. Several different near-cache strategies provide varying trade-offs between performance and synchronization guarantees.
In-process caching provides the highest level of raw performance, since objects are managed within the local JVM. This benefit is most directly realized by the Local, Replicated, Optimistic and Near Cache implementations.
Out-of-process (client/server) caching provides the option of using dedicated cache servers. This can be helpful when you want to partition workloads (to avoid stressing the application servers). This is accomplished by using the Partitioned cache implementation and simply disabling local storage on client nodes through a single command-line option or a one-line entry in the XML configuration.
Tiered caching (using the Near Cache functionality) enables you to couple local caches on the application server with larger, partitioned caches on the cache servers, combining the raw performance of local caching with the scalability of partitioned caching. This is useful for both dedicated cache servers and co-located caching (cache partitions stored within the application server JVMs).
Tech Details Appendix for Cache types/strategies
Distributed Cache: A distributed, or partitioned, cache is a clustered, fault-tolerant cache that has linear scalability. Data is partitioned among all the machines of the cluster. For fault tolerance, partitioned caches can be configured to keep each piece of data on one or more unique machines within a cluster. Distributed caches are the most commonly used caches in Coherence.
Replicated Cache: A replicated cache is a clustered, fault-tolerant cache where data is fully replicated to every member in the cluster. This cache offers the fastest read performance, with linear performance scalability for reads but poor scalability for writes (as writes must be processed by every member in the cluster). Because data is replicated to all machines, adding servers does not increase aggregate cache capacity.
Optimistic Cache: An optimistic cache is a clustered cache implementation similar to the replicated cache implementation but without any concurrency control. This implementation offers higher write throughput than a replicated cache. It also allows an alternative underlying store for the cached data (for example, a MRU/MFU-based cache). However, if two cluster members are independently pruning or purging the underlying local stores, one cluster member may hold different store content than another.
Near Cache: A near cache is a hybrid cache; it typically fronts a distributed cache or a remote cache with a local cache. A near cache invalidates front cache entries using a configurable invalidation strategy, and provides excellent performance and synchronization. A near cache backed by a partitioned cache offers zero-millisecond local access for repeat data access, while enabling concurrency and ensuring coherency and fail-over, effectively combining the best attributes of replicated and partitioned caches.
Local Cache: A local cache is a cache that is local to (completely contained within) a particular cluster node. While it is not a clustered service, the Coherence local cache implementation is often used in combination with various clustered cache services.
Remote Cache: A remote cache describes any out-of-process cache accessed by a Coherence*Extend client. All cache requests are sent to a Coherence proxy, where they are delegated to one of the other Coherence cache types (Replicated, Optimistic, Partitioned).
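The near cache idea (a local front cache over a larger back cache, with front entries invalidated on change) can be sketched in plain Java. This is a conceptual, single-process illustration and not the Coherence implementation: `BackCache`, `NearCache`, and the `remoteReads` counter are hypothetical names, and the only invalidation strategy shown is "drop the front copy on every write". In a real cluster, invalidation would also have to reach the front caches of other members.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NearCacheSketch {

    /** Stand-in for a partitioned (back) cache; counts reads that would cross the network. */
    public static class BackCache {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        public int remoteReads = 0;

        public String get(String key) {
            remoteReads++;
            return data.get(key);
        }

        public void put(String key, String value) {
            data.put(key, value);
        }
    }

    /** Local front map in front of a back cache. */
    public static class NearCache {
        private final Map<String, String> front = new ConcurrentHashMap<>();
        private final BackCache back;

        public NearCache(BackCache back) {
            this.back = back;
        }

        public String get(String key) {
            String value = front.get(key);
            if (value == null) {
                // Front miss: fetch from the back cache and keep a local copy.
                value = back.get(key);
                if (value != null) {
                    front.put(key, value);
                }
            }
            return value;
        }

        public void put(String key, String value) {
            back.put(key, value);
            // Invalidate the local copy so the next read sees the new value.
            front.remove(key);
        }
    }

    public static void main(String[] args) {
        BackCache back = new BackCache();
        NearCache near = new NearCache(back);

        near.put("user:1", "alice");
        near.get("user:1"); // front miss, one remote read
        near.get("user:1"); // served from the front cache
        System.out.println("remote reads: " + back.remoteReads); // prints "remote reads: 1"

        near.put("user:1", "alicia"); // invalidates the front entry
        System.out.println(near.get("user:1")); // prints "alicia"
    }
}
```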
Data Grids are used for different purposes. These are the four most common uses.
Caching: Coherence was the first technology to prove reliable distributed caching. It has helped many organizations alleviate data bottleneck issues and scale out the application tier.
Analytics: Enables applications to efficiently run queries across the entire data grid. Supports heavy query loads while improving the responsiveness of each query. Server failures do not impact the correctness of “in flight” queries and analytics.
Transactions: The Data Grid provides an optimal platform for joining data and business logic. Moving database stored procedures into the Data Grid gives greater business agility. Coherence reliability allows not only in-memory data processing but also the ability to commit transactions in-memory. Reliability is key to conducting in-memory transactions. Coherence provides absolute reliability – every transaction matters.
Events: Oracle Coherence Data Grid manages processing state, guaranteeing once-and-only-once event processing. The Data Grid provides scalable management of event processing.
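The analytics use case, running a query or aggregation across the whole grid, can be pictured as each partition computing a partial result over its own data, with the partials then combined. The following is a conceptual sketch in plain Java, not the Coherence query or aggregation API: the partition count, `partitionFor`, and `sumWhere` are hypothetical, and the "members" are simply in-memory maps processed in parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class PartitionedAggregationSketch {

    /** Hypothetical partition count, chosen only for the example. */
    public static final int PARTITION_COUNT = 4;

    /** Maps a key to a partition by hashing, so every key has exactly one owner. */
    public static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), PARTITION_COUNT);
    }

    /** Spreads the entries over PARTITION_COUNT maps, one per partition. */
    public static List<Map<String, Double>> partition(Map<String, Double> entries) {
        List<Map<String, Double>> partitions = new ArrayList<>();
        for (int i = 0; i < PARTITION_COUNT; i++) {
            partitions.add(new HashMap<>());
        }
        entries.forEach((key, value) -> partitions.get(partitionFor(key)).put(key, value));
        return partitions;
    }

    /** Each partition sums its own matching values; the partial sums are then added up. */
    public static double sumWhere(List<Map<String, Double>> partitions, Predicate<Double> filter) {
        return partitions.parallelStream()
                .mapToDouble(p -> p.values().stream()
                        .filter(filter)
                        .mapToDouble(Double::doubleValue)
                        .sum())
                .sum();
    }

    public static void main(String[] args) {
        Map<String, Double> orders = new HashMap<>();
        orders.put("order-1", 10.0);
        orders.put("order-2", 250.0);
        orders.put("order-3", 75.5);
        orders.put("order-4", 500.0);

        List<Map<String, Double>> partitions = partition(orders);
        double total = sumWhere(partitions, amount -> amount >= 100.0);
        System.out.println("total of large orders: " + total); // prints "total of large orders: 750.0"
    }
}
```

Because only the small partial results travel back to the caller, the work scales with the number of partitions rather than with the size of the data moved.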