Cache and Consistency in NOSQL



Peng Xiang, Ruichun Hou and Zhiming Zhou
Information Engineering Center, Ocean University of China
Qingdao, China
xiangpeng2007@163.com


Abstract-In recent years a new type of database (NOSQL) has emerged in the field of persistence. In June 2009, a global gathering of the NOSQL movement lit the fuse of a "database revolution", and non-relational databases have since become an extremely popular new area. NOSQL aims to meet the needs of highly concurrent reads and writes, efficient mass data storage and access, database scalability, and high availability. In these large-scale concurrent systems, cluster caching and data consistency become the focus of attention. This article proposes a fast and effective approach to the data consistency problem. We use node splitting and consistent hashing to reduce server load by analyzing the requests from the client. Each server node can be divided into a set of virtual nodes, which are located on a ring. The complexity caused by the size of the set is determined by the target system. Through physical node splitting, we can reduce the cache miss rate. To make the consistency algorithm easier to follow, we introduce a football game example.

Keywords-persistence; NOSQL; cluster cache; consistency; algorithm

I. INTRODUCTION

With the coming of Web 2.0, the most serious challenge so far to the supremacy of RDBMSs in managing data is the increasing need for mass data storage and concurrent access [2]. When running web applications, the backend database system is often the performance bottleneck, and in existing database products write operations seem to increase the load too much. Recently, Twitter, Digg, Reddit and many other Web 2.0 companies have switched from MySQL to non-relational databases (NOSQL) to obtain scalable data storage, which has aroused strong developer interest in NOSQL. Joe Stump, the former chief architect of Digg, said that his own business project now uses NOSQL, and he listed a series of questions to challenge the SQL camp. He said that in terms of scalability, speed, and cost, non-relational databases have a great advantage over relational databases in every respect. These "NoSQL" databases tend to originate from large internet companies that have to serve simple data structures to millions of customers daily [4]. Google's BigTable, which is built on the Google File System, and Amazon's Dynamo are very successful commercial NOSQL cases. These databases specialize in certain use cases or data structures and run on commodity hardware, as opposed to large traditional database clusters. Their primary advantage is that, unlike relational databases, they handle unstructured data such as word-processing files, e-mail, multimedia, and social media efficiently.

As we all know, network traffic has become the largest obstacle. It has been shown that caching is the key solution for reducing network traffic. In recent years, some relational databases have used a cache both to guarantee data consistency and to reduce the database server's load, for example the Hibernate L2 cache (Ehcache) in J2EE. However, a traditional single cache does not work well, especially under concurrent access. Voldemort, a distributed key-value storage system, is unlike a relational database: it is basically just a big, distributed, persistent, fault-tolerant hash table. For applications that can use an O/R mapper such as ActiveRecord or Hibernate, it provides horizontal scalability and much higher availability, but at a great loss of convenience. In a traditional persistence application there are usually only two or three sets of servers, so scalability is poor, and if one of the servers crashes the system may be paralyzed. The cluster layer cache is now increasingly popular and has become a trend. Memcached (developed by the engineering team operating LiveJournal) is a distributed memory object caching system used to reduce the database load in a dynamic system [1] and to improve performance.

The paper is structured as follows. Section II presents the cluster cache: we introduce a distributed cache model, ms-mc, for non-relational databases built on Memcached, and discuss how to read and modify data in order to maintain data consistency, with reference to Google App Engine. Section III details data consistency: we introduce a data consistency mechanism, Paxos, a very efficient consistency algorithm for transaction processing that handles consistency management well. Section IV concludes the paper.

II. CLUSTER LAYER CACHE

High performance scalable web applications often use a distributed in-memory data cache in front of or in place of robust persistent storage for some tasks. Traditionally, replication of the database among several servers removes the performance bottleneck of a centralized database. However, this solution requires low-latency communication protocols between servers to implement update consistency protocols. Memcached has two core components: the server (ms) and the client (mc). In a memcached query, the mc first calculates the hash value of the given key to determine which ms holds the key-value pair. Once the ms is determined, the client sends a query request to that ms, and the ms finds the exact data. Because there is no interaction or multicast protocol among the ms, the interaction of



978-1-4244-5540-9/10/$26.00 ©2010 IEEE




memcached has minimal impact on the network. Fig. 1 shows an abstract view of the architecture of the distributed system, which reduces the system load.

Figure 1. The architecture of distributed system (diagram: the application calls the MC, which uses the consistency hash algorithm to select one of three MS)

We can envisage that an ms can also be divided. Every ms can be treated as a physical node (PN), and within each PN there is a variable number of virtual nodes (VN), chosen according to the available hardware capacity of the PN. The query content is thus divided into several parts, which are distributed across different ms. Fig. 2 describes the ms-mc structure; inside an ms, the virtual nodes are connected through the network.

Figure 2. The ms-mc structure (diagram: an MS containing several virtual nodes)

With the support of a distributed cache, query efficiency is greatly improved. In this paper, we use JCache, a proposed interface standard for memory caches, as an interface to the App Engine memcache. This interface is not yet an official standard; App Engine provides it through the net.sf.jsr107cache package. JCache provides a Map-like interface to cached data. You store and retrieve values in the cache using keys. Keys and values can be of any Serializable type or class.

    import java.util.Collections;
    import net.sf.jsr107cache.Cache;
    import net.sf.jsr107cache.CacheException;
    import net.sf.jsr107cache.CacheManager;

    // ...
    Cache cache;
    try {
        // Create a cache with the default (empty) configuration.
        cache = CacheManager.getInstance().getCacheFactory().createCache(Collections.emptyMap());
    } catch (CacheException e) {
        // ...
    }

    String key;    // ...
    byte[] value;  // ...

    // Put the value into the cache.
    cache.put(key, value);

    // Get the value from the cache.
    value = (byte[]) cache.get(key);

App Engine uses a distributed cache in order to maintain data consistency. In this paper, we assume that our system is based on a master/slaves architecture [7], with one master server and a large number of slave servers. Each server includes an mc and an ms, and we use the memcache caching mechanism on each server. Each mc is a client. When you send a query to a proxy server, the proxy server you choose checks whether the requested content is in its ms; if the ms cannot find the content, the query is passed to the next proxy server. Some NOSQL databases use the index and query processing mechanism of the local database (node), and we can use a DHT indexing scheme [3] that enables the proxy server to handle the cache query directly. In this case, the system's query processor broadcasts the query to all other nodes on the DHT, each node completes its local search, and the results are returned to the query processor as the response. Note that the search runs in parallel on all the nodes of the DHT.

III. DATA CONSISTENCY

With the distributed caching mechanism in place, our next task is to ensure data consistency. There are many types of NOSQL databases, such as Wide Column/Column Families, Document Store, Key Value/Tuple Store, and Eventually Consistent Key Value Store. Our data needs to be partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning [5], so we need a suitable consistent hashing algorithm to ensure the consistency of replicas. The consistency among replicas during updates is maintained by a quorum-like technique and a decentralized replica synchronization protocol [6]. The initial distributed architecture used hash() mod n, but its weakness is that the system cannot recover automatically from a single point of failure. To address the single point of failure, we can use hash() mod (n/2), so that each key has two candidate servers from which the client can choose. This scheme still has a problem, load imbalance: in particular, if one of the two servers fails, the other comes under excessive pressure. In consistent hashing, the output range of a hash function is treated as a fixed circular space or "ring" (i.e. the largest hash value wraps




around to the smallest hash value). Each node in the system is assigned a random value within this space which represents its "position" on the ring. Each data item identified by a key is assigned to a node by hashing the data item's key to yield its position on the ring, and then walking the ring clockwise to find the first node with a position larger than the item's position [8]. To solve the load problem, we apply consistent hashing as described above: we use n servers and divide each server into several virtual nodes (v). Each server is seen as a physical node (pnode), and all the virtual nodes are connected into a ring. We assign the n*v virtual nodes to random positions on the consistent hash ring, and users apply the hash algorithm to find the first vnode on the circle. If that vnode fails, we use the next vnode in the clockwise direction. As Fig. 3 shows, if the hash value falls on vnode a but vnode a has crashed, we use vnode b. If all the vnodes of a pnode have failed, we turn to the next vnode of the next pnode.

Figure 3. Example of hashing search (diagram: a request to the server has its hash value calculated and is placed on a ring of virtual nodes belonging to PNode1, PNode3 and PNode4)

Finally, we introduce a very useful consistency mechanism: Paxos. We first give an example, the football game question.

Suppose you (the coordinator) would like to organize a football game. The other side has 11 players, so in addition to yourself you have to recruit 10 members. The questions arise as follows. First you have to call the 10 participants and propose that the game be held on Saturday, 2am-11am. Every participant should tell you whether he can accept your proposal; if he accepts, he cannot accept other requests for that time. Under the 2PC principle, if all participants agree to the game, you notify them to commit; otherwise the game is cancelled. The disadvantage of this approach is that if one participant node crashes suddenly, there will be inconsistency in the cluster. Therefore 3PC was introduced to replace 2PC: 3PC divides the commit process into two steps, precommit and commit. Its disadvantage is that if you and the participants are in two different machine rooms and the network between you is disconnected, your side will never receive the commit from the participants and you will choose to abort. "There is only one consensus protocol, and that's Paxos; all other approaches are just broken versions of Paxos," said Mike Burrows (the author of Google Chubby). Compared to 2PC/3PC, the Paxos algorithm is an improvement.

P1a. Every Paxos proposal is assigned a number, which must increase with each new proposal. A replica does not accept a proposal whose number is smaller than the greatest number it has seen so far.

P2. Once a value v has been passed by the replicas, this value cannot be changed, assuming there are no Byzantine faults. Take the above football game as an example: if a participant accepts the plan, he cannot change his mind no matter who asks him to. If most participants accept the plan, the proposal is adopted. Paxos is therefore more flexible than 2PC/3PC: in a cluster with 2f+1 nodes, we can tolerate f unavailable nodes.

IV. CONCLUSIONS

Applications deployed on the Internet are immediately accessible to a vast population of potential users, and with the coming of Web 2.0, users need more and more data. As a result, current relational database products suffer from serious load problems, and a large amount of unstructured information needs to be exchanged with the database, so we need a distributed and scalable framework. Relational databases can also be distributed, but in many respects NOSQL databases cope better, in particular with data consistency. In terms of system architecture, relational databases favor a centralized structure or a cluster with a limited number of nodes, and are not designed around low-cost distributed computing units. As a result, an RDBMS cannot support world-class information management technology such as Google BigTable or Amazon Dynamo. In this paper, we discussed a distributed caching mechanism and introduced a data consistency plan. There are many related techniques, such as gossip, DHT, and MapReduce execution and data partitioning. They all have their own advantages, but due to limited space we cannot give more details about them here. In the master/slave model, we can treat each slave server as a proxy server, and the distributed cache we introduced is equivalent to query caching for the database. We use this approach to scale the database in order to reduce the database load. Our future work is to provide an algorithm with a high cache hit rate that reduces the load on the central database server and efficiently ensures cache consistency as the database is updated. As Ricky Ho said, compared with relational databases, query search is the weak point of NOSQL. At present, many NOSQL databases are based on the DHT (Distributed Hash Table) model, so a query is equivalent to accessing a hash table. In the future we will further study query processing combined with NOSQL design patterns.

REFERENCES

[1] A.P. Atlee and H. Gannon, "Specifying and Verifying Requirements of Real-Time Systems," Proc. IEEE Trans. Software Engineering, IEEE Press, Jan. 1993, pp. 41-45, doi:10.1109/32.210306.
[2] R.V. Nageshwara and V. Kumar, "Concurrent Access," Proc. IEEE Trans. Computer, IEEE Press, Dec. 1988, pp. 1657-1665, doi:10.1109/12.9744.




[3] Yuzhe Tang, Shuigeng Zhou, and Jianliang Xu, "LIGHT: A Query-Efficient Yet Low-Maintenance Indexing Scheme over DHTs," Proc. IEEE Trans. Knowledge and Data Engineering, IEEE Press, Jan. 2010, pp. 59-75, doi:10.1109/TKDE.2009.47.
[4] Neal Leavitt, "Will NoSQL Databases Live Up to Their Promise?" Computer, Vol. 43, pp. 12-14, Feb. 2010.
[5] M. Andrew, ""NoSQL" databases in CMS Data and Workflow Management," 13th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Physics Department, University of Rajasthan. LNM Institute of Information Technology. Jaipur, pp. 22-27, Feb. 2010.
[6] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Communications of the ACM, Vol. 7, pp. 558-565, July 1978.
[7] A. Olson and K.G. Shin, "Probabilistic Clock Synchronization in Large Distributed Systems," Proc. IEEE Trans. Computers, IEEE Press, Sep. 1994, pp. 1106-1112, doi:10.1109/12.312120.
[8] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web," in Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, El Paso, Texas, United States, pp. 654-663, 1997.
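
As an illustration of the virtual-node lookup described in Section III, the following is a minimal Java sketch, not the authors' implementation. It assumes each server (pnode) contributes a fixed number of vnodes, that ring positions come from the first 8 bytes of an MD5 digest, and that failure detection is done elsewhere. The class name ConsistentHashRing, its methods, and the server names are hypothetical and chosen only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of a consistent-hash ring in which each server (pnode) owns several virtual nodes. */
final class ConsistentHashRing {
    private final int vnodesPerServer;                           // v in the text (assumed fixed)
    private final TreeMap<Long, String> ring = new TreeMap<>();  // ring position -> owning server
    private final Set<String> failed = new HashSet<>();          // servers currently marked as down

    ConsistentHashRing(int vnodesPerServer) {
        this.vnodesPerServer = vnodesPerServer;
    }

    /** Places the server's virtual nodes at pseudo-random positions on the ring. */
    void addServer(String server) {
        for (int i = 0; i < vnodesPerServer; i++) {
            ring.put(hash(server + "#vnode" + i), server);
        }
    }

    void markFailed(String server) { failed.add(server); }

    void markRecovered(String server) { failed.remove(server); }

    /** Walks clockwise from the key's position and returns the first live server, or null. */
    String locate(String key) {
        long position = hash(key);
        // First the vnodes with a larger position, then wrap around to the start of the ring.
        for (Map.Entry<Long, String> e : ring.tailMap(position, false).entrySet()) {
            if (!failed.contains(e.getValue())) return e.getValue();
        }
        for (Map.Entry<Long, String> e : ring.headMap(position, true).entrySet()) {
            if (!failed.contains(e.getValue())) return e.getValue();
        }
        return null; // the ring is empty or every server is down
    }

    /** Maps a string to a ring position using the first 8 bytes of its MD5 digest. */
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A caller would add each ms once with addServer, then call locate(key) for every cache request. When a server is marked as failed, its keys fall through to the next live vnode clockwise, so the extra load is spread over the remaining servers rather than landing on a single neighbor.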
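
The acceptor-side rules that Section III summarizes as P1a and P2, together with the 2f+1 majority, can be sketched in Java as below. This is a simplified illustration under stated assumptions, not a complete Paxos implementation: it covers a single value, omits proposers, networking and persistence, and the names PaxosAcceptor, Promise and quorum are hypothetical. Note that the guarantee that a passed value never changes also depends on the proposer re-proposing the value returned in the highest-numbered promise, which is not shown here.

```java
/** Sketch of the acceptor side of single-value Paxos; proposers and networking are omitted. */
final class PaxosAcceptor {
    /** Reply to a prepare request, carrying any proposal this acceptor has already accepted. */
    static final class Promise {
        final boolean ok;
        final long acceptedNumber;   // -1 if nothing has been accepted yet
        final String acceptedValue;  // null if nothing has been accepted yet

        Promise(boolean ok, long acceptedNumber, String acceptedValue) {
            this.ok = ok;
            this.acceptedNumber = acceptedNumber;
            this.acceptedValue = acceptedValue;
        }
    }

    private long promised = -1;        // highest proposal number seen so far
    private long acceptedNumber = -1;  // number of the last accepted proposal
    private String acceptedValue = null;

    /** Phase 1: promise to ignore lower-numbered proposals; stale numbers are refused. */
    Promise prepare(long n) {
        if (n <= promised) return new Promise(false, acceptedNumber, acceptedValue);
        promised = n;
        return new Promise(true, acceptedNumber, acceptedValue);
    }

    /** Phase 2: accept the proposal unless a higher number has been promised (rule P1a). */
    boolean accept(long n, String value) {
        if (n < promised) return false;
        promised = n;
        acceptedNumber = n;
        acceptedValue = value;
        return true;
    }

    /** A cluster of 2f+1 acceptors needs f+1 replies, so f of them may be unavailable. */
    static int quorum(int clusterSize) {
        return clusterSize / 2 + 1;
    }
}
```

In the football example, each participant plays the role of an acceptor: the coordinator needs quorum(11) = 6 acceptances rather than all 11, which is why a few unreachable participants do not block the decision as they would under 2PC.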




 
Sybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem Performans
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
Introducing Mache
Introducing MacheIntroducing Mache
Introducing Mache
 
The NoSQL Movement
The NoSQL MovementThe NoSQL Movement
The NoSQL Movement
 
Report 2.0.docx
Report 2.0.docxReport 2.0.docx
Report 2.0.docx
 
Redis Cashe is an open-source distributed in-memory data store.
Redis Cashe is an open-source distributed in-memory data store.Redis Cashe is an open-source distributed in-memory data store.
Redis Cashe is an open-source distributed in-memory data store.
 
NoSQL Consepts
NoSQL ConseptsNoSQL Consepts
NoSQL Consepts
 
No sqlpresentation
No sqlpresentationNo sqlpresentation
No sqlpresentation
 
1 ieee98
1 ieee981 ieee98
1 ieee98
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
NOSQL
NOSQLNOSQL
NOSQL
 

More from João Gabriel Lima

Deep marketing - Indoor Customer Segmentation
Deep marketing - Indoor Customer SegmentationDeep marketing - Indoor Customer Segmentation
Deep marketing - Indoor Customer SegmentationJoão Gabriel Lima
 
Aplicações de Alto Desempenho com JHipster Full Stack
Aplicações de Alto Desempenho com JHipster Full StackAplicações de Alto Desempenho com JHipster Full Stack
Aplicações de Alto Desempenho com JHipster Full StackJoão Gabriel Lima
 
Realidade aumentada com react native e ARKit
Realidade aumentada com react native e ARKitRealidade aumentada com react native e ARKit
Realidade aumentada com react native e ARKitJoão Gabriel Lima
 
Big data e Inteligência Artificial
Big data e Inteligência ArtificialBig data e Inteligência Artificial
Big data e Inteligência ArtificialJoão Gabriel Lima
 
Mineração de Dados no Weka - Regressão Linear
Mineração de Dados no Weka -  Regressão LinearMineração de Dados no Weka -  Regressão Linear
Mineração de Dados no Weka - Regressão LinearJoão Gabriel Lima
 
Segurança na Internet - Estudos de caso
Segurança na Internet - Estudos de casoSegurança na Internet - Estudos de caso
Segurança na Internet - Estudos de casoJoão Gabriel Lima
 
Segurança na Internet - Google Hacking
Segurança na Internet - Google  HackingSegurança na Internet - Google  Hacking
Segurança na Internet - Google HackingJoão Gabriel Lima
 
Segurança na Internet - Conceitos fundamentais
Segurança na Internet - Conceitos fundamentaisSegurança na Internet - Conceitos fundamentais
Segurança na Internet - Conceitos fundamentaisJoão Gabriel Lima
 
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...João Gabriel Lima
 
Mineração de dados com RapidMiner + WEKA - Clusterização
Mineração de dados com RapidMiner + WEKA - ClusterizaçãoMineração de dados com RapidMiner + WEKA - Clusterização
Mineração de dados com RapidMiner + WEKA - ClusterizaçãoJoão Gabriel Lima
 
Mineração de dados na prática com RapidMiner e Weka
Mineração de dados na prática com RapidMiner e WekaMineração de dados na prática com RapidMiner e Weka
Mineração de dados na prática com RapidMiner e WekaJoão Gabriel Lima
 
Visualizacao de dados - Come to the dark side
Visualizacao de dados - Come to the dark sideVisualizacao de dados - Come to the dark side
Visualizacao de dados - Come to the dark sideJoão Gabriel Lima
 
REST x SOAP : Qual abordagem escolher?
REST x SOAP : Qual abordagem escolher?REST x SOAP : Qual abordagem escolher?
REST x SOAP : Qual abordagem escolher?João Gabriel Lima
 
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...João Gabriel Lima
 
E-trânsito cidadão - IPVA em suas mãos
E-trânsito cidadão - IPVA em suas mãosE-trânsito cidadão - IPVA em suas mãos
E-trânsito cidadão - IPVA em suas mãosJoão Gabriel Lima
 
[Estácio - IESAM] Automatizando Tarefas com Gulp.js
[Estácio - IESAM] Automatizando Tarefas com Gulp.js[Estácio - IESAM] Automatizando Tarefas com Gulp.js
[Estácio - IESAM] Automatizando Tarefas com Gulp.jsJoão Gabriel Lima
 
Hackeando a Internet das Coisas com Javascript
Hackeando a Internet das Coisas com JavascriptHackeando a Internet das Coisas com Javascript
Hackeando a Internet das Coisas com JavascriptJoão Gabriel Lima
 

More from João Gabriel Lima (20)

Cooking with data
Cooking with dataCooking with data
Cooking with data
 
Deep marketing - Indoor Customer Segmentation
Deep marketing - Indoor Customer SegmentationDeep marketing - Indoor Customer Segmentation
Deep marketing - Indoor Customer Segmentation
 
Aplicações de Alto Desempenho com JHipster Full Stack
Aplicações de Alto Desempenho com JHipster Full StackAplicações de Alto Desempenho com JHipster Full Stack
Aplicações de Alto Desempenho com JHipster Full Stack
 
Realidade aumentada com react native e ARKit
Realidade aumentada com react native e ARKitRealidade aumentada com react native e ARKit
Realidade aumentada com react native e ARKit
 
JS - IA
JS - IAJS - IA
JS - IA
 
Big data e Inteligência Artificial
Big data e Inteligência ArtificialBig data e Inteligência Artificial
Big data e Inteligência Artificial
 
Mineração de Dados no Weka - Regressão Linear
Mineração de Dados no Weka -  Regressão LinearMineração de Dados no Weka -  Regressão Linear
Mineração de Dados no Weka - Regressão Linear
 
Segurança na Internet - Estudos de caso
Segurança na Internet - Estudos de casoSegurança na Internet - Estudos de caso
Segurança na Internet - Estudos de caso
 
Segurança na Internet - Google Hacking
Segurança na Internet - Google  HackingSegurança na Internet - Google  Hacking
Segurança na Internet - Google Hacking
 
Segurança na Internet - Conceitos fundamentais
Segurança na Internet - Conceitos fundamentaisSegurança na Internet - Conceitos fundamentais
Segurança na Internet - Conceitos fundamentais
 
Web Machine Learning
Web Machine LearningWeb Machine Learning
Web Machine Learning
 
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
Mineração de Dados com RapidMiner - Um Estudo de caso sobre o Churn Rate em...
 
Mineração de dados com RapidMiner + WEKA - Clusterização
Mineração de dados com RapidMiner + WEKA - ClusterizaçãoMineração de dados com RapidMiner + WEKA - Clusterização
Mineração de dados com RapidMiner + WEKA - Clusterização
 
Mineração de dados na prática com RapidMiner e Weka
Mineração de dados na prática com RapidMiner e WekaMineração de dados na prática com RapidMiner e Weka
Mineração de dados na prática com RapidMiner e Weka
 
Visualizacao de dados - Come to the dark side
Visualizacao de dados - Come to the dark sideVisualizacao de dados - Come to the dark side
Visualizacao de dados - Come to the dark side
 
REST x SOAP : Qual abordagem escolher?
REST x SOAP : Qual abordagem escolher?REST x SOAP : Qual abordagem escolher?
REST x SOAP : Qual abordagem escolher?
 
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
Game of data - Predição e Análise da série Game Of Thrones a partir do uso de...
 
E-trânsito cidadão - IPVA em suas mãos
E-trânsito cidadão - IPVA em suas mãosE-trânsito cidadão - IPVA em suas mãos
E-trânsito cidadão - IPVA em suas mãos
 
[Estácio - IESAM] Automatizando Tarefas com Gulp.js
[Estácio - IESAM] Automatizando Tarefas com Gulp.js[Estácio - IESAM] Automatizando Tarefas com Gulp.js
[Estácio - IESAM] Automatizando Tarefas com Gulp.js
 
Hackeando a Internet das Coisas com Javascript
Hackeando a Internet das Coisas com JavascriptHackeando a Internet das Coisas com Javascript
Hackeando a Internet das Coisas com Javascript
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

Cache and consistency in nosql

  • 1. Cache and Consistency in NOSQL Peng Xiang, Ruichun Hou and Zhiming Zhou Information Engineering Center, Ocean University of China Qingdao, China xiangpeng200 7@1 63.com Abstract-In recent years a new type of database (NOSQL) has As we all know, the network traffic problem became the emerged in the field of persistence. June 2009, a global largest obstacle. It has been shown that caching is the key gathering of NOSQL Movement triggered the fuse of solution that reduces network traffic. In recently years, some "database revolution", Non-relational database has now relation databases use the cache or to guarantee the data become an extremely popular new areas, NOSQL aims to solve consistency and to reduce the database server's load. Just as the needs of high concurrent read-write, efficient mass data the hibernate L2 cache (Ehcache) in 12EE, but the traditional storage and access, database scalability and high availability. single cache can't work well especially in the concurrent In these large-scale concurrent systems, cluster cache and data access. Voldemort, a distributed key-value storage system, consistency become the focus of attention. This article proposes unlike relational database, it is basically just a big, a fast and effective approach to estimate the solution of the distributed, persistent, fault-tolerant hash table, and for data consistency. We use node split and consistent hashing techniques to reduce the servers load by analyzing the request applications that can use an OIR mapper like active-record or from the client. Each server node can be divided into a set of hibernate this will provide horizontal scalability and much virtual nodes, which located in a ring. The complexity caused higher availability but at great loss of convenience. In the by the size of the set is determined by the target system. traditional persistence application, there are always two or Through physical node split, we can reduce the cache miss rate. 
three sets servers, so, the scalable is not good, if one of the In order to reduce the complexity of the algorithm of servers crashed, the system may face paralysis. And now, the consistency hashing, we will introduce a football game example. Cluster Layer Cache become increasingly popular, and it has become a trend.MemCached (engineering team operating Keywords-persistence;NOSQL;clustercache;consistency; Live Journal's), a distributed memory object caching system, algorithm used to reduce the database load in a dynamic system[l] and improve the performance. I. INTRODUCTION The paper is structured as follows. Section 2 presents cluster cache, we will introduce a distribute cache model ms­ With the coming of Web 2.0, the most serious challenge mc in non-relational database with MemCached and discuss so far to the supremacy of RDBMSs in managing data is the how to read and modify data to in order to maintain data increasing need of mass data and concurrent access [2]. consistency and here we will reference Google App Engine. When we running web applications, the backend database Section 3 details the data consistency, we will introduce a systems are often the performance bottleneck and the data consistency mechanism, Paxos, a very efficient existing database products seem to write data to increase the transaction processing consistency algorithm, which can deal load to much. Recently, Twitter, Digg and Reddit and many the consistency management well and Section 4 concludes other Web2.0 companies that switch from MySQL to non­ the paper. relational database (NOSQL) to provide scalable data storage solutions, has aroused strong developers interest on NOSQL. II. CLUSTER LAYER CACHE Joe Stump who was the former chief architect of Digg said his own business project now is to use NOSQL, and he lists a High performance scalable web applications often use a distributed in-memory data cache in front of or in place of series of questions to challenge SQL camps. 
He said, in robust persistent storage for some tasks. Traditionally, terms of scalability, speed, cost, non-relational database have replication of the database among several servers removes great advantage over relational databases in all the way. These "NoSQL" databases tend to originate from large the performance bottleneck of a centralized database. However, this solution requires low-latency communication internet companies that have to serve simple data structures protocols between servers to implement update consistency to millions of customers daily [4]. Such as Google File System based on BigTable and Amazon's Dynamo are very protocols. Memcached has two core components: the server (ms) and client (mc), in a memcached query, mc first successful business NOSQL cases. The databases specialize for certain use cases or data structures and run on commodity calculated the hash value of the given key to determine the kV's location in the ms.when ms is confmned, the client will hardware, as opposed to large traditional database clusters. Their primary advantage is that, unlike relational databases, send a query request to the corresponding ms, and the ms will find the exact data. Because there is no interaction and they handle unstructured data such as word-processing files, e-mail, multimedia, and social media efficiently. multicast protocol among them, so the interaction of 978-1-4244-5540-9/10/$26.00 ©2010 IEEE 117
  • 2. memcached brings the minimal impact to the internet. Fig. 1 import net.sfjsr107.Cache; shows an abstract view of the architecture of distributed import net.sfjsrl0 7.CacheException; system, which reducing the system load. import net.sfjsrl0 7.CacheManager; II ... MS MS MS Cache cache; Try { cache=CacheManager.getInstanceO.getCacheFactoryO.crea teCache(Collections.emptyMap()) ;} catch (CacheException e){ II . .. I Consistency hash algorithm } String key; II ... MC byte [ ] value; II ... II Put the value into the cache. l Cache. put (key, value); II Get the value from the cache. appliaction Value = (byte []) cache. get (key); App Engine uses distributed cache in order to maintain Figure 1. The architecture of distributed system data consistency, and in this paper, we assume that our system is based on the masterlslaves system [7], there is a We can envisage that ms can also be divided, every ms master server and a large number of slave servers. Each can be treated as a physical node (PN), within each PN, there server includes a mc and ms, and we use memcache caching will be a variable number of virtual nodes (VN) running mechanism in each server. Each mc is a client, when you according to the available hardware capacity of the PN. So need a query from a proxy server, and the proxy server you the query content is divided into several parts, which choose will check if the content you query is in the ms, and if distributed in different ms. Fig. 2 describes the ms-mc the ms can't find the required content, and then turn the next structure, inside the ms, each virtual node connected through proxy server. Some NOSQL databases use the index and the network. query processing mechanism in the local database (node), and we can use a DHT indexing [3] scheme that enables the proxy server to directly deal the cache query. 
In this case, we can use the system's query processing program to broadcast the query to all other nodes on the DHT, each node complete the local search, and then return the result to the system processor as a response. Note that the search is parallel in all the nodes on the DHT. III. DATA CONSISTENCY With distributed caching mechanism, our next task is to Virtual Node ensure data consistency. We know that there are many types of NOSQL databases, such as Wide Column/Column Families, Document Store, Key ValuelTuple Store, and Eventually Consistent Key Value Store etc; our data needs to partitioned and replicated using consistent hashing, and MS consistency is facilitated by object versioning [5], we need a suitable consistent hashing algorithm to ensure the Figure 2. The ms-mc structure consistency of replicas. The consistency among replicas during updates is maintained by a quorum-like technique and With the support of distributed cache, the query a decentralized replica synchronization protocol.[6]the initial efficiency has been greatly improved. In this paper, we use distributed architecture use hash 0 mod n, but its weakness is JCache, a proposed interface standard for memory caches, as that if single point failure, the system can not automatically an interface to the App Engine memcache. This interface is recover. In order to solve the single point of failure, we use not yet an official standard, App Engine provides this hash 0 mod (nl2), so anyone have two options which can be interface using the net.sf.jsrl0 7 interface package. JCache selected by client, but this program still has problems: load provides a Map-like interface to cached data. You store and imbalance, in particular, if one of the servers failures, the retrieve values in the cache using keys. Keys and values can other server has excessive pressure. In consistent hashing, be of any Serializable type or class. 
the output range of a hash function is treated as a fixed import java.util.Collections; circular space or "ring" (i.e. the largest hash value wraps 118
  • 3. around to the smallest hash value). Each node in the system is assigned a random value within this space which represents its "position" on the ring. Each data item identified by a key is assigned to a node by hashing the data item's key to yield its position on the ring, and then walking the ring clockwise to find the first node with a position larger than the item's position [8].

To solve the load problem, we introduce virtual nodes on top of the consistent hashing described above. We use n servers and divide each server into several virtual nodes (v); each server can be seen as a physical node (pnode), and all the virtual nodes are connected into a ring. We then assign the n*v virtual nodes (vnodes) to the consistent hash ring randomly. A user applies the hash algorithm to find the first vnode on the circle, and if that vnode fails, the next vnode in the clockwise direction is used. As Fig. 3 shows, if the hash value is located in vnode a but vnode a has crashed, we use vnode b. If all the vnodes of a pnode have failed, the request turns to the next vnode of the next pnode.

[Figure 3. Example of hashing search. Recoverable labels: virtual node; PNode1, PNode3, PNode4; "Calculate the hash value"; "Request to the server".]

Finally, we introduce a very useful consistency algorithm: Paxos. We first give an example, the football game question.

Suppose you (the coordinator) would like to organize a football game. The other side has 11 players, so, in addition to yourself, you have to organize 10 members. The questions arise as follows. First you have to call the 10 participants and propose that the game be held on Saturday, 2am-11am. Every participant should tell you whether he can accept your proposal; if he accepts, he cannot accept other requests for that time. Under the 2PC principle, if all participants agree to the game, you notify them to commit; otherwise the game is cancelled. The disadvantage of this approach is that if one participant node crashes suddenly, there will be inconsistency in the cluster. Therefore 3PC replaced 2PC: 3PC divides the commit process into two steps, precommit and commit. Its disadvantage is that if you and the participants are in two different machine rooms and the network between you is disconnected, your side will never receive the commit from the participants and you will choose to abort.

"There is only one consensus protocol, and that's Paxos, all other approaches are just broken versions of Paxos," said Mike Burrows (the author of Google Chubby). Compared with 2PC/3PC, the Paxos algorithm makes the following improvements.

P1a. Every Paxos instance is assigned a number that must keep increasing; a replica does not accept a proposal whose number is smaller than the greatest number it has seen so far.

P2. Once a value v has been passed by a replica, this value cannot be changed; Byzantine problems are not considered. Take the above football game as an example: if a participant accepts the plan, he cannot change his mind no matter who asks him to. If a majority of the participants accept the plan, the proposal can be adopted. Paxos is therefore more flexible than 2PC/3PC: in a cluster with 2f+1 nodes, up to f nodes may be unavailable.

IV. CONCLUSIONS

Applications deployed on the Internet are immediately accessible to a vast population of potential users, and with the coming of Web 2.0, users need more and more data. As a result, current relational database products suffer from serious load problems, and a large amount of unstructured information needs to interact with the database, so we need a distributed and scalable framework. Relational databases can also be distributed, but in many ways NOSQL databases do better than relational databases, in particular in data consistency. In terms of technical architecture, relational databases emphasize a centralized or limited-node cluster structure and are not designed around low-cost distributed computing units. As a result, an RDBMS cannot support world-class information management technology such as Google BigTable and Amazon Dynamo.

In this paper, we discussed a distributed caching mechanism and introduced a data consistency plan. There are many related techniques, such as gossip, DHT, MapReduce execution and data partitioning. They all have their own advantages, but due to limited space we cannot give more details about them here. In the Master/Slave model, we can treat the slave server as a proxy server; the distributed cache we introduced is equivalent to query caching for the database. We use this approach to scale the database in order to reduce the database load.

Our future work is to provide an algorithm with a high cache hit rate that reduces the load on the central database server and efficiently ensures cache consistency as the database is updated. As Ricky Ho said, compared with relational databases, querying is the weak point of NOSQL. At present, many NOSQL databases are based on the DHT (Distributed Hash Table) model, so a query is equivalent to accessing a hash table. In the future we will further study query processing combined with NOSQL design patterns.

REFERENCES

[1] A.P. Atlee and H. Gannon, "Specifying and Verifying Requirements of Real-Time Systems," Proc. IEEE Trans. Software Engineering, IEEE Press, Jan. 1993, pp. 41-45, doi:10.1109/32.210306.
[2] R.V. Nageshwara and V. Kumar, "Concurrent Access," Proc. IEEE Trans. Computer, IEEE Press, Dec. 1988, pp. 1657-1665, doi:10.1109/12.9744.
  • 4. [3] Yuzhe Tang, Shuigeng Zhou, and Jianliang Xu, "LIGHT: A Query-Efficient Yet Low-Maintenance Indexing Scheme over DHTs," Proc. IEEE Trans. Knowledge and Data Engineering, IEEE Press, Jan. 2010, pp. 59-75, doi:10.1109/TKDE.2009.47
[4] Neal Leavitt, "Will NoSQL Databases Live Up to Their Promise?" Computer, Vol. 43, pp. 12-14, Feb. 2010.
[5] M. Andrew, ""NoSQL" databases in CMS Data and Workflow Management," 13th International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Physics Department, University of Rajasthan. LNM Institute of Information Technology. Jaipur, pp. 22-27, Feb 2010.
[6] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Communications of the ACM, Vol. 7, pp. 558-565, July 1978.
[7] A. Olson and K.G. Shin, "Probabilistic Clock Synchronization in Large Distributed Systems," Proc. IEEE Trans. Computers, IEEE Press, Sep. 1994, pp. 1106-1112, doi:10.1109/12.312120
[8] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin, "Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web," In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing. El Paso, Texas, United States, pp. 654-663, 1997.
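As a supplement to the consistent hashing discussion in the paper, the virtual-node lookup with clockwise failover might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `HashRing`, `ring_hash`, `mark_failed` and `lookup`, the use of MD5, and the `pnode#i` vnode labels are all hypothetical, and vnode positions are derived by hashing the label (a deterministic stand-in for the random placement described in the text).

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring where each physical node owns several vnodes."""

    def __init__(self, pnodes, vnodes_per_pnode=4):
        self.failed = set()  # labels of vnodes currently considered crashed
        # Each ring entry is (position, vnode label, owning pnode),
        # sorted by position so the list can be walked "clockwise".
        self.ring = sorted(
            (ring_hash(f"{pnode}#{i}"), f"{pnode}#{i}", pnode)
            for pnode in pnodes
            for i in range(vnodes_per_pnode)
        )
        self.positions = [pos for pos, _, _ in self.ring]

    def mark_failed(self, vnode: str) -> None:
        self.failed.add(vnode)

    def lookup(self, key: str):
        """Return (vnode, pnode) for the first live vnode clockwise of key."""
        # First vnode whose position is larger than the key's position;
        # the modulo below wraps past the largest position to the smallest.
        start = bisect.bisect_right(self.positions, ring_hash(key))
        for step in range(len(self.ring)):
            _, vnode, pnode = self.ring[(start + step) % len(self.ring)]
            if vnode not in self.failed:
                return vnode, pnode
        raise RuntimeError("no live vnode on the ring")


# Usage: 4 pnodes with 3 vnodes each; fail the owner and look up again.
ring = HashRing(["pnode1", "pnode2", "pnode3", "pnode4"], vnodes_per_pnode=3)
owner_vnode, owner_pnode = ring.lookup("user:42")
ring.mark_failed(owner_vnode)
fallback_vnode, fallback_pnode = ring.lookup("user:42")
```

Because the vnodes of one pnode are scattered around the ring, the fallback vnode usually belongs to a different pnode, which is how the scheme spreads the load of a failed vnode.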