1. In Memory Database (IMDB)
(old is new again)
Kodamasimham Pridhvi
MT2012066
2. Agenda
› Introduction
› Architecture of IMDB
› Practical Application
› Myths about IMDB’s
› IMDB vs. DRDB
› Impact of IMDB
› Challenges in IMDB
› IMDB Open Sources
3. Introduction
› What is In-Memory Data Base(IMDB)?
An IMDB also called Main memory
Database(MMDB) is a database whose
primary data store is main memory.
4. History
› Is this a new idea? NO!!!
› Why now so important?
Due to 4 factors
5. Factors:
› LOWER ING COSTS & GROWING SIZE
(RAM):
– In early 2000, the cost of 64 MB RAM @ $71
– But now , 8GB DDR3 @ $69.99
› MULTICORE PROCESSORS
– parallel and faster computation
› 64 bit Computing
– multiple GB of main memory
› Faster responses to queries
7. Practical Application
› Applications that demand very fast data
access, storage and manipulation
› In real-time embedded systems
› Music databases in MP3 players
› Programming data in set-top boxes
› e-commerce and social networking sites
› financial services and many more…
8. IMDB vs. DRDB(Disk Resident
DB)
Disk Resident Data Base
In-Memory Data Base
Carries File I/O burden
No file I/O burden
Extra memory For Cache
No extra memory
Algorithm optimized for disk
Algorithms optimized for memory
More CPU cycles
Less CPU cycles
Assumes Memory is abundant
Uses memory more efficiently
9. Myths about IMDB’s
› Given the same amount of RAM, disk DBs
can perform at the same speed as IMDBs
(by using caching technology).
› If a RAM disk is created and a traditional
disk DB is deployed on it, it delivers the
same performance as an in-memory
database.
10. Myths about IMDB’s
› In-memory database the same as an
embedded database.
› Since RAM size is limited, sizes of IMDBs
are also limited.
11. Impact of IMDB:
› Data Representation,
› Concurrency control,
› Data Access Methods,
› Query Processing,
› ACID Properties,
› Recovery.
12. Data Representation
› In Disk Resident DB, we use flat files and
sequential access.
› In IMDB, Relational tuples with direct
pointers.
– Space efficient.
– Shared between columns and relations
13. Data Representation Diagram
Relation Tuple
Domain Table
Marks
Value 1
Age
Value 2
Pointers
To memory
address
Value 3
RollNo
Age
RollNo
Age
Age
Relational Tuple
14. Concurrency Control(lock based)
› In DRDB , locking granules are low level.
– To reduce contention
› In IMDB, due to fast processing it create
coarser locks.
– locking granules like a relation or entire database.
– No need of hash table look up.
15. Data Access Methods
› In DRDB, B-tree index structure is used.
– ranges , exact match queries
– lies in hard disk
› In IMDB, T-tree index structure is explicitly
designed
– reduces the CPU processing
– Eliminates index value compression and
expansion
16. T-tree
› T-tree node consists
– ordered elements in the range min and max
values
– two pointers to the left and right nodes
Pointer to parent
Parent
Key 1
Minimum Element
Left ptr
……………..
Control
Key..
Key n
Maximum Element
Right ptr
17. Query Processing
› In DRDB, main focus is on processing costs,
and attempt to minimize disk access.
› In IMDB, main factors are
–
–
–
–
Cardinality of table
Presence of index
Any ORDER BY clause
Predicate evaluation
› Ex : TimesTen provides range, hash and
bitmap indexes and support two types of join
methods nested-loop and merge-join.
18. ACID Properties
› IMDBs can be said to lack support for the
durability portion of the ACID
› Many MMDBs have added durability via the
following mechanisms:
– Checkpoints
– Transaction logging
– NVRAM(Non-Volatile RAM)
19. Recovery
› Mechanisms for recovery are :
– Logging
– Checkpoints
– Reloading
› transactional durability is kept, by keeping
two separate but synchronized copies of
the database at all times as well as storing
log files on-disk.
Today’s agenda for my presentation will be as follows:1.Introduction – we will see what is IMDB, some brief history about it and why is it becoming popular now..2.Architectural details of IMDB – some technical details of the IMDB3.Applications – where this IMDB can be used.4. Some Myths about IMDB .5. Difference between IMDB and DRDB – point where IMDB is superior to DRDB6.Impact of IMDB7.Some of the challenges in IMDB8.IMDB open Sources9.Thank you
Lets start with the basic question: What is In-Memory Data Base?From the name itself you can guess what does it means.. An IMDB also called Main memory Database(MMDB) is a database whose primary data store is main memory.That means in IMDB the primary copy lives permanently in memory..
1.The idea of using In Memory Database (IMDB) as physical memory is not new but is in existence quite since a decade. IMDB have evolved from a period when they were only used for caching or in high-speed data systems to a time now in twenty first century when they form a established part of the mainstream IT.2.Early in this century, although larger main memories were affordable but processors were not fast enough for main memory databases to be admired. However, today’s processors are faster, available in multicore and multiprocessor configurations having 64-bit memory addressability stocked with multiple gigabytes of main memory.
Three developments in recent years have made in-memory analytics increasingly feasible:64bit computing, multi-core servers and lower RAM prices and growing RAM sizes. One of the key reasons for the interest in in-memory database is to get faster responses to queries which otherwise would be limited by the speed of the disk storage systems.
Architecture:1.IMDB eliminates disk access by storing and manipulating entire database in main memory.2. The access time for main memory is orders of magnitude less than for disk storage3. Disks have a high, fixed cost per access that does not depend on the amount of data that is retrieved during the access. For this reason, disks are block-oriented storage devices. Main memory is not block oriented.4. The layout of data on a disk is much more critical than the layout of data in main memory, since sequential access to a disk is faster than random access. Sequential access is not as important in main memories.5. Buffer pool management totally disappears, number of machine instructions are reduced the structure and size of index pages is simplified, consequently the design becomes simple and more compact and most importantly requests are executed faster.6.You can see there is a secondary storage used for writing logs , checkpoints etc.
Most real-time applications need very short and anticipated response time and Main Memory as we know has short response time.IMDSs running on real-time operating systems (RTOSs) provide the responsiveness needed in applications including IP network routing, telecom switching, and industrial control.Open Music Daemon music player uses IMDB In-memory databases’ typically small memory and CPU footprint make them ideal because most embedded systems are highly resource-constrained. E-commerce and social networking sites use in-memory databases to cache portions of their back-end on-disk database systems
1.If the cache of a DRDB is large enough,copies of the data will reside in memory at all times. Although such a system will perform well, it is not taking full advantage of the memory. For example, the index structures will be designed for disk access (e.g., B-trees), even though the data are in memory. Also, applications may have to access data through a buffer manager, as if the data were on disk.2. data in an on-disk database system must be transferred to numerous locations as it is used. the handoffs required for an application to read a piece of data from an on-disk database, modify it and write that record back to the database. These steps require time and CPU cycles, and cannot be avoided in a traditional database, even when it runs on a RAM disk. Still more copies and transfers are required if transaction logging is active.In contrast, an in-memory database system entails a single data transfer. Elimination of multiple data transfers streamlines processing. Removing multiple copies of data reduces memory consumption, and the simplified processing makes for greater reliability and minimizes CPU demands.
1.EDB - database system that is built into the software program by the application developer in-memory database systems employ the client/server model. Eg. Timesten of oracle , polyhedra2. The database size is limited by the amount of physical RAM in the server. On 32-bit platforms, it’s constrained by 32-bit address space, so the database size is under 2GB in size or smaller depending on specific platform. For 64-bit platforms, there is no limit in size other than the amount of physical memory you have in the machine. We have customers that deploy with database size ranges from 1GB (gigabyte) to over 2 TBs (terabytes).
Main memory databases can also take advantage of efficient pointer following for data representation. Relational tuples can be represented as a set of pointers to data values. The use of pointers is space efficient when large values appear multiple times in the database, since the actual value needs to only be stored once. Pointers also simplify the handling of variable length fields since variable length data can be represented using pointers into a heap.Relational data are usually represented as flat files.Tuples are stored sequentially. Enumerated types larger than the pointer size are stored in the tuple as pointers to the domain table values, domain tables can be shared among different columns and even among different relations.
In DRDB, Systems choose small locking granules (fields or records) so to reduce contention. In IMDB, due to fast processing it create coarser locks, as contention is already low because data are memory resident, the principal advantageof small lock granulesis effectively removed.So we suggest large lock granules like a relation or an entire database.Implementation:In a conventional system, locks are implemented via a hash table that contains entries for the objects currently locked. The objects themselves (on disk) contain no lock information. If the objects are in memory, we may be able to afford a small number of bits in them to represent their lock status.
MMDB uses T-tree index structure unlike B-tree index structure used by DRDB. Since the ultimate aim of MMDB is to condense computation time while exploiting little memory. T-tree index structure is explicitly designed for MMDB.A T-tree node consists of ordered elements in the range min and max values,and two pointers to the left and right nodesIndex structures can store pointers to the indexed data, rather than the data itself. This eliminates the problem of storing variable length fields in an index and saves space as long as the pointers are smaller than the data they point to.Indexes are very space efficient and are reasonably fast for range and exact-match queries, although updates are slow.Use of T-trees dramatically reduces the CPU processing required to access data and completely eliminates the index value compression and expansionfound in B-trees.
T-trees uphold the fact that the actual data is always in main memory collectively with the index, hence it do not keep copies of actual attribute values within the index tree nodes. Instead it just contains pointers to the actual data fields . It is an ordered structure like an AVL tree having multiple keys per node. It is an Ideal index structure for ordered search over data. Other index structure supported by MMDB is heap file for handling a large number of fixed-length data items. Hash file supports unordered scan of data items as well as locking of data item that are obtained transparently when items are inserted, deleted, updated or scanned. The Oracle TimesTen, uses T-tree and hash indexing algorithms to speed access to indexed data, while also reducing CPU consumption. Use of T-trees dramatically reduces the CPU processing required to access data and completely eliminates the index value compression and expansionfound in B-trees.
Query processors for memory resident data must focus on processing costs, whereas most conventional systems attempt to minimize disk access.TimesTen and IMDB Cache provide range, hash and bitmap indexes and support two types of join methods nested-loop and merge-join. The optimizer can create temporary indexes as needed. The optimizer also accepts hints that give applications the flexibility to make tradeoffs between such factors as temporary space usage and performance
IMDB’s are logically more exposed to failure than DRDB’s due to high performance requirement as it is directly accessed by processor.