This document provides an overview of NoSQL databases and HBase. It discusses why NoSQL databases are gaining popularity due to trends in data and architecture. It also summarizes the CAP theorem and how different databases balance consistency, availability and partition tolerance. The document describes research activities including evaluating HBase for telco usage and performing bulk processing tests on HBase. It finds that while HBase can scale horizontally, managing compaction storms and small files is challenging.
2. Who AM I?
› Software Developer
10+ years at Ericsson
› HLR, PGM, IMS-M, MMS, MTV, BCS
› Joined Research late 2012
–BMUM -> BUSS (5+ years)
–DUCI (<6 months)
› Active member in various /// groups
–Linux (ELX, UMWP, etc.), Agile, SWAN, EQNA
› Open source contributor
Ericsson Internal | 2013-06-03 | Page 2
3. The Plan
› Why NoSQL?
› CAP
› Research activities
› Market trends
Ericsson Internal | 2013-06-03 | Page 3
5. NoSQL: Why?
Trends – Usual Suspects
Gossip
SDN
Gartner Data Center TCO Report, June 2012.
Ericsson Internal | 2013-06-03 | Page 5
Internet Hypertext, RSS, Wikis,
blogs, wikis, tagging, user
generated content, RDF, ontologies
9. CAP Theorem
Brewer’s Conjecture
“Of three properties of shared-data systems – data
Consistency, system Availability and tolerance to
network Partitions – only two can be achieved at
any given moment in time .”
› 2000 Prof Eric Brewer, PoDC Conference Keynote
› 2002 Seth Gilbert and Nancy Lynch, ACM SIGACT News 33 (2)
Ericsson Internal | 2013-06-03 | Page 9
10. CAP Theorem
The business decision
CONSISTENT
Partition
OR
Available
Ericsson Internal | 2013-06-03 | Page 10
11. CAP Summary
Available
Traditio
MySQL nal relationa
l:
, Postg
re S Q L
, e t c.
Consistent
AP
CA
CP
dra,
as s an em s
iak, C e syst
or t , R
lik
m
Volde , Dynamo
hD b
Couc
AP: Requests will complete at any node possibly
violating consistency
Partition Tolerance
HBase, MongoDB, Redis,
BigTable like systems
CP: Requests will complete at nodes that have quorum
Ericsson Internal | 2013-06-03 | Page 11
14. HBAse
BigTable/Columnar
Coordination
Master selection
Root region lookup
Node registration
…
Data files
Write-Ahead Log (WAL)
Rack aware
Default data replication x3
Region allocation
Failover
Log splitting
Load balancing
One active (elected), many stand by
Holds regions
Handle I/O requests
In-Memory data (MemStore)
Split regions
Compact regions
› ZooKeeper (cluster)
› Hadoop (cluster)
› HBase: 1 elected master / many region servers
Ericsson Internal | 2013-06-03 | Page 14
15. TelCO Applicability Study
Hbase For HLR data?
›Comprehensive
report
›Using HBase is
DOABLE!
OK!
Ericsson Internal | 2013-06-03 | Page 15
16. HBASE BULK Processing
Event Processing & Aggregation
› 100 Million rows
Queries evaluated
SELECT col1 FROM table
SELECT SUM(col1) FROM table WHERE col2=val2
GROUP BY col3
›
›
›
›
› Map/Reduce
› Scan
› Co-processor
Ericsson Internal | 2013-06-03 | Page 16
CPU
RAM
Network
Schema
17. Bulk Processing
Scaling out/Horizontally
› 100 Million rows
› Linear scaling!
SELECT SUM(col1) FROM table WHERE col2=val2
GROUP BY col3
Ericsson Internal | 2013-06-03 | Page 17
20. How much Data can it Fit?
ITK / Constellation / CEA
› Network produces events
– RNC, SGSN, S-&R-KPI
– Traffic DPI
– GTP-C
› CEA (Perfmon)
– Correlated events
1000+ K events/s
Event
Event
Feeder
Feeder
10+ K events/s
Map/Reduce
Put.. Put.. Put…
10,000,000 subscribers
Staging data on HDFS
HBase
HBase
BulkLoader
BulkLoader
HBase
HBase
PutLoader
PutLoader
Look
Ericsson Internal | 2013-06-03 | Page 20
up d
at a
22. What about HDFS ?
Small files
(250 B)
› It scales!
› TestDFSIO benchmark
> 3000 GB/s
- Read
> 2000 GB/s
- Writes
CPU
CPU
Larger files
(1 KB)
› But
…. it is not that simple…
Ericsson Internal | 2013-06-03 | Page 22
CPU and I/O
CPU and I/O
Larger files
(1 KB)
Network
Network
23. What about End to End?
writing to Hbase included
100 K events/s
› It scales!
› And it gets…
more complicated
200 K events/s
Ericsson Internal | 2013-06-03 | Page 23
24. But….
› Within ~2 hours
– Rows/s
– CPU
– IO
----------+++
+++++++++
Ericsson Internal | 2013-06-03 | Page 24
7K/s
x2
100%
25. HDFS CURSE
Compaction Storm
› Remember what we were doing?
– Hint: Creating lots of small files to add to HBase?..
› Major compaction storm!
– Manage compaction and region splitting
Ericsson Internal | 2013-06-03 | Page 25
HBase
HBase
BulkLoader
BulkLoader
M/R
26. Conclusion
› Scalability … Scalability… Scalability
› It works but it is not so easy…
› Recommendation:
– Polyglot data storage
Ericsson Internal | 2013-06-03 | Page 26
29. NoSQL: The name
› It is not about saying SQL is bad or should not be used
› ”An accidental neologism” – Martin Fowler
› A twitter hash
› No prescriptive definition, just observations of common
characteristics
– “Any database that is not a Relational Database”
– Running well on clusters (scalable)
– schemaless
› Polyglot persistence
– Using different stores in different circumstances
Ericsson Internal | 2013-06-03 | Page 29
The term was coined at a meetup with
the creators behind some prominent
emerging databases
... then there was a conference ...
... and a mailing list ...
... the name caught on ...
... then there were more conferences ...
... and here we are!
30. NoSQL: Why?
Trend No 2/4: Connectedness
Internet Hypertext, RSS, Wikis, blogs, wikis, tagging, user generated
content, RDF, ontologies
M2M
Application
Ericsson Internal | 2013-06-03 | Page 30
31. NoSQL: Why?
Trend No 3/4: Content Individualization
Schemaless
•Extend at runtime
•De-normalize
•Domain design (not schema migration)
› Individualization of content
› Decentralization
Ericsson Internal | 2013-06-03 | Page 31
33. Consistency
“A system is consistent if an update is applied to all
relevant nodes at the same logical time ”
Strong consistency
Weak consistency
Atomicity Consistency Isolation Durability
(ACID)
Eventual consistency
(inconsistency window)
NoSQL solutions DO support Transactions
Standard database replication (or caching) IS NOT strongly consistent,
as such any solutions making use of any of those is by definition
Eventually Consistent at best
Ericsson Internal | 2013-06-03 | Page 33
34. Partition Tolerance / Availability
› “The network will be allowed to lose arbitrarily many messages sent from one node to
another” [..]
› “For a distributed system to be continuously available, every request received by a
non-failing node in the system must result in a response ”
Gilbert and Lynch, SIGACT 2002
CP: Requests will complete at nodes that have quorum
AP: Requests will complete at any node possibly violating
consistency
High latency ~= Partition
Ericsson Internal | 2013-06-03 | Page 34
35. HBASE BULK Processing
Event Processing & Aggregation
› 100 Million rows
Queries evaluated
SELECT col1 FROM table
SELECT SUM(col1) FROM table WHERE col2=val2
GROUP BY col3
Ericsson Internal | 2013-06-03 | Page 35
Notes de l'éditeur
Individualization of content
•In the salary lists of the 1970s, all elements had exactly one job
•In the salary lists of the 2000s, we need 5 job columns! Or 8?Or 15?
Database developers all know the ACID acronym. It says that database transactions should be:
Atomic: Everything in a transaction succeeds or the entire transaction is rolled back.
Consistent: A transaction cannot leave the database in an inconsistent state.
Isolated: Transactions cannot interfere with each other.
Durable: Completed transactions persist, even when servers restart etc.
These qualities seem indispensable, and yet they are incompatible with availability and performance in very large systems. For example, suppose you run an online book store and you proudly display how many of each book you have in your inventory. Every time someone is in the process of buying a book, you lock part of the database until they finish so that all visitors around the world will see accurate inventory numbers. That works well if you run The Shop Around the Corner but not if you run Amazon.com.
Amazon might instead use cached data. Users would not see not the inventory count at this second, but what it was say an hour ago when the last snapshot was taken. Also, Amazon might violate the “I” in ACID by tolerating a small probability that simultaneous transactions could interfere with each other. For example, two customers might both believe that they just purchased the last copy of a certain book. The company might risk having to apologize to one of the two customers (and maybe compensate them with a gift card) rather than slowing down their site and irritating myriad other customers.
There is a computer science theorem that quantifies the inevitable trade-offs. Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three. (For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.)
An alternative to ACID is BASE:
Basic Availability
Soft-state
Eventual consistency
Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time. It’s called “closing out the books.”) It’s OK to use stale data, and it’s OK to give approximate answers.
It’s harder to develop software in the fault-tolerant BASE world compared to the fastidious ACID world, but Brewer’s CAP theorem says you have no choice if you want to scale up. However, as Brewer points out in this presentation, there is a continuum between ACID and BASE. You can decide how close you want to be to one end of the continuum or the other according to your priorities.
The Sex Pistols had shown that barely-constrained fury was more important to their contemporaries than art-school structuralism, giving anyone with three chords and something to say permission to start a band. Eric Brewer, in what became known as Brewer&apos;s Conjecture, said that as applications become more web-based we should stop worrying about data consistency, because if we want high availability in these new distributed applications, then guaranteed consistency of data is something we cannot have, thus giving anyone with three servers and a keen eye for customer experience permission to start an internet scale business. Disciples of Brewer (present that day or later converts) include the likes of Amazon, EBay, and Twitter.
What he said was there are three core systemic requirements that exist in a special relationship when it comes to designing and deploying applications in a distributed environment (he was talking specifically about the web but so many corporate businesses are multi-site/multi-country these days that the effects could equally apply to your data-centre/LAN/WAN arrangement).
The Sex Pistols had shown that barely-constrained fury was more important to their contemporaries than art-school structuralism, giving anyone with three chords and something to say permission to start a band. Eric Brewer, in what became known as Brewer&apos;s Conjecture, said that as applications become more web-based we should stop worrying about data consistency, because if we want high availability in these new distributed applications, then guaranteed consistency of data is something we cannot have, thus giving anyone with three servers and a keen eye for customer experience permission to start an internet scale business. Disciples of Brewer (present that day or later converts) include the likes of Amazon, EBay, and Twitter.
What he said was there are three core systemic requirements that exist in a special relationship when it comes to designing and deploying applications in a distributed environment (he was talking specifically about the web but so many corporate businesses are multi-site/multi-country these days that the effects could equally apply to your data-centre/LAN/WAN arrangement).
Conistent vs available is a business decision. Not an engineering one.
NoSQL ttend to sacrifice full C and A for P at any given time -&gt; All in all eventually A or C
Databases are great at this because they focus on ACID properties and give us Consistency by also giving us Isolation, so that when Customer One is reducing books-in-stock by one, and simultaneously increasing books-in-basket by one, any intermediate states are isolated from Customer Two, who has to wait a few milliseconds while the data store is made consistent.
Once you start to spread data and logic around different nodes then there&apos;s a risk of partitions forming. A partition happens when, say, a network cable gets chopped, and Node A can no longer communicate with Node B. With the kind of distribution capabilities the web provides, temporary partitions are a relatively common occurrence and, as I said earlier, they&apos;re also not that rare inside global corporations with multiple data centres.
Easier scalability is the first aspect highlighted by Wiederhold. NoSQL databases like Couchbase and 10Gen&apos;s MongoDB, he said, can be scaled up to handle much bigger data volumes with relative ease.If your company suddenly finds itself deluged by overnight success, for example, with customers coming to your Web site by the droves, a relational database would have to be painstakingly replicated and re-partitioned in order to scale up to meet the new demand.Wiederhold cited social and mobile gaming vendors as the big example of this kind of situation. An endorsement or a few well-timed tweets could spin up semi-dormant gaming servers and get them to capacity in mere hours. Because of the distributed nature of non-relational databases, to scale NoSQL all you need to do is add machines to the cluster to meet demand.
Could we store these, if they came in a stream?
As data increases, there may be many StoreFiles on HDFS, which is not good for its performance. Thus, HBase will automatically pick up a couple of the smaller StoreFiles and rewrite them into a bigger one. This process is called minor compaction. For certain situations, or when triggered by a configured interval (once a day by default), major compaction runs automatically. Major compaction will drop the deleted or expired cells and rewrite all the StoreFiles in the Store into a single StoreFile; this usually improves the performance.However, as major compaction rewrites all of the Stores&apos; data, lots of disk I/O and network traffic might occur during the process. This is not acceptable on a heavy load system. You might want to run it at a lower load time of your system.
Rather than let HBase auto-split your Regions, manage the splitting manually [11]. With growing amounts of data, splits will continually be needed. Since you always know exactly what regions you have, long-term debugging and profiling is much easier with manual splits. It is hard to trace the logs to understand region level problems if it keeps splitting and getting renamed. Data offlining bugs + unknown number of split regions == oh crap! If an HLog or StoreFile was mistakenly unprocessed by HBase due to a weird bug and you notice it a day or so later, you can be assured that the regions specified in these files are the same as the current regions and you have less headaches trying to restore/replay your data. You can finely tune your compaction algorithm. With roughly uniform data growth, it&apos;s easy to cause split / compaction storms as the regions all roughly hit the same data size at the same time. With manual splits, you can let staggered, time-based major compactions spread out your network IO load.How do I turn off automatic splitting? Automatic splitting is determined by the configuration value hbase.hregion.max.filesize. It is not recommended that you set this to Long.MAX_VALUE in case you forget about manual splits. A suggested setting is 100GB, which would result in &gt; 1hr major compactions if reached.What&apos;s the optimal number of pre-split regions to create? Mileage will vary depending upon your application. You could start low with 10 pre-split regions / server and watch as data grows over time. It&apos;s better to err on the side of too little regions and rolling split later. A more complicated answer is that this depends upon the largest storefile in your region. With a growing data size, this will get larger over time. You want the largest region to be just big enough that the Store compact selection algorithm only compacts it due to a timed major. If you don&apos;t, your cluster can be prone to compaction storms as the algorithm decides to run major compactions on a large series of regions all at once. Note that compaction storms are due to the uniform data growth, not the manual split decision.If you pre-split your regions too thin, you can increase the major compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size grows too large, use the (post-0.90.0 HBase) org.apache.hadoop.hbase.util.RegionSplitter script to perform a network IO safe rolling split of all regions.
Either way this is not good for business. Amazon claim(http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it) that just an extra one tenth of a second on their response times will cost them 1% in sales. Google said(http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html) they noticed that just a half a second increase in latency caused traffic to drop by a fifth.
Where both sides agree though is that the answer to scale is distributed parallelisation not, as was once thought, supercomputer grunt.
If they&apos;re not working in parallel you have no chance to get the problem done in a reasonable amount of time. This is a lot like anything else. If you have a really big job to do you get lots of people to do it. So if you are building a bridge you have lots of construction workers. That&apos;s parallel processing also. So a lot of this will end up being &quot;how do we mix parallel processing and the internet?&quot;Inktomi and the Internet Bubble
In our account of the history of NoSQL development, we’ve concentrated on big data running on clusters. While we think this is the key thing that drove the opening up of the database world, it isn&apos;t the only reason we see project teams considering NoSQL databases. An equally important reason is the old frustration with the impedance mismatch problem. The big data concerns have created an opportunity for people to think freshly about their data storage needs, and some development teams see that using a NoSQL database can help their productivity by simplifying their database access even if they have no need to scale beyond a single machine.
Individualization of content
•In the salary lists of the 1970s, all elements had exactly one job
•In the salary lists of the 2000s, we need 5 job columns! Or 8?Or 15?
Individualization of content
•In the salary lists of the 1970s, all elements had exactly one job
•In the salary lists of the 2000s, we need 5 job columns! Or 8?Or 15?
The recent trend in discussing NoSQL databases is to highlight their schemaless nature—it is a popular feature that allows developers to concentrate on the domain design without worrying about schema changes. It’s especially true with the rise of agile methods [Agile Methods] where responding to changing requirements is important. Discussions, iterations, and feedback loops involving domain experts and product owners are important to derive the right understanding of the data; these discussions must not be hampered by a database&apos;s schema complexity. With NoSQL data stores, changes to the schema can be made with the least amount of friction, improving developer productivity (“The Emergence of NoSQL,” p. 9). We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration
A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of so-called NoSQL databases and the popularity of the term &quot;document-oriented database&quot; (or &quot;document store&quot;) has grown[citation needed] with the use of the term NoSQL itself. In contrast to well-known relational databases and their notions of &quot;Relations&quot; (or &quot;Tables&quot;), these systems are designed around an abstract notion of a &quot;Document&quot;.
By storing and managing data based on columns rather than rows, column-oriented architecture overcomes query limitations that exist in traditional row-based RDBMS. Only the necessary columns in a query are accessed, reducing I/O activities by circumventing unneeded rows.[7] This enables it to[clarification needed] with very large data volumes (Terabytes to Petabytes), commonly referred to as Big Data.
Big data[1][2] is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage,[3] search, sharing, transfer, analysis,[4] and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller set
http://codahale.com/you-cant-sacrifice-partition-tolerance/
Some systems cannot be partitioned. Single-node systems (e.g., a monolithic Oracle server with no replication) are incapable of experiencing a network partition. But practically speaking these are rare; add remote clients to the monolithic Oracle server and you get a distributed system which can experience a network partition (e.g., the Oracle server becomes unavailable).
Network partitions aren’t limited to dropped packets: a crashed server can be thought of as a network partition. The failed node is effectively the only member of its partition component, and thus all messages to it are “lost” (i.e., they are not processed by the node due to its failure). Handling a crashed machine counts as partition-tolerance.