Need to handle massive data, where hundreds of gigabytes growing to terabytes or even petabytes are part of your daily work? Do you need to perform thousands of operations per second on multiple terabytes of data? Come and meet Apache HBase, a NoSQL database that runs on top of HDFS and is highly available, fault-tolerant, and scalable. HBase has been widely used at companies such as Facebook and Twitter. This talk gives an introduction, showing what HBase is and when to use it, its architecture, and examples of real-world solutions from large companies such as Facebook, Twitter, and Trend Micro.
2. • NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• The goal is the hosting of very large tables (billions of
rows X millions of columns)
• Based on Google’s BigTable paper
What Is HBase?
3. • Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random reads and writes
Why Use HBase?
4. • The problem is not a Big Data problem (only use HBase with Big Data)
• You read straight through files
• You write all at once or append new files
– No random reads or writes
• Access patterns of the data are ill-defined
When to Consider Not Using HBase?
5. • More complete list at http://wiki.apache.org/hadoop/Hbase/PoweredBy
HBase in production
16. • Data is not accessed over SQL
• You must:
– Create your own connections
– Keep track of the type of data in a column
– Give each row a key
– Access a row by its key
No SQL Means No SQL
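As an illustration of "keep track of the type of data in a column": HBase stores cell values as raw bytes, so the application has to remember how each column was encoded. The sketch below is plain Python, not the HBase client API; the column names and the `COLUMN_TYPES` registry are hypothetical and exist only for this example.

```python
import struct

# Hypothetical registry kept by the application: the datastore only sees
# raw bytes, so the client records how each column is encoded.
COLUMN_TYPES = {
    b"info:name": "str",
    b"info:age": "int",
    b"info:score": "float",
}


def encode(column: bytes, value) -> bytes:
    """Turn a Python value into bytes according to the column's declared type."""
    kind = COLUMN_TYPES[column]
    if kind == "str":
        return value.encode("utf-8")
    if kind == "int":
        return struct.pack(">q", value)  # 8-byte big-endian signed integer
    if kind == "float":
        return struct.pack(">d", value)  # 8-byte big-endian double
    raise ValueError(f"unknown type {kind!r}")


def decode(column: bytes, raw: bytes):
    """Turn stored bytes back into a Python value using the same registry."""
    kind = COLUMN_TYPES[column]
    if kind == "str":
        return raw.decode("utf-8")
    if kind == "int":
        return struct.unpack(">q", raw)[0]
    if kind == "float":
        return struct.unpack(">d", raw)[0]
    raise ValueError(f"unknown type {kind!r}")
```

Without the registry, the 8 bytes written for `info:age` could just as well be read back as a double or as text, which is why the bookkeeping falls on the client.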
17. • Gets
– Gets a row’s data based on the row key
• Puts
– Updates or inserts a row with data based on the row key
• Scans
– Finds all matching rows based on the row key
– Scan logic can be extended by using filters
Types of Access
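The three access types can be sketched with a toy in-memory table. This is not the HBase client API; `ToyTable` and its method signatures are hypothetical, and the sketch only assumes that rows are kept sorted by row key so that a scan is a range read.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted byte order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key: bytes, columns: dict) -> None:
        # Insert the row if it is new, otherwise update its columns.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key: bytes):
        # Point lookup by exact row key; returns None if the row is absent.
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start: bytes, stop: bytes, row_filter=None):
        # Range read over [start, stop), optionally narrowed by a filter.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(key, row):
                yield key, dict(row)


table = ToyTable()
table.put(b"u1|002", {b"d:body": b"second"})
table.put(b"u1|001", {b"d:body": b"first"})
table.put(b"u2|001", {b"d:body": b"other user"})
table.put(b"u1|001", {b"d:read": b"1"})  # update of an existing row

u1_keys = [key for key, _ in table.scan(b"u1|", b"u1}")]
read_keys = [key for key, _ in table.scan(b"", b"\xff", lambda k, r: b"d:read" in r)]
```

Because the keys are sorted, all rows for `u1` sit next to each other and the scan touches only that range; the filter then discards rows inside the range that do not match.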
22. • Designing schemas for HBase requires in-depth knowledge
• Schema Design is ‘data-centric’ not ‘relationship-centric’
• You design around how data is accessed
• Row keys are engineered
No SQL Means No SQL
23. • A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row key
– Contents of a row key vary by access pattern
– Often made up of several pieces of data
Row Keys
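A minimal sketch of a row key "made up of several pieces of data", assuming an access pattern of "latest events for a given user first". The layout (user ID, a zero byte, then a reversed timestamp) is one possible design chosen for illustration, not something prescribed by the slides; `make_row_key` and `parse_row_key` are hypothetical helpers.

```python
import struct

MAX_INT64 = 2**63 - 1


def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    # Assumed layout: <user id> 0x00 <reversed timestamp, 8 bytes big-endian>.
    # Subtracting from MAX_INT64 makes newer events sort before older ones.
    reversed_ts = MAX_INT64 - timestamp_ms
    return user_id.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


def parse_row_key(row_key: bytes):
    # The last 8 bytes are the reversed timestamp; the byte before is the separator.
    user_part, ts_part = row_key[:-9], row_key[-8:]
    reversed_ts = struct.unpack(">q", ts_part)[0]
    return user_part.decode("utf-8"), MAX_INT64 - reversed_ts


older = make_row_key("alice", 1000)
newer = make_row_key("alice", 2000)
other = make_row_key("bob", 1500)
ordered = sorted([older, other, newer])
```

With this layout, rows for the same user are adjacent and the most recent event comes first in byte order. A different access pattern (for example, "all events in a time window across users") would call for a different key.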
24. • Schema design does not start in an ERD
• Access patterns must be known up front
• Denormalize to improve performance
– Fewer, bigger tables
Schema Design
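To make "fewer, bigger tables" concrete, here is a small sketch that folds two relational-style record sets into one wide row per user. It is plain Python with made-up data; the `users` and `orders` records, the `denormalize` helper, and the `info:` / `orders:` column names are all assumptions for illustration.

```python
# Hypothetical relational-style inputs: two "tables" linked by user_id.
users = [
    {"user_id": "u1", "name": "Ana"},
    {"user_id": "u2", "name": "Bruno"},
]
orders = [
    {"order_id": "o1", "user_id": "u1", "total": "30.00"},
    {"order_id": "o2", "user_id": "u1", "total": "12.50"},
    {"order_id": "o3", "user_id": "u2", "total": "99.90"},
]


def denormalize(users, orders):
    """Fold each user's orders into that user's row, one column per order."""
    rows = {}
    for user in users:
        rows[user["user_id"].encode()] = {b"info:name": user["name"].encode()}
    for order in orders:
        row = rows[order["user_id"].encode()]
        row[b"orders:" + order["order_id"].encode()] = order["total"].encode()
    return rows


rows = denormalize(users, orders)
```

Reading a user together with all of that user's orders then becomes a single lookup by row key instead of a join, at the cost of a wider row with a variable set of columns.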
26. • Use of HBase to integrate SMS, chat, email and Facebook Messages into
one inbox
• HydraBase – The evolution of HBase@Facebook
27. • HBase provides a distributed, read/write backup of all MySQL tables in
Twitter's production
• A number of applications, including people search, rely on HBase internally
for data generation
• Additionally, the operations team uses HBase as a time-series database for
cluster-wide monitoring/performance data
28. • Uses HBase as a foundation for cloud scale storage for a variety of
applications
• Uses HBase to build a graph service for global web threat entities
evaluation and reputation
29. Internal Use Only
• Non-profit R&D center founded by Nokia in 2001 in Brazil
• Focused on projects delivering solutions and products in the mobile technology area
• Technical team of 200+
• Located in Brazil: Manaus | Brasilia | Recife | São Paulo
• 50+ invention reports accepted by Nokia/Microsoft to file patent application
• 500+ items of scientific production
• 300+ completed projects
31. OUR AWARDS
• Eco System Saving Tips (app): Mobile World Congress 2012
• Facelock: 1st prize, London Hackathon | Nokia World 2010
• Audio Aid: 1st prize | Forum Nokia Calling All Innovators 2009
• Microsoft Data Gathering: Tele.Síntese 2012 & 2013 award
32. • About training in Big Data (Developer, Analyst, Admin):
http://www.indt.org/servicos/treinamentos/hadoop-developer
http://www.indt.org/servicos/treinamentos/hadoop-analyst
http://www.indt.org/servicos/treinamentos/hadoop-admin
• About HBase:
http://hbase.apache.org/
• About INDT:
http://www.indt.org
communications@indt.org.br
• About Hortonworks:
http://www.hortonworks.com
communications@indt.org.br
INFOS + CONTACT