NoSQL for Telco
Data Research Day 2013

Prepared by
Nicolas Seyvet

Help from
N. Hari Kumar
P. Matray
Who am I?
› Software Developer
10+ years at Ericsson
› HLR, PGM, IMS-M, MMS, MTV, BCS

› Joined Research late 2012
–BMUM -> BUSS (5+ years)
–DUCI (<6 months)

› Active member in various /// groups
–Linux (ELX, UMWP, etc.), Agile, SWAN, EQNA

› Open source contributor
Ericsson Internal | 2013-06-03 | Page 2
The Plan
› Why NoSQL?
› CAP
› Research activities
› Market trends

NoSQL: Why?
NoSQL: Why?
Trends – Usual Suspects

Gossip
SDN

Gartner Data Center TCO Report, June 2012.

Internet Hypertext, RSS, Wikis, blogs, tagging, user generated content, RDF, ontologies
NoSQL: Why?
Trends: Architecture

› Multicore
› Parallelization/Distributed
› Cloud
› Schemaless

1980s: Mainframe applications
1990s: Database as integration hub
2000s: Decoupled services
Two Ways to Scale
Go BIG or many?

PARTITION
(replication)
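
To make "partition (replication)" concrete, here is a minimal sketch of hash partitioning with replicas. The node names, the subscriber keys, and the `owners` helper are all hypothetical and do not come from the slides; real stores use more elaborate schemes such as consistent hashing or range-based regions.

```python
import hashlib


def owners(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Return the nodes holding `key`: one primary plus `replicas - 1` copies."""
    # A stable hash, so every client maps a key to the same partition.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Copies go to the following nodes, wrapping around the node list.
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
for subscriber in ("46700000001", "46700000002", "46700000003"):
    print(subscriber, "->", owners(subscriber, nodes))
```

Adding nodes spreads keys over more machines ("many"), while the replica count decides how many node failures a key survives.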
CAP
Consistency, Availability, Partition tolerance
CAP Theorem
Brewer’s Conjecture

“Of three properties of shared-data systems – data Consistency, system Availability and tolerance to network Partitions – only two can be achieved at any given moment in time.”

› 2000 Prof Eric Brewer, PODC Conference Keynote
› 2002 Seth Gilbert and Nancy Lynch, ACM SIGACT News 33 (2)
CAP Theorem
The business decision

When a Partition occurs: CONSISTENT or AVAILABLE
CAP Summary
CA (Consistent + Available): Traditional relational: MySQL, PostgreSQL, etc.
AP (Available + Partition Tolerance): Voldemort, Riak, Cassandra, CouchDB, Dynamo-like systems
CP (Consistent + Partition Tolerance): HBase, MongoDB, Redis, BigTable-like systems

AP: Requests will complete at any node, possibly violating consistency
CP: Requests will complete at nodes that have quorum
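
The CP and AP request rules above can be sketched as two acceptance checks. This is an illustration only: the function names are hypothetical, and quorum is assumed to mean a simple majority, which is one common definition but not the only one.

```python
def cp_write(acks: int, cluster_size: int) -> bool:
    """CP behaviour: accept a write only if a majority of nodes acknowledged it."""
    quorum = cluster_size // 2 + 1  # assumption: quorum = simple majority
    return acks >= quorum


def ap_write(acks: int, cluster_size: int) -> bool:
    """AP behaviour: accept the write as long as at least one node took it."""
    return acks >= 1


# A 5-node cluster split by a partition into a side of 3 and a side of 2.
for reachable in (3, 2):
    print(
        f"side with {reachable} nodes: "
        f"CP accepts={cp_write(reachable, 5)}, AP accepts={ap_write(reachable, 5)}"
    )
```

The minority side keeps answering in the AP case, which is exactly where the two sides can diverge and consistency may be violated.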
Why NoSQL now?
› Trends
“Internet size”, Cluster friendly
Rapid development / Solution oriented
Polyglot Persistence
Schemaless

Research Activities
Telco Applicability
Aggregation
Event Streams
HBase

BigTable/Columnar

› ZooKeeper (cluster)
– Coordination
– Master selection
– Root region lookup
– Node registration
– …

› Hadoop (cluster)
– Data files
– Write-Ahead Log (WAL)
– Rack aware
– Default data replication x3

› HBase master: one active (elected), many on standby
– Region allocation
– Failover
– Log splitting
– Load balancing

› HBase region servers (many)
– Hold regions
– Handle I/O requests
– In-memory data (MemStore)
– Split regions
– Compact regions
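
The region server duties listed above (WAL, MemStore, compaction) can be pictured with a toy write path. This is a simplified sketch, not HBase code: the `ToyRegion` class, its flush threshold, and the dictionary-based "files" are assumptions made for illustration.

```python
class ToyRegion:
    """Toy model of a region's write path: WAL -> MemStore -> store files."""

    def __init__(self, flush_threshold: int = 3):
        self.wal = []          # append-only log, written first for durability
        self.memstore = {}     # in-memory buffer of recent writes
        self.store_files = []  # immutable sorted files (HFiles in HBase)
        self.flush_threshold = flush_threshold

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the MemStore as one sorted, immutable file and start afresh.
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row_key: str):
        # Newest data wins: MemStore first, then files from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for store_file in reversed(self.store_files):
            if row_key in store_file:
                return store_file[row_key]
        return None

    def compact(self) -> None:
        # Merge all files into one; later files overwrite earlier values.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [dict(sorted(merged.items()))]


region = ToyRegion()
for i in range(7):
    region.put(f"row{i}", f"v{i}")
region.put("row0", "v0-new")
print(len(region.store_files), region.get("row0"), region.get("row4"))
```

Reads get slower as store files pile up, which is why compaction exists and why it matters later when many small files are loaded.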
Telco Applicability Study
HBase for HLR data?

› Comprehensive report
› Using HBase is DOABLE! OK!
HBase Bulk Processing
Event Processing & Aggregation
› 100 Million rows
Queries evaluated
SELECT col1 FROM table
SELECT SUM(col1) FROM table WHERE col2=val2
GROUP BY col3

› Map/Reduce
› Scan
› Co-processor

› CPU
› RAM
› Network
› Schema
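
As a rough illustration of how the aggregate query above (SUM with a WHERE filter and GROUP BY) maps onto Map/Reduce, the sketch below runs both phases over a few in-memory rows. The sample rows and function names are made up; an actual HBase job would read from a table scan instead.

```python
from collections import defaultdict


def map_phase(rows, val2):
    """Emit (col3, col1) for every row that matches the WHERE clause."""
    for row in rows:
        if row["col2"] == val2:
            yield row["col3"], row["col1"]


def reduce_phase(pairs):
    """Sum col1 per col3 group."""
    sums = defaultdict(int)
    for group, value in pairs:
        sums[group] += value
    return dict(sums)


rows = [  # hypothetical sample data
    {"col1": 10, "col2": "a", "col3": "x"},
    {"col1": 5, "col2": "b", "col3": "x"},
    {"col1": 7, "col2": "a", "col3": "y"},
    {"col1": 3, "col2": "a", "col3": "x"},
]
result = reduce_phase(map_phase(rows, "a"))
print(result)
```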
Bulk Processing
Scaling out/Horizontally
› 100 Million rows
› Linear scaling!

SELECT SUM(col1) FROM table WHERE col2=val2
GROUP BY col3
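
One plausible reason this query scales out well is that each node can aggregate its own rows and return only a small summary, in the spirit of a co-processor. The sketch below assumes rows are already split across three regions; the data and helper names are hypothetical.

```python
from collections import Counter


def partial_sum(region_rows, val2):
    """Aggregate one region locally: SUM(col1) WHERE col2 = val2 GROUP BY col3."""
    sums = Counter()
    for col1, col2, col3 in region_rows:
        if col2 == val2:
            sums[col3] += col1
    return sums


def merge(partials):
    """Combine per-region results; only small summaries cross the network."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Rows as (col1, col2, col3), already split across three regions.
regions = [
    [(10, "a", "x"), (5, "b", "x")],
    [(7, "a", "y")],
    [(3, "a", "x"), (2, "a", "y")],
]
totals = merge(partial_sum(region_rows, "a") for region_rows in regions)
print(totals)
```

Because the per-region work is independent, adding regions and servers adds capacity without changing the merge step.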

READ/WRITE
100000 iterations

Periodic
degradation

› 150,000,000 rows
› row = key + 1 column (1K)

Entire cluster up and running
8 nodes (1 master / 7 slaves)
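
A back-of-the-envelope estimate of the data volume in this test, as a sketch under stated assumptions: "1K" is read as 1000 bytes, row keys and storage overhead are ignored, no compression is applied, and the default x3 replication is assumed to be in effect.

```python
ROWS = 150_000_000   # from the test setup
VALUE_BYTES = 1_000  # "1K" column, read here as 1000 bytes (assumption)
REPLICATION = 3      # default data replication (assumed to apply)
DATA_NODES = 7       # the 7 slaves

raw_gb = ROWS * VALUE_BYTES / 1e9
replicated_gb = raw_gb * REPLICATION
per_node_gb = replicated_gb / DATA_NODES

print(f"raw values:       {raw_gb:.0f} GB")
print(f"with replication: {replicated_gb:.0f} GB")
print(f"per data node:    {per_node_gb:.1f} GB")
```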
Robustness
Killing Them Softly…

Master

Slaves

How much Data can it Fit?
ITK / Constellation / CEA

› Network produces events
– RNC, SGSN, S-&R-KPI
– Traffic DPI
– GTP-C

› CEA (Perfmon)
– Correlated events

1000+ K events/s
Event Feeder
10+ K events/s
Map/Reduce
Put.. Put.. Put…
10,000,000 subscribers
Staging data on HDFS
HBase BulkLoader
HBase PutLoader
Lookup data
The Upcoming Fight

Storkluster: 18 machines
Bigdata: 2 machines
What about HDFS?

› It scales!
› TestDFSIO benchmark
– Read: > 3000 GB/s
– Writes: > 2000 GB/s

› But… it is not that simple…
– Small files (250 B): CPU
– Larger files (1 KB): CPU and I/O
– Larger files (1 KB): Network
What about End to End?
Writing to HBase included

› It scales!
– 100 K events/s
– 200 K events/s

› And it gets… more complicated
But…
› Within ~2 hours
– Rows/s: 7K/s
– CPU: x2
– IO: 100%
HDFS CURSE
Compaction Storm

› Remember what we were doing?
– Hint: Creating lots of small files to add to HBase?..

› Major compaction storm!
– Manage compaction and region splitting
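
To see why many small files can snowball into a compaction storm, here is a toy model. It is not HBase's real compaction policy: it assumes a single region, one new file per bulk load, and a full rewrite of all files whenever a hypothetical `max_files` limit is exceeded.

```python
def simulate(loads: int, file_mb: int, max_files: int):
    """Count compactions and MB rewritten while small files keep arriving."""
    files = []  # sizes of the store files in one region, in MB
    rewritten_mb = 0
    compactions = 0
    for _ in range(loads):
        files.append(file_mb)  # each bulk load adds one small file
        if len(files) > max_files:
            rewritten_mb += sum(files)  # compaction reads and rewrites everything
            files = [sum(files)]
            compactions += 1
    return compactions, rewritten_mb


for loads in (10, 100, 1000):
    compactions, rewritten = simulate(loads, file_mb=1, max_files=3)
    print(f"{loads} loads of 1 MB -> {compactions} compactions, {rewritten} MB rewritten")
```

In this model the rewritten volume grows much faster than the loaded volume, which hints at why throughput collapses while CPU and IO saturate unless compaction and region splitting are managed.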

Conclusion
› Scalability … Scalability… Scalability
› It works but it is not so easy…

› Recommendation:
– Polyglot data storage

NoSQL
NoSQL: The name
› It is not about saying SQL is bad or should not be used
› “An accidental neologism” – Martin Fowler
› A Twitter hashtag
› No prescriptive definition, just observations of common characteristics
– “Any database that is not a Relational Database”
– Running well on clusters (scalable)
– Schemaless

› Polyglot persistence
– Using different stores in different circumstances

The term was coined at a meetup with the creators behind some prominent emerging databases
... then there was a conference ...
... and a mailing list ...
... the name caught on ...
... then there were more conferences ...
... and here we are!
NoSQL: Why?

Trend No 2/4: Connectedness

Internet Hypertext, RSS, Wikis, blogs, tagging, user generated content, RDF, ontologies

M2M
Application
NoSQL: Why?

Trend No 3/4: Content Individualization

Schemaless
• Extend at runtime
• De-normalize
• Domain design (not schema migration)

› Individualization of content
› Decentralization
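
A small sketch of what "schemaless" and "extend at runtime" can look like in practice, using plain dictionaries as stand-ins for documents. The subscriber records and field names are invented for illustration.

```python
subscribers = {}  # key -> document; no table definition up front

subscribers["46700000001"] = {"msisdn": "46700000001", "plan": "prepaid"}

# Extend at runtime: a new attribute appears on one record only.
subscribers["46700000002"] = {
    "msisdn": "46700000002",
    "plan": "postpaid",
    "devices": [{"imei": "000000000000000", "type": "phone"}],  # de-normalized
}

# Readers must tolerate missing fields instead of relying on a schema.
for key, doc in subscribers.items():
    print(key, doc["plan"], len(doc.get("devices", [])))
```

The flexibility moves responsibility from the database to the application: nothing stops two records from disagreeing about their shape.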

NoSQL Landscape
› 4 emerging categories
– Key-Value
– Graph
– BigTable
– Document
(NewSQL)
DBN
(Object)
Consistency
“A system is consistent if an update is applied to all relevant nodes at the same logical time”
Strong consistency: Atomicity Consistency Isolation Durability (ACID)
Weak consistency: Eventual consistency (inconsistency window)

NoSQL solutions DO support Transactions
Standard database replication (or caching) IS NOT strongly consistent; any solution that relies on either is by definition Eventually Consistent at best
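
The inconsistency window can be shown with a toy primary that acknowledges writes before its replica has applied them. The classes below are hypothetical and single-threaded; in real systems the delay comes from the network and replication lag.

```python
class Replica:
    def __init__(self):
        self.data = {}


class Primary(Replica):
    """Primary that acknowledges writes before replicas have applied them."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = []  # updates not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # replication happens later

    def replicate(self):
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replica = Replica()
primary = Primary([replica])
primary.write("balance", 100)
stale = replica.data.get("balance")  # read inside the inconsistency window
primary.replicate()
fresh = replica.data.get("balance")  # the replica has converged
print(stale, fresh)
```

Between `write` and `replicate`, a reader hitting the replica sees old data; that gap is the window the slide refers to.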
Partition Tolerance / Availability
› “The network will be allowed to lose arbitrarily many messages sent from one node to another” [..]
› “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response”
Gilbert and Lynch, SIGACT 2002

CP: Requests will complete at nodes that have quorum
AP: Requests will complete at any node possibly violating
consistency
High latency ~= Partition


Contenu connexe

Tendances

introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL DatabasesRajith Pemabandu
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Managementsameerfaizan
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL DatabasesBADR
 
SQL vs NoSQL: Big Data Adoption & Success in the Enterprise
SQL vs NoSQL: Big Data Adoption & Success in the EnterpriseSQL vs NoSQL: Big Data Adoption & Success in the Enterprise
SQL vs NoSQL: Big Data Adoption & Success in the EnterpriseAnita Luthra
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture PatternsMaynooth University
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBIntro to NoSQL and MongoDB
Intro to NoSQL and MongoDBDATAVERSITY
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionBrian Enochson
 
SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.Denis Reznik
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql DatabasePrashant Gupta
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introductionEdward Yoon
 

Tendances (20)

NOSQL vs SQL
NOSQL vs SQLNOSQL vs SQL
NOSQL vs SQL
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL Databases
 
NoSql Data Management
NoSql Data ManagementNoSql Data Management
NoSql Data Management
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
NoSQL Introduction
NoSQL IntroductionNoSQL Introduction
NoSQL Introduction
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
SQL vs NoSQL: Big Data Adoption & Success in the Enterprise
SQL vs NoSQL: Big Data Adoption & Success in the EnterpriseSQL vs NoSQL: Big Data Adoption & Success in the Enterprise
SQL vs NoSQL: Big Data Adoption & Success in the Enterprise
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture Patterns
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDBIntro to NoSQL and MongoDB
Intro to NoSQL and MongoDB
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
NoSQL and MongoDB Introdction
NoSQL and MongoDB IntrodctionNoSQL and MongoDB Introdction
NoSQL and MongoDB Introdction
 
SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
Mongodb - NoSql Database
Mongodb - NoSql DatabaseMongodb - NoSql Database
Mongodb - NoSql Database
 
MongoDB introduction
MongoDB introductionMongoDB introduction
MongoDB introduction
 

En vedette

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenLorenzo Alberton
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityLars Marius Garshol
 
Big Data overview
Big Data overviewBig Data overview
Big Data overviewalexisroos
 
NoSql : conception des schémas, requêtage, et optimisation
NoSql : conception des schémas, requêtage, et optimisationNoSql : conception des schémas, requêtage, et optimisation
NoSql : conception des schémas, requêtage, et optimisationMicrosoft Technet France
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
Capillary Networks integrates the machine and IoT devices as integral part of...
Capillary Networks integrates the machine and IoT devices as integral part of...Capillary Networks integrates the machine and IoT devices as integral part of...
Capillary Networks integrates the machine and IoT devices as integral part of...Ericsson Labs
 
Introduction aux bases de données NoSQL
Introduction aux bases de données NoSQLIntroduction aux bases de données NoSQL
Introduction aux bases de données NoSQLAntoine Augusti
 
Props music video pp
Props music video ppProps music video pp
Props music video ppeloisesmith98
 
02 תואר בוגר וגליון ציונים
02 תואר בוגר וגליון ציונים02 תואר בוגר וגליון ציונים
02 תואר בוגר וגליון ציוניםEvyatar Glatzer
 
My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...
My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...
My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...Murray Izenwasser
 

En vedette (19)

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Mongo DB
Mongo DBMongo DB
Mongo DB
 
Banco de Dados - NoSQL
Banco de Dados - NoSQLBanco de Dados - NoSQL
Banco de Dados - NoSQL
 
NoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativityNoSQL and Einstein's theory of relativity
NoSQL and Einstein's theory of relativity
 
noSQL
noSQLnoSQL
noSQL
 
Big Data overview
Big Data overviewBig Data overview
Big Data overview
 
Bancos de dados NoSQL
Bancos de dados NoSQLBancos de dados NoSQL
Bancos de dados NoSQL
 
NoSql : conception des schémas, requêtage, et optimisation
NoSql : conception des schémas, requêtage, et optimisationNoSql : conception des schémas, requêtage, et optimisation
NoSql : conception des schémas, requêtage, et optimisation
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
NoSQL et Big Data
NoSQL et Big DataNoSQL et Big Data
NoSQL et Big Data
 
Capillary Networks integrates the machine and IoT devices as integral part of...
Capillary Networks integrates the machine and IoT devices as integral part of...Capillary Networks integrates the machine and IoT devices as integral part of...
Capillary Networks integrates the machine and IoT devices as integral part of...
 
NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
 
Introduction aux bases de données NoSQL
Introduction aux bases de données NoSQLIntroduction aux bases de données NoSQL
Introduction aux bases de données NoSQL
 
Props music video pp
Props music video ppProps music video pp
Props music video pp
 
Webles10
Webles10Webles10
Webles10
 
02 תואר בוגר וגליון ציונים
02 תואר בוגר וגליון ציונים02 תואר בוגר וגליון ציונים
02 תואר בוגר וגליון ציונים
 
My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...
My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...
My Presentation to SFIMA Summit 2010 - Social Media Strategy, YouTube, and Vi...
 
FINAL.LosOjos
FINAL.LosOjosFINAL.LosOjos
FINAL.LosOjos
 

Similaire à Telco Data Research Day 2013 NoSQL Presentation

EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...
EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...
EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...Splunk
 
Cisco Big Data Use Case
Cisco Big Data Use CaseCisco Big Data Use Case
Cisco Big Data Use CaseErni Susanti
 
cisco_bigdata_case_study_1
cisco_bigdata_case_study_1cisco_bigdata_case_study_1
cisco_bigdata_case_study_1Erni Susanti
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraJeff Smoley
 
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...In-Memory Computing Summit
 
Converged Everything, Converged Infrastructure Delivering Business Value and ...
Converged Everything, Converged Infrastructure Delivering Business Value and ...Converged Everything, Converged Infrastructure Delivering Business Value and ...
Converged Everything, Converged Infrastructure Delivering Business Value and ...NetApp
 
Converged Everything, Converged Infrastructure delivering business value and ...
Converged Everything, Converged Infrastructure delivering business value and ...Converged Everything, Converged Infrastructure delivering business value and ...
Converged Everything, Converged Infrastructure delivering business value and ...NetAppUK
 
Long and winding road - Chile 2014
Long and winding road - Chile 2014Long and winding road - Chile 2014
Long and winding road - Chile 2014Connor McDonald
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...GEO Analytics Canada
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...Alluxio, Inc.
 
APRICOT 2022: Using CCTLD data to study the impact of Local IXPs
APRICOT 2022: Using CCTLD data to study the impact of Local IXPsAPRICOT 2022: Using CCTLD data to study the impact of Local IXPs
APRICOT 2022: Using CCTLD data to study the impact of Local IXPsAPNIC
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIAlluxio, Inc.
 
Data Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraData Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraAlluxio, Inc.
 
MinIO January 2020 Briefing
MinIO January 2020 BriefingMinIO January 2020 Briefing
MinIO January 2020 BriefingJonathan Symonds
 
Emerging Computing Architectures
Emerging Computing ArchitecturesEmerging Computing Architectures
Emerging Computing ArchitecturesDaniel Holmberg
 
Dimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingDimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingRifaHaryadi
 
Networking Challenges for the Next Decade
Networking Challenges for the Next DecadeNetworking Challenges for the Next Decade
Networking Challenges for the Next DecadeOpen Networking Summit
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesDataWorks Summit
 

Similaire à Telco Data Research Day 2013 NoSQL Presentation (20)

EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...
EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...
EMC Sponsored Session- Building Massive + Efficient Indexer Storage Environme...
 
Cisco Big Data Use Case
Cisco Big Data Use CaseCisco Big Data Use Case
Cisco Big Data Use Case
 
cisco_bigdata_case_study_1
cisco_bigdata_case_study_1cisco_bigdata_case_study_1
cisco_bigdata_case_study_1
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
IMC Summit 2016 Breakout - Pandurang Naik - Demystifying In-Memory Data Grid,...
 
Converged Everything, Converged Infrastructure Delivering Business Value and ...
Converged Everything, Converged Infrastructure Delivering Business Value and ...Converged Everything, Converged Infrastructure Delivering Business Value and ...
Converged Everything, Converged Infrastructure Delivering Business Value and ...
 
Converged Everything, Converged Infrastructure delivering business value and ...
Converged Everything, Converged Infrastructure delivering business value and ...Converged Everything, Converged Infrastructure delivering business value and ...
Converged Everything, Converged Infrastructure delivering business value and ...
 
Infrastructure Strategies 2007
Infrastructure Strategies 2007Infrastructure Strategies 2007
Infrastructure Strategies 2007
 
Long and winding road - Chile 2014
Long and winding road - Chile 2014Long and winding road - Chile 2014
Long and winding road - Chile 2014
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
STAC, ZARR, COG, K8S and Data Cubes: The brave new world of satellite EO anal...
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
APRICOT 2022: Using CCTLD data to study the impact of Local IXPs
APRICOT 2022: Using CCTLD data to study the impact of Local IXPsAPRICOT 2022: Using CCTLD data to study the impact of Local IXPs
APRICOT 2022: Using CCTLD data to study the impact of Local IXPs
 
Unified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AIUnified Data API for Distributed Cloud Analytics and AI
Unified Data API for Distributed Cloud Analytics and AI
 
Data Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud EraData Orchestration for the Hybrid Cloud Era
Data Orchestration for the Hybrid Cloud Era
 
MinIO January 2020 Briefing
MinIO January 2020 BriefingMinIO January 2020 Briefing
MinIO January 2020 Briefing
 
Emerging Computing Architectures
Emerging Computing ArchitecturesEmerging Computing Architectures
Emerging Computing Architectures
 
Dimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution OfferingDimension Data Cloud Business Unit - Solution Offering
Dimension Data Cloud Business Unit - Solution Offering
 
Networking Challenges for the Next Decade
Networking Challenges for the Next DecadeNetworking Challenges for the Next Decade
Networking Challenges for the Next Decade
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
 

Plus de Ericsson Labs

Ericsson 5 g at mobile world congress 2014
Ericsson 5 g at mobile world congress 2014 Ericsson 5 g at mobile world congress 2014
Ericsson 5 g at mobile world congress 2014 Ericsson Labs
 
Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research
Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research
Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research Ericsson Labs
 
Ericsson Application Awards 2014
Ericsson Application Awards 2014Ericsson Application Awards 2014
Ericsson Application Awards 2014Ericsson Labs
 
5G for the Networked Society beyond 2020
5G for the Networked Society beyond 20205G for the Networked Society beyond 2020
5G for the Networked Society beyond 2020Ericsson Labs
 
3D visual communication
3D visual communication3D visual communication
3D visual communicationEricsson Labs
 
Openflow Stanford University - Ericsson Collaboration
Openflow Stanford University - Ericsson CollaborationOpenflow Stanford University - Ericsson Collaboration
Openflow Stanford University - Ericsson CollaborationEricsson Labs
 
Federated Networked Cloud
Federated Networked CloudFederated Networked Cloud
Federated Networked CloudEricsson Labs
 
Technology Challenges in the Networked Society
Technology Challenges in the Networked SocietyTechnology Challenges in the Networked Society
Technology Challenges in the Networked SocietyEricsson Labs
 
The Connected Megacity
The Connected MegacityThe Connected Megacity
The Connected MegacityEricsson Labs
 
The Networked Society
The Networked SocietyThe Networked Society
The Networked SocietyEricsson Labs
 
Towards Timely Efficient Semantic Reasoning for the Networked Society
Towards Timely Efficient Semantic Reasoning for the Networked SocietyTowards Timely Efficient Semantic Reasoning for the Networked Society
Towards Timely Efficient Semantic Reasoning for the Networked SocietyEricsson Labs
 
Web Connectivity on Ericsson Labs
Web Connectivity on Ericsson LabsWeb Connectivity on Ericsson Labs
Web Connectivity on Ericsson LabsEricsson Labs
 
Stream analytics for churn prediction from Ericsson Research
Stream analytics for churn prediction from Ericsson ResearchStream analytics for churn prediction from Ericsson Research
Stream analytics for churn prediction from Ericsson ResearchEricsson Labs
 
Geo Location Messaging on Ericsson Labs
Geo Location Messaging on Ericsson LabsGeo Location Messaging on Ericsson Labs
Geo Location Messaging on Ericsson LabsEricsson Labs
 
An Overview of All Ericsson Labs APIs
An Overview of All Ericsson Labs APIsAn Overview of All Ericsson Labs APIs
An Overview of All Ericsson Labs APIsEricsson Labs
 
Over the Air 2011 Security Workshop
Over the Air 2011 Security Workshop Over the Air 2011 Security Workshop
Over the Air 2011 Security Workshop Ericsson Labs
 
Mobile Monday Athens 111003
Mobile Monday Athens 111003Mobile Monday Athens 111003
Mobile Monday Athens 111003Ericsson Labs
 
Mobile Monday London M2M Event 110516
Mobile Monday London M2M Event 110516Mobile Monday London M2M Event 110516
Mobile Monday London M2M Event 110516Ericsson Labs
 
Distributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson LabsDistributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson LabsEricsson Labs
 

Plus de Ericsson Labs (20)

Ericsson 5 g at mobile world congress 2014
Ericsson 5 g at mobile world congress 2014 Ericsson 5 g at mobile world congress 2014
Ericsson 5 g at mobile world congress 2014
 
Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research
Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research
Evolved Cloud Collaboration Presentation at MWC14 by Ericsson Research
 
Ericsson Application Awards 2014
Ericsson Application Awards 2014Ericsson Application Awards 2014
Ericsson Application Awards 2014
 
5G for the Networked Society beyond 2020
5G for the Networked Society beyond 20205G for the Networked Society beyond 2020
5G for the Networked Society beyond 2020
 
3D visual communication
3D visual communication3D visual communication
3D visual communication
 
Openflow Stanford University - Ericsson Collaboration
Openflow Stanford University - Ericsson CollaborationOpenflow Stanford University - Ericsson Collaboration
Openflow Stanford University - Ericsson Collaboration
 
Federated Networked Cloud
Federated Networked CloudFederated Networked Cloud
Federated Networked Cloud
 
Exploring Big Data
Exploring Big DataExploring Big Data
Exploring Big Data
 
Technology Challenges in the Networked Society
Technology Challenges in the Networked SocietyTechnology Challenges in the Networked Society
Technology Challenges in the Networked Society
 
The Connected Megacity
The Connected MegacityThe Connected Megacity
The Connected Megacity
 
The Networked Society
The Networked SocietyThe Networked Society
The Networked Society
 
Towards Timely Efficient Semantic Reasoning for the Networked Society
Towards Timely Efficient Semantic Reasoning for the Networked SocietyTowards Timely Efficient Semantic Reasoning for the Networked Society
Towards Timely Efficient Semantic Reasoning for the Networked Society
 
Web Connectivity on Ericsson Labs
Web Connectivity on Ericsson LabsWeb Connectivity on Ericsson Labs
Web Connectivity on Ericsson Labs
 
Stream analytics for churn prediction from Ericsson Research
Stream analytics for churn prediction from Ericsson ResearchStream analytics for churn prediction from Ericsson Research
Stream analytics for churn prediction from Ericsson Research
 
Geo Location Messaging on Ericsson Labs
Geo Location Messaging on Ericsson LabsGeo Location Messaging on Ericsson Labs
Geo Location Messaging on Ericsson Labs
 
An Overview of All Ericsson Labs APIs
An Overview of All Ericsson Labs APIsAn Overview of All Ericsson Labs APIs
An Overview of All Ericsson Labs APIs
 
Over the Air 2011 Security Workshop
Over the Air 2011 Security Workshop Over the Air 2011 Security Workshop
Over the Air 2011 Security Workshop
 
Mobile Monday Athens 111003
Mobile Monday Athens 111003Mobile Monday Athens 111003
Mobile Monday Athens 111003
 
Mobile Monday London M2M Event 110516
Mobile Monday London M2M Event 110516Mobile Monday London M2M Event 110516
Mobile Monday London M2M Event 110516
 
Distributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson LabsDistributed Shared Memory on Ericsson Labs
Distributed Shared Memory on Ericsson Labs
 



Telco Data Research Day 2013 NoSQL Presentation

  • 1. for Telco Data Research Day 2013 Prepared by Nicolas Seyvet Help from N. Hari Kumar P. Matray
  • 2. Who AM I? › Software Developer 10+ years at Ericsson › HLR, PGM, IMS-M, MMS, MTV, BCS › Joined Research late 2012 –BMUM -> BUSS (5+ years) –DUCI (<6 months) › Active member in various /// groups –Linux (ELX, UMWP, etc.), Agile, SWAN, EQNA › Open source contributor Ericsson Internal | 2013-06-03 | Page 2
  • 3. The Plan › Why NoSQL? › CAP › Research activities › Market trends
  • 5. NoSQL: Why? Trends – Usual Suspects [Figure labels: Gossip; SDN; Internet Hypertext, RSS, wikis, blogs, tagging, user generated content, RDF, ontologies. Source: Gartner Data Center TCO Report, June 2012.]
  • 6. NoSQL: Why? Trends: Architecture › Multicore › Parallelization/Distributed › Cloud › Schemaless [Figure: 1980s: Mainframe applications; 1990s: Database as integration hub; 2000s: Decoupled services]
  • 7. Two Ways to Scale Go BIG or many? PARTITION (replication)
  • 9. CAP Theorem Brewer’s Conjecture “Of three properties of shared-data systems – data Consistency, system Availability and tolerance to network Partitions – only two can be achieved at any given moment in time.” › 2000 Prof Eric Brewer, PoDC Conference Keynote › 2002 Seth Gilbert and Nancy Lynch, ACM SIGACT News 33 (2)
  • 10. CAP Theorem The business decision: under a Partition, CONSISTENT OR Available
  • 11. CAP Summary [Triangle: Consistent, Available, Partition Tolerance] CA: Traditional relational: MySQL, PostgreSQL, etc. AP: Voldemort, Riak, Cassandra, CouchDB, Dynamo like systems. AP: Requests will complete at any node possibly violating consistency. CP: HBase, MongoDB, Redis, BigTable like systems. CP: Requests will complete at nodes that have quorum.
  • 12. Why NoSQL now? › Trends: “Internet size”, Cluster friendly; Rapid development / Solution oriented; Polyglot Persistence; Schemaless
  • 14. HBase BigTable/Columnar › ZooKeeper (cluster): Coordination, Master selection, Root region lookup, Node registration, … › Hadoop (cluster): Data files, Write-Ahead Log (WAL), Rack aware, Default data replication x3 › HBase: 1 elected master / many region servers – Master: Region allocation, Failover, Log splitting, Load balancing; One active (elected), many standby – Region servers: Hold regions, Handle I/O requests, In-Memory data (MemStore), Split regions, Compact regions
  • 15. Telco Applicability Study HBase for HLR data? › Comprehensive report › Using HBase is DOABLE! OK!
  • 16. HBase Bulk Processing Event Processing & Aggregation › 100 Million rows › Queries evaluated: SELECT col1 FROM table; SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3 › Map/Reduce › Scan › Co-processor [Figure labels: CPU, RAM, Network, Schema]
  • 17. Bulk Processing Scaling out/Horizontally › 100 Million rows › Linear scaling! SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3
  • 18. READ/WRITE 100000 iterations Periodic degradation › 150,000,000 rows › row = key + 1 column (1K) Entire cluster up and running 8 nodes (1 master / 7 slaves)
  • 20. How much Data can it Fit? ITK / Constellation / CEA › Network produces events – RNC, SGSN, S-&R-KPI – Traffic DPI – GTP-C › CEA (Perfmon) – Correlated events [Figure labels: Event Feeder; 1000+ K events/s; 10+ K events/s; Map/Reduce; Put; 10,000,000 subscribers; Staging data on HDFS; HBase BulkLoader; HBase PutLoader; Lookup data]
  • 21. The Upcoming Fight Storkluster: 18 machines; Bigdata: 2 machines
  • 22. What about HDFS? › It scales! › TestDFSIO benchmark > 3000 GB/s - Read > 2000 GB/s - Writes › But …. it is not that simple… [Figure labels: Small files (250 B); Larger files (1 KB); CPU; CPU and I/O; Network]
  • 23. What about End to End? Writing to HBase included › It scales! › And it gets… more complicated [Figure labels: 100 K events/s; 200 K events/s]
  • 24. But…. › Within ~2 hours – Rows/s – CPU – IO [Figure annotations: 7K/s; x2; 100%]
  • 25. HDFS Curse: Compaction Storm › Remember what we were doing? – Hint: Creating lots of small files to add to HBase?.. › Major compaction storm! – Manage compaction and region splitting [Figure labels: M/R; HBase BulkLoader]
  • 26. Conclusion › Scalability … Scalability… Scalability › It works but it is not so easy… › Recommendation: – Polyglot data storage
  • 29. NoSQL: The name › It is not about saying SQL is bad or should not be used › “An accidental neologism” – Martin Fowler › A Twitter hashtag › No prescriptive definition, just observations of common characteristics – “Any database that is not a Relational Database” – Running well on clusters (scalable) – Schemaless › Polyglot persistence – Using different stores in different circumstances. The term was coined at a meetup with the creators behind some prominent emerging databases ... then there was a conference ... and a mailing list ... the name caught on ... then there were more conferences ... and here we are!
  • 30. NoSQL: Why? Trend No 2/4: Connectedness [Figure labels: Internet Hypertext, RSS, wikis, blogs, tagging, user generated content, RDF, ontologies; M2M; Application]
  • 31. NoSQL: Why? Trend No 3/4: Content Individualization Schemaless • Extend at runtime • De-normalize • Domain design (not schema migration) › Individualization of content › Decentralization
  • 32. NoSQL Landscape › 4 emerging categories: Key-Value, Graph, BigTable, Document [Other figure labels: (NewSQL), DBN, (Object)]
  • 33. Consistency “A system is consistent if an update is applied to all relevant nodes at the same logical time.” Strong consistency: Atomicity Consistency Isolation Durability (ACID). Weak consistency: Eventual consistency (inconsistency window). NoSQL solutions DO support Transactions. Standard database replication (or caching) IS NOT strongly consistent; any solution making use of either is by definition Eventually Consistent at best.
  • 34. Partition Tolerance / Availability › “The network will be allowed to lose arbitrarily many messages sent from one node to another” [..] › “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response.” Gilbert and Lynch, SIGACT 2002. CP: Requests will complete at nodes that have quorum. AP: Requests will complete at any node possibly violating consistency. High latency ~= Partition
  • 35. HBase Bulk Processing Event Processing & Aggregation › 100 Million rows › Queries evaluated: SELECT col1 FROM table; SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3
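
The aggregation query evaluated on the bulk-processing slides (SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3) can be read as a map step followed by a reduce step. The following is only a minimal in-memory sketch of that idea in plain Python, not the HBase Map/Reduce job used in the study: the sample rows, the function names and the use of dictionaries as rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; col1/col2/col3 mirror the placeholder names on the slide.
ROWS = [
    {"col1": 10, "col2": "a", "col3": "x"},
    {"col1": 5, "col2": "a", "col3": "y"},
    {"col1": 7, "col2": "b", "col3": "x"},
    {"col1": 3, "col2": "a", "col3": "x"},
]


def map_phase(rows, val2):
    """Emit (col3, col1) pairs for rows matching WHERE col2 = val2."""
    for row in rows:
        if row["col2"] == val2:
            yield row["col3"], row["col1"]


def reduce_phase(pairs):
    """Sum the emitted values per key: SUM(col1) ... GROUP BY col3."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def sum_group_by(rows, val2):
    """Run the map step, then the reduce step, over an in-memory table."""
    return reduce_phase(map_phase(rows, val2))


if __name__ == "__main__":
    print(sum_group_by(ROWS, "a"))
```

In a real cluster the map step would run next to the data on each region and only the per-key partial sums would travel over the network, which is one plausible reading of why the slides report linear scaling for this query.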

Editor's notes

  1. Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
  2. Database developers all know the ACID acronym. It says that database transactions should be: Atomic: Everything in a transaction succeeds or the entire transaction is rolled back. Consistent: A transaction cannot leave the database in an inconsistent state. Isolated: Transactions cannot interfere with each other. Durable: Completed transactions persist, even when servers restart etc. These qualities seem indispensable, and yet they are incompatible with availability and performance in very large systems. For example, suppose you run an online book store and you proudly display how many of each book you have in your inventory. Every time someone is in the process of buying a book, you lock part of the database until they finish so that all visitors around the world will see accurate inventory numbers. That works well if you run The Shop Around the Corner but not if you run Amazon.com. Amazon might instead use cached data. Users would not see the inventory count at this second, but what it was, say, an hour ago when the last snapshot was taken. Also, Amazon might violate the “I” in ACID by tolerating a small probability that simultaneous transactions could interfere with each other. For example, two customers might both believe that they just purchased the last copy of a certain book. The company might risk having to apologize to one of the two customers (and maybe compensate them with a gift card) rather than slowing down their site and irritating myriad other customers. There is a computer science theorem that quantifies the inevitable trade-offs. Eric Brewer’s CAP theorem says that if you want consistency, availability, and partition tolerance, you have to settle for two out of three. (For a distributed system, partition tolerance means the system will continue to work unless there is a total network failure. A few nodes can fail and the system keeps going.)
An alternative to ACID is BASE: Basic Availability, Soft state, Eventual consistency. Rather than requiring consistency after every transaction, it is enough for the database to eventually be in a consistent state. (Accounting systems do this all the time. It’s called “closing out the books.”) It’s OK to use stale data, and it’s OK to give approximate answers. It’s harder to develop software in the fault-tolerant BASE world compared to the fastidious ACID world, but Brewer’s CAP theorem says you have no choice if you want to scale up. However, as Brewer points out in this presentation, there is a continuum between ACID and BASE. You can decide how close you want to be to one end of the continuum or the other according to your priorities.
  3. The Sex Pistols had shown that barely-constrained fury was more important to their contemporaries than art-school structuralism, giving anyone with three chords and something to say permission to start a band. Eric Brewer, in what became known as Brewer's Conjecture, said that as applications become more web-based we should stop worrying about data consistency, because if we want high availability in these new distributed applications, then guaranteed consistency of data is something we cannot have, thus giving anyone with three servers and a keen eye for customer experience permission to start an internet scale business. Disciples of Brewer (present that day or later converts) include the likes of Amazon, eBay, and Twitter. What he said was there are three core systemic requirements that exist in a special relationship when it comes to designing and deploying applications in a distributed environment (he was talking specifically about the web but so many corporate businesses are multi-site/multi-country these days that the effects could equally apply to your data-centre/LAN/WAN arrangement).
  4. Consistent vs available is a business decision, not an engineering one.
  5. NoSQL systems tend to sacrifice full C and A for P at any given time -> all in all, eventually A or C. Databases are great at this because they focus on ACID properties and give us Consistency by also giving us Isolation, so that when Customer One is reducing books-in-stock by one, and simultaneously increasing books-in-basket by one, any intermediate states are isolated from Customer Two, who has to wait a few milliseconds while the data store is made consistent. Once you start to spread data and logic around different nodes then there's a risk of partitions forming. A partition happens when, say, a network cable gets chopped, and Node A can no longer communicate with Node B. With the kind of distribution capabilities the web provides, temporary partitions are a relatively common occurrence and, as I said earlier, they're also not that rare inside global corporations with multiple data centres.
  6. Easier scalability is the first aspect highlighted by Wiederhold. NoSQL databases like Couchbase and 10Gen's MongoDB, he said, can be scaled up to handle much bigger data volumes with relative ease. If your company suddenly finds itself deluged by overnight success, for example, with customers coming to your Web site in droves, a relational database would have to be painstakingly replicated and re-partitioned in order to scale up to meet the new demand. Wiederhold cited social and mobile gaming vendors as the big example of this kind of situation. An endorsement or a few well-timed tweets could spin up semi-dormant gaming servers and get them to capacity in mere hours. Because of the distributed nature of non-relational databases, to scale NoSQL all you need to do is add machines to the cluster to meet demand.
  7. Could we store these, if they came in a stream?
  8. As data increases, there may be many StoreFiles on HDFS, which is not good for its performance. Thus, HBase will automatically pick up a couple of the smaller StoreFiles and rewrite them into a bigger one. This process is called minor compaction. For certain situations, or when triggered by a configured interval (once a day by default), major compaction runs automatically. Major compaction will drop the deleted or expired cells and rewrite all the StoreFiles in the Store into a single StoreFile; this usually improves the performance. However, as major compaction rewrites all of the Stores' data, lots of disk I/O and network traffic might occur during the process. This is not acceptable on a heavily loaded system. You might want to run it at a lower load time of your system. Rather than let HBase auto-split your Regions, manage the splitting manually [11]. With growing amounts of data, splits will continually be needed. Since you always know exactly what regions you have, long-term debugging and profiling is much easier with manual splits. It is hard to trace the logs to understand region level problems if it keeps splitting and getting renamed. Data offlining bugs + unknown number of split regions == oh crap! If an HLog or StoreFile was mistakenly unprocessed by HBase due to a weird bug and you notice it a day or so later, you can be assured that the regions specified in these files are the same as the current regions and you have fewer headaches trying to restore/replay your data. You can finely tune your compaction algorithm. With roughly uniform data growth, it's easy to cause split / compaction storms as the regions all roughly hit the same data size at the same time. With manual splits, you can let staggered, time-based major compactions spread out your network IO load. How do I turn off automatic splitting? Automatic splitting is determined by the configuration value hbase.hregion.max.filesize.
It is not recommended that you set this to Long.MAX_VALUE in case you forget about manual splits. A suggested setting is 100GB, which would result in > 1hr major compactions if reached. What's the optimal number of pre-split regions to create? Mileage will vary depending upon your application. You could start low with 10 pre-split regions / server and watch as data grows over time. It's better to err on the side of too few regions and rolling split later. A more complicated answer is that this depends upon the largest storefile in your region. With a growing data size, this will get larger over time. You want the largest region to be just big enough that the Store compact selection algorithm only compacts it due to a timed major. If you don't, your cluster can be prone to compaction storms as the algorithm decides to run major compactions on a large series of regions all at once. Note that compaction storms are due to the uniform data growth, not the manual split decision. If you pre-split your regions too thin, you can increase the major compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size grows too large, use the (post-0.90.0 HBase) org.apache.hadoop.hbase.util.RegionSplitter script to perform a network IO safe rolling split of all regions.
  9. Either way this is not good for business. Amazon claim (http://highscalability.com/latency-everywhere-and-it-costs-you-sales-how-crush-it) that just an extra one tenth of a second on their response times will cost them 1% in sales. Google said (http://glinden.blogspot.com/2006/11/marissa-mayer-at-web-20.html) they noticed that just a half a second increase in latency caused traffic to drop by a fifth. Where both sides agree though is that the answer to scale is distributed parallelisation not, as was once thought, supercomputer grunt. If they're not working in parallel you have no chance to get the problem done in a reasonable amount of time. This is a lot like anything else. If you have a really big job to do you get lots of people to do it. So if you are building a bridge you have lots of construction workers. That's parallel processing also. So a lot of this will end up being "how do we mix parallel processing and the internet?" (Inktomi and the Internet Bubble)
  10. In our account of the history of NoSQL development, we’ve concentrated on big data running on clusters. While we think this is the key thing that drove the opening up of the database world, it isn't the only reason we see project teams considering NoSQL databases. An equally important reason is the old frustration with the impedance mismatch problem. The big data concerns have created an opportunity for people to think freshly about their data storage needs, and some development teams see that using a NoSQL database can help their productivity by simplifying their database access even if they have no need to scale beyond a single machine.
  11. Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
  12. Individualization of content • In the salary lists of the 1970s, all elements had exactly one job • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15? The recent trend in discussing NoSQL databases is to highlight their schemaless nature: it is a popular feature that allows developers to concentrate on the domain design without worrying about schema changes. It’s especially true with the rise of agile methods [Agile Methods] where responding to changing requirements is important. Discussions, iterations, and feedback loops involving domain experts and product owners are important to derive the right understanding of the data; these discussions must not be hampered by a database's schema complexity. With NoSQL data stores, changes to the schema can be made with the least amount of friction, improving developer productivity (“The Emergence of NoSQL,” p. 9). We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration.
  13. A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of so-called NoSQL databases and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself. In contrast to well-known relational databases and their notions of "Relations" (or "Tables"), these systems are designed around an abstract notion of a "Document". By storing and managing data based on columns rather than rows, column-oriented architecture overcomes query limitations that exist in traditional row-based RDBMS. Only the necessary columns in a query are accessed, reducing I/O activities by circumventing unneeded rows. This enables it to handle very large data volumes (Terabytes to Petabytes), commonly referred to as Big Data. Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller set
  14. http://codahale.com/you-cant-sacrifice-partition-tolerance/ Some systems cannot be partitioned. Single-node systems (e.g., a monolithic Oracle server with no replication) are incapable of experiencing a network partition. But practically speaking these are rare; add remote clients to the monolithic Oracle server and you get a distributed system which can experience a network partition (e.g., the Oracle server becomes unavailable). Network partitions aren’t limited to dropped packets: a crashed server can be thought of as a network partition. The failed node is effectively the only member of its partition component, and thus all messages to it are “lost” (i.e., they are not processed by the node due to its failure). Handling a crashed machine counts as partition-tolerance.
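
The CP and AP behaviours described on the slides ("requests will complete at nodes that have quorum" versus "requests will complete at any node possibly violating consistency") can be sketched as a small decision function. This is a toy model under stated assumptions, not how any particular database implements it: the function names, the "CP"/"AP" mode strings, the returned labels and the strict-majority quorum rule are all chosen here for illustration. A crashed replica is simply counted as unreachable, in line with the note that handling a crashed machine counts as partition tolerance.

```python
def has_quorum(reachable, total):
    """True when a strict majority of the replicas can be reached.

    `reachable` counts the node handling the request plus every replica
    it can still talk to; crashed or partitioned replicas are excluded.
    """
    return reachable > total // 2


def handle_request(mode, reachable, total):
    """Decide how a node answers a request during a possible partition.

    mode is "CP" or "AP"; both labels and the return strings are illustrative.
    """
    if mode == "CP":
        # Consistency first: refuse to answer without a majority.
        return "complete" if has_quorum(reachable, total) else "reject"
    if mode == "AP":
        # Availability first: any non-failing node answers.
        return "complete, possibly inconsistent"
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    # Three replicas, one of them cut off from the other two.
    for mode in ("CP", "AP"):
        print(mode, "majority side:", handle_request(mode, 2, 3))
        print(mode, "minority side:", handle_request(mode, 1, 3))
```

With three replicas split two against one, the CP variant keeps serving on the majority side and rejects on the minority side, while the AP variant answers on both sides and leaves reconciliation for later.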