Session Objectives
Introduction
Hadoop Distributed File System (HDFS)
Scheduling in Hadoop (using YARN).
Hadoop Ecosystem: Databases and Querying
DFS and HDFS
Summary
DFS and HDFS
DFS and HDFS
HDFS Read Operation
HDFS Write Operation
File block and replication
File block and replication
9
Why Scheduling?
 Multiple “tasks” to schedule
 The processes on a single-core OS
 The tasks of a Hadoop job
 The tasks of multiple Hadoop jobs
 Limited resources that these tasks require
 Processor(s)
 Memory
 (Less contentious) disk, network
 Scheduling goals
1. Good throughput or response time for tasks (or jobs)
2. High utilization of resources
10
Single Processor Scheduling
Task 1
10
Task 2
5
Task 3
3
Arrival Times  0 6 8
Processor
Task Length Arrival
1 10 0
2 5 6
3 3 8
Which tasks run when?
10
11
FIFO Scheduling (First-In First-Out)/FCFS
Task 1 Task 2 Task 3
Time  0 6 8 10 15 18
Processor
Task Length Arrival
1 10 0
2 5 6
3 3 8
• Maintain tasks in a queue in order of arrival
• When processor free, dequeue head and schedule it
11
12
FIFO/FCFS Performance
 Average completion time may be high
 For our example on previous slides,
 Average completion time of FIFO/FCFS =
(Task 1 + Task 2 + Task 3)/3
= (10+15+18)/3
= 43/3
= 14.33
12
13
STF Scheduling (Shortest Task First)
Task 3  Task 2  Task 1 (shortest first)
Time  0 3 8 18
Processor
Task Length Arrival
1 10 0
2 5 0
3 3 0
• Maintain all tasks in a queue, in increasing order of running time
• When processor free, dequeue head and schedule
13
14
STF Is Optimal!
 Average completion of STF is the shortest among all
scheduling approaches!
 For our example on previous slides,
 Average completion time of STF =
(Task 1 + Task 2 + Task 3)/3
= (18+8+3)/3
= 29/3
= 9.66
(versus 14.33 for FIFO/FCFS)
 In general, STF is a special case of priority scheduling
 Instead of using time as priority, scheduler could use user-provided
priority
14
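The arithmetic on the last two slides can be checked with a tiny simulation. The sketch below is not from the original slides; it assumes all three tasks are available at time 0 and compares FIFO order (Task 1, 2, 3) with shortest-task-first order (Task 3, 2, 1).

public class SingleProcessorScheduling {

    // Runs tasks back-to-back in the given order (all arrive at time 0)
    // and returns the average completion time.
    static double averageCompletionTime(int[] lengthsInOrder) {
        int clock = 0;
        int totalCompletion = 0;
        for (int len : lengthsInOrder) {
            clock += len;              // this task finishes when its slice of the timeline ends
            totalCompletion += clock;  // completion time of this task
        }
        return (double) totalCompletion / lengthsInOrder.length;
    }

    public static void main(String[] args) {
        int[] fifoOrder = {10, 5, 3};  // Task 1, Task 2, Task 3 in arrival order
        int[] stfOrder  = {3, 5, 10};  // shortest task first

        System.out.println("FIFO avg completion = " + averageCompletionTime(fifoOrder)); // 14.33
        System.out.println("STF  avg completion = " + averageCompletionTime(stfOrder));  // 9.67
    }
}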
15
Round-Robin Scheduling
Time  0 6 8
Processor
Task Length Arrival
1 10 0
2 5 6
3 3 8
• Use a quantum (say 1 time unit) to run portion of task at queue head
• Pre-empts processes by saving their state, and resuming later
• After pre-empting, add to end of queue
(Timeline figure: the tasks interleave in 1-unit quanta, starting with Task 1; Task 3 is done at time 15, …)
15
16
Round-Robin vs. STF/FIFO
 Round-Robin preferable for
 Interactive applications
 User needs quick responses from system
 FIFO/STF preferable for Batch applications
 User submits jobs, goes away, comes back to get result
16
17
Summary
 Single processor scheduling algorithms
 FIFO/FCFS
 Shortest task first (optimal!)
 Priority
 Round-robin
 Many other scheduling algorithms out there!
 What about cloud scheduling?
 Next!
17
18
Hadoop Scheduling
 A Hadoop job consists of Map tasks and Reduce tasks
 Only one job in entire cluster => it occupies cluster
 Multiple customers with multiple jobs
 Users/jobs = “tenants”
 Multi-tenant system
 => Need a way to schedule all these jobs (and their
constituent tasks)
 => Need to be fair across the different tenants
 Hadoop YARN has two popular schedulers
 Hadoop Capacity Scheduler
 Hadoop Fair Scheduler
18
19
Hadoop Capacity Scheduler
 Contains multiple queues
 Each queue contains multiple jobs
 Each queue guaranteed some portion of the cluster capacity
E.g.,
 Queue 1 is given 80% of cluster
 Queue 2 is given 20% of cluster
 Higher-priority jobs go to Queue 1
 For jobs within same queue, FIFO typically used
 Administrators can configure queues
Source: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
19
20
Elasticity in HCS
 Administrators can configure each queue with limits
 Soft limit: how much % of cluster is the queue guaranteed to occupy
 (Optional) Hard limit: max % of cluster given to the queue
 Elasticity
 A queue allowed to occupy more of cluster if resources free
 But if other queues that were below their guaranteed capacity later need those resources, they must be given
back to those queues
 Pre-emption not allowed!
 Cannot stop a task part-way through
 When reducing % cluster to a queue, wait until some tasks of that queue
have finished
20
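To make this concrete, a rough capacity-scheduler.xml sketch for the 80%/20% example might look like the following (queue names are invented; property names follow the YARN Capacity Scheduler, but verify them against your Hadoop version):

<configuration>
  <!-- Two top-level queues under root -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>queue1,queue2</value>
  </property>

  <!-- Soft limits: guaranteed share of the cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.queue1.capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.queue2.capacity</name>
    <value>20</value>
  </property>

  <!-- Optional hard limit: cap queue2's elasticity at 40% of the cluster -->
  <property>
    <name>yarn.scheduler.capacity.root.queue2.maximum-capacity</name>
    <value>40</value>
  </property>
</configuration>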
21
Other HCS Features
 Queues can be hierarchical
 May contain child sub-queues, which may contain child sub-queues, and so
on
 Child sub-queues can share resources equally
 Scheduling can take memory requirements into account
(memory specified by user)
21
22
Hadoop Fair Scheduler
 Goal: all jobs get equal share of resources
 When only one job present, occupies entire cluster
 As other jobs arrive, each job given equal % of cluster
 E.g., Each job might be given equal number of cluster-wide YARN
containers
 Each container == 1 task of job
Source: http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html 22
23
Hadoop Fair Scheduler (2)
 Divides cluster into pools
 Typically one pool per user
 Resources divided equally among pools
 Gives each user fair share of cluster
 Within each pool, can use either
 Fair share scheduling, or
 FIFO/FCFS
 (Configurable)
23
24
Pre-emption in HFS
 Some pools may have minimum shares
 Minimum % of cluster that pool is guaranteed
 When a pool's minimum share is not met for some period of time
 Take resources away from other pools
 By pre-empting jobs in those other pools
 By killing the currently-running tasks of those jobs
 Tasks can be re-started later
 Ok since tasks are idempotent!
 To kill, scheduler picks most-recently-started tasks
 Minimizes wasted work
24
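As an illustration, a minimal Fair Scheduler allocation file with two per-user pools, minimum shares, and a pre-emption timeout might look roughly like this (pool names and numbers are invented; element names follow the YARN Fair Scheduler allocation-file format, so check your version's documentation):

<?xml version="1.0"?>
<allocations>
  <!-- One pool (queue) per user -->
  <queue name="alice">
    <!-- Minimum share this pool is guaranteed -->
    <minResources>4096 mb,4 vcores</minResources>
    <weight>1.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <!-- If the min share is not met for 60 s, pre-empt tasks from other pools -->
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </queue>
  <queue name="bob">
    <minResources>2048 mb,2 vcores</minResources>
    <weight>1.0</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
  </queue>
</allocations>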
25
HFS Features
 Can also set limits on
 Number of concurrent jobs per user
 Number of concurrent jobs per pool
 Number of concurrent tasks per pool
 Prevents cluster from being hogged by one user/job
25
26
Estimating Task Lengths
 HCS/HFS use FIFO
 May not be optimal (as we know!)
 Why not use shortest-task-first instead? It's optimal (as we know!)
 Challenge: hard to know the expected running time of a task (before it has
completed)
 Solution: Estimate length of task
 Some approaches
 Within a job: Calculate running time of task as proportional to size of its input
 Across tasks: Calculate running time of task in a given job as average of other
tasks in that given job (weighted by input size)
 Lots of recent research results in this area!
26
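A loose sketch of the "across tasks" idea (class and method names are invented, not part of Hadoop): estimate a task's running time from the input-size-weighted rate observed on tasks of the same job that have already finished.

public class TaskLengthEstimator {
    private double totalSeconds = 0;
    private double totalInputBytes = 0;

    // Called whenever a task of this job finishes.
    void recordCompletedTask(double runningSeconds, double inputBytes) {
        totalSeconds += runningSeconds;
        totalInputBytes += inputBytes;
    }

    // Input-size-weighted average rate, applied to the new task's input size.
    double estimateSeconds(double inputBytes) {
        if (totalInputBytes == 0) return Double.NaN; // nothing observed yet
        return (totalSeconds / totalInputBytes) * inputBytes;
    }

    public static void main(String[] args) {
        TaskLengthEstimator est = new TaskLengthEstimator();
        est.recordCompletedTask(120, 256e6);  // 120 s for a 256 MB split
        est.recordCompletedTask(250, 512e6);  // 250 s for a 512 MB split
        System.out.println(est.estimateSeconds(128e6)); // rough estimate for a 128 MB split
    }
}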
27
Summary
 Hadoop Scheduling in YARN
 Hadoop Capacity Scheduler
 Hadoop Fair Scheduler
 Yet, so far we've talked of only one kind of resource at a time
 Either processor or memory
 How about multi-resource requirements?
 Next!
27
28
Challenge
 What about scheduling VMs in a cloud (cluster)?
 Jobs may have multi-resource requirements
 Job 1‟s tasks: 2 CPUs, 8 GB
 Job 2‟s tasks: 6 CPUs, 2 GB
 How do you schedule these jobs in a “fair” manner?
 That is, how many tasks of each job do you allow the
system to run concurrently?
 What does fairness even mean?
28
29
Dominant Resource Fairness (DRF)
 Proposed by researchers from U. California Berkeley
 Proposes notion of fairness across jobs with multi-
resource requirements
 They showed that DRF is
 Fair for multi-tenant systems
 Strategy-proof: tenant can‟t benefit by lying
 Envy-free: tenant can‟t envy another tenant‟s allocations
29
30
Where is DRF Useful?
 DRF is
 Usable in scheduling VMs in a cluster
 Usable in scheduling Hadoop in a cluster
 DRF used in Mesos, an OS intended for cloud environments
 DRF-like strategies are also used in some cloud computing
companies' distributed OSes
30
31
How DRF Works
 Our example
 Job 1‟s tasks: 2 CPUs, 8 GB
=> Job 1‟s resource vector = <2 CPUs, 8 GB>
 Job 2‟s tasks: 6 CPUs, 2 GB
=> Job 2‟s resource vector = <6 CPUs, 2 GB>
 Consider a cloud with <18 CPUs, 36 GB RAM>
31
32
How DRF Works (2)
 Our example
 Job 1‟s tasks: 2 CPUs, 8 GB
=> Job 1‟s resource vector = <2 CPUs, 8 GB>
 Job 2‟s tasks: 6 CPUs, 2 GB
=> Job 2‟s resource vector = <6 CPUs, 2 GB>
 Consider a cloud with <18 CPUs, 36 GB RAM>
 Each Job 1‟s task consumes % of total CPUs = 2/18 = 1/9
 Each Job 1‟s task consumes % of total RAM = 8/36 = 2/9
 1/9 < 2/9
 => Job 1’s dominant resource is RAM, i.e., Job 1 is more memory-
intensive than it is CPU-intensive
32
33
How DRF Works (3)
 Our example
 Job 1‟s tasks: 2 CPUs, 8 GB
=> Job 1‟s resource vector = <2 CPUs, 8 GB>
 Job 2‟s tasks: 6 CPUs, 2 GB
=> Job 2‟s resource vector = <6 CPUs, 2 GB>
 Consider a cloud with <18 CPUs, 36 GB RAM>
 Each Job 2's task consumes % of total CPUs = 6/18 = 1/3
 Each Job 2's task consumes % of total RAM = 2/36 = 1/18
 1/3 > 1/18
 => Job 2's dominant resource is CPU, i.e., Job 2 is more CPU-
intensive than it is memory-intensive
33
34
DRF Fairness
 Each job gets the same cluster-wide % of its own dominant
resource type
 Job 1's % of RAM = Job 2's % of CPU
 Can be written as linear equations, and solved
34
35
DRF Solution, For our Example
 DRF Ensures
 Job 1‟s % of RAM = Job 2‟s % of CPU
 Solution for our example:
 Job 1 gets 3 tasks each with <2 CPUs, 8 GB>
 Job 2 gets 2 tasks each with <6 CPUs, 2 GB>
• Job 1‟s % of RAM
= Number of tasks * RAM per task / Total cluster RAM
= 3*8/36 = 2/3
• Job 2‟s % of CPU
= Number of tasks * CPU per task / Total cluster CPUs
= 2*6/18 = 2/3
35
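A small sketch (invented for these notes, not the actual YARN or Mesos implementation) of DRF as "progressive filling": repeatedly give one more task to the job whose dominant share is currently smallest, until no further task fits. On the <18 CPUs, 36 GB> example it ends with 3 tasks for Job 1 and 2 tasks for Job 2, i.e., both dominant shares equal 2/3.

public class DrfSketch {
    public static void main(String[] args) {
        double[] capacity = {18, 36};              // <CPUs, GB RAM>
        double[][] demand = {{2, 8}, {6, 2}};      // per-task demand of Job 1 and Job 2
        int[] tasks = {0, 0};                      // tasks allocated so far
        double[] used = {0, 0};                    // cluster resources used so far

        while (true) {
            // Pick the job with the smallest dominant share whose next task still fits.
            int pick = -1;
            double best = Double.MAX_VALUE;
            for (int j = 0; j < demand.length; j++) {
                double dominantShare = Math.max(tasks[j] * demand[j][0] / capacity[0],
                                                tasks[j] * demand[j][1] / capacity[1]);
                boolean fits = used[0] + demand[j][0] <= capacity[0]
                            && used[1] + demand[j][1] <= capacity[1];
                if (fits && dominantShare < best) {
                    best = dominantShare;
                    pick = j;
                }
            }
            if (pick < 0) break;                   // no task of any job fits any more
            tasks[pick]++;
            used[0] += demand[pick][0];
            used[1] += demand[pick][1];
        }
        System.out.println("Job 1 tasks: " + tasks[0] + ", Job 2 tasks: " + tasks[1]); // 3 and 2
    }
}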
36
Other DRF Details
 DRF generalizes to multiple jobs
 DRF also generalizes to more than 2 resource types
 CPU, RAM, Network, Disk, etc.
 DRF ensures that each job gets a fair share of that
type of resource which the job desires the most
 Hence fairness
36
37
Summary: Scheduling
 Scheduling very important problem in cloud
computing
 Limited resources, lots of jobs requiring access to these resources
 Single-processor scheduling
 FIFO/FCFS, STF, Priority, Round-Robin
 Hadoop scheduling
 Capacity scheduler, Fair scheduler
 Dominant-Resource Fairness
37
Session Objectives
Introduction, DBMS, Types
HBASE
Hive
PIG
Big table and Graph Database
Summary
40
History of the World, Part 1
40
 Relational Databases – mainstay of business
 Web-based applications caused spikes
 Especially true for public-facing e-Commerce sites
 Developers begin to front RDBMS with memcache or
integrate other caching mechanisms within the
application (ie. Ehcache)
41
Scaling Up
41
 Issues with scaling up when the dataset is just too big
 RDBMS were not designed to be distributed
 Began to look at multi-node database solutions
 Known as „scaling out‟ or „horizontal scaling‟
 Different approaches include:
 Master-slave
 Sharding
42
Scaling RDBMS – Master/Slave
42
 Master-Slave
 All writes are written to the master. All
reads performed against the replicated
slave databases
 Critical reads may be incorrect as writes
may not have been propagated down
 Large data sets can pose problems as
master needs to duplicate data to slaves
43
Scaling RDBMS - Sharding
43
 Partition or sharding
 Scales well for both reads and writes
 Not transparent, application needs to be
partition-aware
 Can no longer have relationships/joins
across partitions
 Loss of referential integrity across shards
44
Other ways to scale RDBMS
44
 Multi-Master replication
 INSERT only, not UPDATES/DELETES
 No JOINs, thereby reducing query time
 This involves de-normalizing data
 In-memory databases
45
What is NoSQL?
45
 Stands for Not Only SQL
 Class of non-relational data storage
systems
 Usually do not require a fixed table
schema nor do they use the concept of
joins
 All NoSQL offerings relax one or more of
the ACID properties (will talk about the
CAP theorem)
46
Why NoSQL?
46
 For data storage, an RDBMS cannot be the
be-all/end-all
 Just as there are different programming
languages, need to have other data
storage tools in the toolbox
 A NoSQL solution is more acceptable to a
client now than even a year ago
 Think about proposing a Ruby/Rails or
Groovy/Grails solution now versus a couple
of years ago
47
How did we get here?
47
 Explosion of social media sites (Facebook, Twitter)
with large data needs
 Rise of cloud-based solutions such as Amazon S3
(Simple Storage Service)
 Just as moving to dynamically-typed languages
(Ruby/Groovy), a shift to dynamically-typed data
with frequent schema changes
 Open-source community
48
Dynamo and BigTable
48
 Three major papers were the seeds of the
NoSQL movement
 BigTable (Google)
 Dynamo (Amazon)
 Gossip protocol (discovery and error
detection)
 Distributed key-value data store
 Eventual consistency
 CAP Theorem (discuss in a sec ..)
49
The Perfect Storm
49
 Large datasets, acceptance of alternatives, and
dynamically-typed data has come together in a
perfect storm
 Not a backlash/rebellion against RDBMS
 SQL is a rich query language that cannot be
rivaled by the current list of NoSQL offerings
50
CAP Theorem
50
 Three properties of a system: consistency,
availability and partition tolerance
 You can have at most two of these three
properties for any shared-data system
 To scale out, you have to partition. That
leaves either consistency or availability to
choose from
 In almost all cases, you would choose
availability over consistency
51
CAP Theorem
51
Consistency
Partition
tolerance
Availability
52
CAP Theorem
52
Once a writer has written,
all readers will see that
write
Consistency
Partition
tolerance
Availability
53
Consistency
53
 Two kinds of consistency:
 strong consistency – ACID(Atomicity Consistency
Isolation Durability)
 weak consistency – BASE(Basically Available Soft-
state Eventual consistency )
54
ACID Transactions
 A DBMS is expected to support “ACID transactions,”
processes that are:
 Atomic : Either the whole process is done or none is.
 Consistent : Database constraints are preserved.
 Isolated : It appears to the user as if only one process executes at
a time.
 Durable : Effects of a process do not get lost if the system
crashes.
55
Atomicity
 A real-world event either happens or does
not happen
 Student either registers or does not register
 Similarly, the system must ensure that either
the corresponding transaction runs to
completion or, if not, it has no effect at all
 Not true of ordinary programs. A crash could
leave files partially updated on recovery
56
Commit and Abort
 If the transaction successfully completes it
is said to commit
 The system is responsible for ensuring that all
changes to the database have been saved
 If the transaction does not successfully
complete, it is said to abort
 The system is responsible for undoing, or rolling
back, all changes the transaction has made
57
Database Consistency
 Enterprise (Business) Rules limit the
occurrence of certain real-world events
 Student cannot register for a course if the current
number of registrants equals the maximum allowed
 Correspondingly, allowable database states
are restricted
cur_reg <= max_reg
 These limitations are called (static) integrity
constraints: assertions that must be satisfied
by all database states (state invariants).
58
Database Consistency (state invariants)
 Other static consistency requirements are
related to the fact that the database might
store the same information in different ways
 cur_reg = |list_of_registered_students|
 Such limitations are also expressed as integrity
constraints
 Database is consistent if all static integrity
constraints are satisfied
59
Transaction Consistency
 A consistent database state does not necessarily
model the actual state of the enterprise
 A deposit transaction that increments the balance by
the wrong amount maintains the integrity constraint
balance ≥ 0, but does not maintain the relation between
the enterprise and database states
 A consistent transaction maintains database
consistency and the correspondence between the
database state and the enterprise state (implements
its specification)
 Specification of deposit transaction includes
balance' = balance + amt_deposit
(balance' is the next value of balance)
60
Dynamic Integrity Constraints (transition invariants)
 Some constraints restrict allowable state
transitions
 A transaction might transform the database
from one consistent state to another, but the
transition might not be permissible
 Example: A letter grade in a course (A, B, C, D,
F) cannot be changed to an incomplete (I)
 Dynamic constraints cannot be checked
by examining the database state
61
Transaction Consistency
 Consistent transaction: if DB is in consistent
state initially, when the transaction completes:
 All static integrity constraints are satisfied (but
constraints might be violated in intermediate states)
Can be checked by examining snapshot of database
 New state satisfies specifications of transaction
Cannot be checked from database snapshot
 No dynamic constraints have been violated
Cannot be checked from database snapshot
62
Isolation
 Serial Execution: transactions execute in sequence
 Each one starts after the previous one completes.
 Execution of one transaction is not affected by the
operations of another since they do not overlap in time
 The execution of each transaction is isolated from
all others.
 If the initial database state and all transactions are
consistent, then the final database state will be
consistent and will accurately reflect the real-world
state, but
 Serial execution is inadequate from a performance
perspective
63
Isolation
 Concurrent execution offers performance benefits:
 A computer system has multiple resources capable of
executing independently (e.g., cpu’s, I/O devices), but
 A transaction typically uses only one resource at a time
 Hence, only concurrently executing transactions can
make effective use of the system
 Concurrently executing transactions yield interleaved
schedules
64
Concurrent Execution
64
(Figure: transactions T1 and T2 each perform local computation on local variables and each output a sequence of database operations, e.g., op1,1 op1,2 from T1 and op2,1 op2,2 from T2; the DBMS receives the interleaved sequence op1,1 op2,1 op2,2 op1,2. A transaction has the form: begin trans .. op1,1 .. op1,2 .. commit.)
65
Durability
 The system must ensure that once a transaction
commits, its effect on the database state is not
lost in spite of subsequent failures
 Not true of ordinary programs. A media failure after a
program successfully terminates could cause the file
system to be restored to a state that preceded the
program’s execution
66
Implementing Durability
66
 Database stored redundantly on mass storage
devices to protect against media failure
 Architecture of mass storage devices affects
type of media failures that can be tolerated
 Related to Availability: extent to which a
(possibly distributed) system can provide
service despite failure
 Non-stop DBMS (mirrored disks)
 Recovery based DBMS (log)
67
Consistency Model
67
 A consistency model determines rules for visibility and
apparent order of updates.
 For example:
 Row X is replicated on nodes M and N
 Client A writes row X to node N
 Some period of time t elapses.
 Client B reads row X from node M
 Does client B see the write from client A?
 Consistency is a continuum with tradeoffs
 For NoSQL, the answer would be: maybe
 CAP Theorem states: Strict Consistency can't be achieved at
the same time as availability and partition-tolerance.
68
Eventual Consistency
68
 When no updates occur for a long
period of time, eventually all updates
will propagate through the system and
all the nodes will be consistent
 For a given accepted update and a
given node, eventually either the
update reaches the node or the node
is removed from service
 Known as BASE (Basically Available,
Soft state, Eventual consistency), as
opposed to ACID
69
The CAP Theorem
69
System is available during
software and hardware
upgrades and node
failures.
Consistency
Partition tolerance
Availability
70
Availability
70
 Traditionally, thought of as the server/process
available five 9‟s (99.999 %).
 However, for large node system, at almost any
point in time there‟s a good chance that a node is
either down or there is a network disruption among
the nodes.
 Want a system that is resilient in the face of network
disruption
71
The CAP Theorem
71
A system can continue to
operate in the presence
of a network partitions.
Consistency
Partition tolerance
Availability
72
The CAP Theorem
72
Theorem: You can have at most two of
these properties for any shared-data
system
Consistency
Partition tolerance
Availability
73
What kinds of NoSQL
73
 NoSQL solutions fall into two major areas:
 Key/Value or „the big hash table‟.
 Amazon S3 (Dynamo)
 Voldemort
 Scalaris
 Memcached (in-memory key/value store)
 Redis
 Schema-less which comes in multiple flavors, column-based,
document-based or graph-based.
 Cassandra (column-based)
 CouchDB (document-based)
 MongoDB(document-based)
 Neo4J (graph-based)
 HBase (column-based)
74
Key/Value
74
Pros:
 very fast
 very scalable
 simple model
 able to distribute horizontally
Cons:
- many data structures (objects) can't be
easily modeled as key value pairs
75
Schema-Less
75
Pros:
- Schema-less data model is richer than key/value
pairs
- eventual consistency
- many are distributed
- still provide excellent performance and scalability
Cons:
- typically no ACID transactions or joins
76
Common Advantages
76
 Cheap, easy to implement (open source)
 Data are replicated to multiple nodes (therefore identical and fault-tolerant)
and can be partitioned
 Down nodes easily replaced
 No single point of failure
 Easy to distribute
 Don't require a schema
 Can scale up and down
 Relax the data consistency requirement (CAP)
77
What am I giving up?
77
 joins
 group by
 order by
 ACID transactions
 SQL as a sometimes frustrating but still
powerful query language
 easy integration with other applications
that support SQL
79
Types of DBMS
79
 Hierarchical database
The model resembles a tree structure, similar to a folder hierarchy in your computer system. The relationships between records are pre-defined in a one-to-one manner between 'parent' and 'child' nodes. Users must traverse the hierarchy to reach the data they need. Due to these limitations, such databases may be confined to specific uses.
 Network database
Network models also have a hierarchical structure. However, instead of using a single-parent tree hierarchy, this model supports many-to-many relationships, as child tables can have more than one parent.
 NoSQL or non-relational databases
A popular alternative to relational databases, NoSQL databases take a variety of forms and allow you to store and manipulate large amounts of unstructured and semi-structured data. Examples include key-value stores, document stores and graph databases.
 Flat file database
A flat file database stores data in a plain text file, with each line of text typically holding one record. Delimiters such as commas or tabs separate fields. A flat file database uses a simple structure and, unlike a relational database, cannot contain multiple tables and relations.
 Object-oriented database systems
In object-oriented databases, the information is represented as objects, with different types of relationships possible between two or more objects. Such databases use an object-oriented programming language for development.
80
80
 Relational Databases – mainstay of business
 Web-based applications caused spikes
 Especially true for public-facing e-Commerce sites
 Developers begin to front RDBMS with memcache or
integrate other caching mechanisms within the
application (ie. Ehcache)
Types of DBMS
81
Scaling Up
81
 Issues with scaling up when the dataset is just too big
 RDBMS were not designed to be distributed
 Began to look at multi-node database solutions
 Known as „scaling out‟ or „horizontal scaling‟
 Different approaches include:
 Master-slave
 Sharding
82
Scaling RDBMS – Master/Slave
82
 Master-Slave
 All writes are written to the master. All
reads performed against the replicated
slave databases
 Critical reads may be incorrect as writes
may not have been propagated down
 Large data sets can pose problems as
master needs to duplicate data to slaves
83
Scaling RDBMS - Sharding
83
 Partition or sharding
 Scales well for both reads and writes
 Not transparent, application needs to be
partition-aware
 Can no longer have relationships/joins
across partitions
 Loss of referential integrity across shards
84
Other ways to scale RDBMS
84
 Multi-Master replication
 INSERT only, not UPDATES/DELETES
 No JOINs, thereby reducing query time
 This involves de-normalizing data
 In-memory databases
85
What is NoSQL?
85
 Stands for Not Only SQL
 RDBMS search works tuple by tuple (row-wise)
 When to use SQL: fixed schema, consistency, and transactions
 When to use NoSQL: speed, scalability, flexibility
 Types of NoSQL: column-oriented, document, key-value, graph-oriented
 Class of non-relational data storage systems
 Usually do not require a fixed table schema nor do they use the concept of joins
 All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)
 Example: a user owns both a Samsung phone and an iPhone; if you want specific data about that individual,
SQL is preferred, but if you want data in bulk, NoSQL is preferred
HBASE: column-oriented
86
Why NoSQL?
86
 For data storage, an RDBMS cannot be the
be-all/end-all
 Just as there are different programming
languages, need to have other data
storage tools in the toolbox
 A NoSQL solution is more acceptable to a
client now than even a year ago
 Think about proposing a Ruby/Rails or
Groovy/Grails solution now versus a couple
of years ago
87
How did we get here?
87
 Explosion of social media sites (Facebook, Twitter)
with large data needs
 Rise of cloud-based solutions such as Amazon S3
(Simple Storage Service)
 Just as moving to dynamically-typed languages
(Ruby/Groovy), a shift to dynamically-typed data
with frequent schema changes
 Open-source community
88
Dynamo and BigTable
88
 Three major papers were the seeds of the
NoSQL movement
 BigTable (Google)
 Dynamo (Amazon)
 Gossip protocol (discovery and error
detection)
 Distributed key-value data store
 Eventual consistency
 CAP Theorem (discuss in a sec ..)
89
The Perfect Storm
89
 Large datasets, acceptance of alternatives, and
dynamically-typed data has come together in a
perfect storm
 Not a backlash/rebellion against RDBMS
 SQL is a rich query language that cannot be
rivaled by the current list of NoSQL offerings
90
CAP Theorem
90
 Three properties of a system: consistency,
availability and partition tolerance
 You can have at most two of these three
properties for any shared-data system
 To scale out, you have to partition. That
leaves either consistency or availability to
choose from
 In almost all cases, you would choose
availability over consistency
91
CAP Theorem
91
Consistency
Partition
tolerance
Availability
92
Consistency
92
 Two kinds of consistency:
 strong consistency – ACID(Atomicity Consistency
Isolation Durability)
 weak consistency – BASE(Basically Available Soft-
state Eventual consistency )
93
93
Databases and Querying (HBASE, Pig, and Hive)
94
HBASE
94
 If Hadoop can already process the dataset, why do we need
HBASE?
Hadoop uses batch processing and sequential data access, so to find a small piece of specific
information we cannot afford to scan a trillion tuples at once.
 It is a NoSQL column-oriented (column-family-oriented) store
 HBASE allows random data access: there is no need to
scan the dataset in a batch the way Hadoop does.
 HA, replication, fault tolerance: since it is installed on Hadoop, it
inherits all of Hadoop's features
 When to use HBase?
When the database is small (MBs or GBs), use an RDBMS with SQL; but if the database is in the TB/petabyte range, use HBASE.
Use it when you do not require transactions, a rigid schema, big ad-hoc queries, or complex joins, and you do need speed, scalability, and flexibility.
Who is using HBASE?
Pinterest, Facebook, Adobe, Yahoo
95
HBASE
95
 A distributed data store that can scale horizontally to
1,000s of commodity servers and petabytes of indexed
storage.
 Designed to operate on top of the Hadoop distributed file
system (HDFS) or Kosmos File System (KFS, aka Cloudstore)
for scalability, fault tolerance, and high availability.
 Distributed storage
 Table-like in data structure
 multi-dimensional map
 High scalability
 High availability
 High performance
96
HBASE
96
 Started toward by Chad Walters and Jim
 2006.11
 Google releases paper on BigTable
 2007.2
 Initial HBase prototype created as Hadoop contrib.
 2007.10
 First useable HBase
 2008.1
 Hadoop become Apache top-level project and HBase becomes
subproject
 2008.10~
 HBase 0.18, 0.19 released
97
HBASE is not a…
97
 Tables have one primary index, the row key.
 No join operators.
 Limited atomicity and transaction support.
 HBase supports multiple batched mutations of single rows only.
 Data is unstructured and untyped.
 Not accessed or manipulated via SQL.
 Programmatic access via Java, REST, or Thrift APIs.
 Scripting via JRuby.
 Scans and queries can select a subset of available columns, perhaps by using a
wildcard.
 There are three types of lookups:
 Fast lookup using row key and optional timestamp.
 Full table scan
 Range scan from region start to end.
98
HBASE Advantages
98
 No real indexes
 Automatic partitioning
 Scale linearly and automatically with new
nodes
 Commodity hardware
 Fault tolerance
 Batch processing
99
HBASE Data model
99
 Tables are sorted by Row
 Table schema only defines its column families.
 Each family consists of any number of columns
 Each column consists of any number of versions
 Columns only exist when inserted, NULLs are free.
 Columns within a family are sorted and stored together
 Everything except table names are byte[]
 (Row, Family:Column, Timestamp) → Value
(Figure: a cell is addressed by row key, column family, column, and timestamp.)
100
Members
Master
Responsible for monitoring region servers
Load balancing for regions
Redirect client to correct region servers
The current SPOF
Region server slaves
Serving requests(Write/Read/Scan) of Client
Send HeartBeat to Master
Throughput and Region numbers are scalable by region
servers
100
HBASE Architecture
102
102
 Region default size: 256 MB; once a region is full, a new region is created. Why not have one region store all the data? It would degrade performance.
 A region has a write memory (MemStore) and a read memory (block cache)
HBASE Architecture
103
103
HBASE Architecture
 A Region Server handles multiple regions; each region has column families, and each region can hold a different table, such as
employees, students, or products.
 Region default size: 256 MB; once a region is full, a new region is created. Why not have one region store all the data?
It would degrade performance.
 Data is first written to the Write-Ahead Log (WAL), a file that every region server maintains, for recovery purposes in case of data
loss.
 The MemStore is the write buffer (default size 100 MB here). Once it is full, it flushes the data, segmented into very small
HFiles (KBs each) stored on disk.
 Zipping all of these HFiles together is called major compaction; the admin generally does it in off-peak
hours.
 Zipping only a few files together is called minor compaction.
 A region has a write memory (MemStore) and a read memory (block cache).
104
104
HBASE Architecture
Hmaster Functions
 Create, delete, update operations
 Region Assignment in region server
 Reassigning regions after load balancing
 Managing region server failure (if a region server fails, recovery is also handled by the HMaster)
105
HBASE
105
Zookeeper Functions
 The active and standby (inactive) HMasters and the region servers send ping/heartbeat signals to ZooKeeper.
 If the active HMaster crashes, it stops sending heartbeats, and ZooKeeper then activates a standby HMaster.
 The -ROOT- and .META. tables are handled through ZooKeeper.
 The complete cluster-management task is under ZooKeeper.
 There is only one -ROOT- table, while there can be more .META. tables (which data is where: which region, MemStore, block cache).
106
HBASE:ZooKeeper
106
 HBase depends on
ZooKeeper and by
default it manages a
ZooKeeper instance
as the authority on
cluster state
107
HBASE:ZooKeeper:Operation
107
The -ROOT- table holds the list of .META. table regions.
The .META. table holds the list of all user-space regions.
108
HBASE:ZooKeeper
108
Installation (1)
$ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/hbase/hbase-0.20.2/hbase-0.20.2.tar.gz
$ sudo tar -zxvf hbase-*.tar.gz -C /opt/
$ sudo ln -sf /opt/hbase-0.20.2 /opt/hbase
$ sudo chown -R $USER:$USER /opt/hbase
$ sudo mkdir /var/hadoop/
$ sudo chmod 777 /var/hadoop
START Hadoop…
109
HBASE:ZooKeeper
109
Setup (1)
$ vim /opt/hbase/conf/hbase-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_CONF_DIR=/opt/hadoop/conf
export HBASE_HOME=/opt/hbase
export HBASE_LOG_DIR=/var/hadoop/hbase-logs
export HBASE_PID_DIR=/var/hadoop/hbase-pids
export HBASE_MANAGES_ZK=true
export HBASE_CLASSPATH=$HBASE_CLASSPATH:/opt/hadoop/conf
$ cd /opt/hbase/conf
$ cp /opt/hadoop/conf/core-site.xml ./
$ cp /opt/hadoop/conf/hdfs-site.xml ./
$ cp /opt/hadoop/conf/mapred-site.xml ./
110
HBASE:ZooKeeper
110
Setup (2)
<configuration>
<property>
<name> name </name>
<value> value </value>
</property>
</configuration>
Name = value
hbase.rootdir = hdfs://secuse.nchc.org.tw:9000/hbase
hbase.tmp.dir = /var/hadoop/hbase-${user.name}
hbase.cluster.distributed = true
hbase.zookeeper.property.clientPort = 2222
hbase.zookeeper.quorum = Host1,Host2
hbase.zookeeper.property.dataDir = /var/hadoop/hbase-data
111
HBASE
111
Startup & Stop
$ start-hbase.sh
$ stop-hbase.sh
112
HBASE
112
Testing (4)
$ hbase shell
> create 'test', 'data'
0 row(s) in 4.3066 seconds
> list
test
1 row(s) in 0.1485 seconds
> put 'test', 'row1', 'data:1', 'value1'
0 row(s) in 0.0454 seconds
> put 'test', 'row2', 'data:2', 'value2'
0 row(s) in 0.0035 seconds
> put 'test', 'row3', 'data:3', 'value3'
0 row(s) in 0.0090 seconds
> scan 'test'
ROW COLUMN+CELL
row1 column=data:1, timestamp=1240148026198, value=value1
row2 column=data:2, timestamp=1240148040035, value=value2
row3 column=data:3, timestamp=1240148047497, value=value3
3 row(s) in 0.0825 seconds
> disable 'test'
09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test
0 row(s) in 6.0426 seconds
> drop 'test'
09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test
0 row(s) in 0.0210 seconds
> list
0 row(s) in 2.0645 seconds
113
HBASE
113
Connecting to HBase
 Java client
 get(byte [] row, byte [] column, long timestamp, int versions);
 Non-Java clients
 Thrift server hosting HBase client instance
 Sample ruby, c++, & java (via thrift) clients
 REST server hosts HBase client
 TableInput/OutputFormat for MapReduce
 HBase as MR source or sink
 HBase Shell
 JRuby IRB with “DSL” to add get, scan, and admin
 ./bin/hbase shell YOUR_SCRIPT
114
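For a programmatic flavor, here is a minimal Java-client sketch; it uses the newer HBase client API (1.x and later) rather than the 0.20-era get() signature listed above, and reuses the 'test' table and 'data' family from the shell example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test"))) {

            // Equivalent of: put 'test', 'row1', 'data:1', 'value1'
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("1"), Bytes.toBytes("value1"));
            table.put(put);

            // Equivalent of: get 'test', 'row1'
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("data"), Bytes.toBytes("1"));
            System.out.println("row1 data:1 = " + Bytes.toString(value));
        }
    }
}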
HBASE
114
Thrift
 a software framework for scalable cross-language
services development.
 By facebook
 seamlessly between C++, Java, Python, PHP, and Ruby.
 This will start the server instance, by default on port
9090
 The other similar project “rest”
$ hbase-daemon.sh start thrift
$ hbase-daemon.sh stop thrift
115
Hive
115
 What is Hive?
 It is a data warehouse package built on top of Hadoop, used for data summarization, visualization, and analysis.
 Users with an SQL background use Hive
 No need to be familiar with Java
 History?
 Facebook was generating 78 TB/day, 1.5L queries per day, and 300 M images/day
 Facebook was using a backup strategy, and imports were done using scheduled jobs (cron jobs)
 ETL (Extract, Transform and Load) was done using Python
 Oracle DBMS and MS SQL Server were being used, which caused a lot of problems
 Facebook had SQL programmers, so they developed Hive with an SQL-compatible language called HQL
 Features
 Tables can be created
 JDBC/ODBC drivers are available
 Data is stored only on Hadoop.
 Uses Hadoop for fault tolerance, as Hadoop provides fault tolerance for everything on top of it (Pig, Hive, HBase)
 Use cases
 Data mining
 Document indexing (Facebook image indexing)
 Video indexing
 Predictive modeling
116
Hive
116
Need for High-Level Languages
 Hadoop is great for large-data processing!
 But writing Java programs for everything is verbose and slow
 Not everyone wants to (or can) write Java code
 Solution: develop higher-level data processing languages
 Hive: HQL is like SQL
 Pig: Pig Latin is a bit like Perl
117
Hive
117
Hive: data warehousing application in Hadoop
Query language is HQL, variant of SQL
Tables stored on HDFS as flat files
Developed by Facebook, now open source
Pig: large-scale data processing system
Scripts are written in Pig Latin, a dataflow language
Developed by Yahoo!, now open source
Roughly 1/3 of all Yahoo! internal jobs
Common idea:
Provide higher-level language to facilitate large-data
processing
Higher-level language “compiles down” to Hadoop jobs
118
Hive
118
Hive: Example
 Hive looks similar to an SQL database
 Relational join on two tables:
 Table of word counts from Shakespeare collection
 Table of word counts from the bible
Source: Material drawn from Cloudera training VM
SELECT s.word, s.freq, k.freq FROM shakespeare s
JOIN bible k ON (s.word = k.word) WHERE s.freq >= 1 AND k.freq >= 1
ORDER BY s.freq DESC LIMIT 10;
the 25848 62394
I 23031 8854
and 19671 38985
to 18038 13526
of 16700 34654
a 14170 8057
you 12702 2720
my 11297 4135
in 10797 12445
is 8882 6884
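For context, the two tables used in this example could be created and loaded with statements roughly like the following (paths and delimiters are assumptions, not taken from the original example):

-- Word-count tables: one row per (word, freq)
CREATE TABLE shakespeare (word STRING, freq INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

CREATE TABLE bible (word STRING, freq INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Load pre-computed word counts from HDFS (illustrative paths)
LOAD DATA INPATH '/user/hive/input/shakespeare_freq' INTO TABLE shakespeare;
LOAD DATA INPATH '/user/hive/input/bible_freq' INTO TABLE bible;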
119
Hive
119
Hive: Behind the Scenes
SELECT s.word, s.freq, k.freq FROM shakespeare s
JOIN bible k ON (s.word = k.word) WHERE s.freq >= 1 AND k.freq >= 1
ORDER BY s.freq DESC LIMIT 10;
(TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF shakespeare s) (TOK_TABREF bible k) (= (. (TOK_TABLE_OR_COL s) word) (.
(TOK_TABLE_OR_COL k) word)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (.
(TOK_TABLE_OR_COL s) word)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL s) freq)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL k)
freq))) (TOK_WHERE (AND (>= (. (TOK_TABLE_OR_COL s) freq) 1) (>= (. (TOK_TABLE_OR_COL k) freq) 1))) (TOK_ORDERBY
(TOK_TABSORTCOLNAMEDESC (. (TOK_TABLE_OR_COL s) freq))) (TOK_LIMIT 10)))
(one or more of MapReduce jobs)
(Abstract Syntax Tree)
120
Hive: Behind the Scenes
120
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-2 depends on stages: Stage-1
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
s
TableScan
alias: s
Filter Operator
predicate:
expr: (freq >= 1)
type: boolean
Reduce Output Operator
key expressions:
expr: word
type: string
sort order: +
Map-reduce partition columns:
expr: word
type: string
tag: 0
value expressions:
expr: freq
type: int
expr: word
type: string
k
TableScan
alias: k
Filter Operator
predicate:
expr: (freq >= 1)
type: boolean
Reduce Output Operator
key expressions:
expr: word
type: string
sort order: +
Map-reduce partition columns:
expr: word
type: string
tag: 1
value expressions:
expr: freq
type: int
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
condition expressions:
0 {VALUE._col0} {VALUE._col1}
1 {VALUE._col0}
outputColumnNames: _col0, _col1, _col2
Filter Operator
predicate:
expr: ((_col0 >= 1) and (_col2 >= 1))
type: boolean
Select Operator
expressions:
expr: _col1
type: string
expr: _col0
type: int
expr: _col2
type: int
outputColumnNames: _col0, _col1, _col2
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Stage: Stage-2
Map Reduce
Alias -> Map Operator Tree:
hdfs://localhost:8022/tmp/hive-training/364214370/10002
Reduce Output Operator
key expressions:
expr: _col1
type: int
sort order: -
tag: -1
value expressions:
expr: _col0
type: string
expr: _col1
type: int
expr: _col2
type: int
Reduce Operator Tree:
Extract
Limit
File Output Operator
compressed: false
GlobalTableId: 0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Stage: Stage-0
Fetch Operator
limit: 10
121
Hive
121
Example Data Analysis Task
user url time
Amy www.cnn.com 8:00
Amy www.crap.com 8:05
Amy www.myblog.com 10:00
Amy www.flickr.com 10:05
Fred cnn.com/index.htm 12:00
url pagerank
www.cnn.com 0.9
www.flickr.com 0.9
www.myblog.com 0.7
www.crap.com 0.2
Find users who tend to visit “good” pages.
PagesVisits
...
...
Pig Slides adapted from Olston et al.
122
Hive
122
System-Level Dataflow
(Dataflow figure: load Visits and load Pages; canonicalize the Visits urls; join by url; group by user; compute average pagerank; filter; the answer.)
Pig Slides adapted from Olston et al.
123
Hive:MapReduce Code
123
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class MRExample {
    public static class LoadPages extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable k, Text val,
                OutputCollector<Text, Text> oc,
                Reporter reporter) throws IOException {
            // Pull the key out
            String line = val.toString();
            int firstComma = line.indexOf(',');
            String key = line.substring(0, firstComma);
            String value = line.substring(firstComma + 1);
            Text outKey = new Text(key);
            // Prepend an index to the value so we know which file
            // it came from.
            Text outVal = new Text("1" + value);
            oc.collect(outKey, outVal);
        }
    }
    public static class LoadAndFilterUsers extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

        public void map(LongWritable k, Text val,
                OutputCollector<Text, Text> oc,
                Reporter reporter) throws IOException {
            // Pull the key out
            String line = val.toString();
            int firstComma = line.indexOf(',');
            String value = line.substring(firstComma + 1);
            int age = Integer.parseInt(value);
            if (age < 18 || age > 25) return;
            String key = line.substring(0, firstComma);
            Text outKey = new Text(key);
            // Prepend an index to the value so we know which file
            // it came from.
            Text outVal = new Text("2" + value);
            oc.collect(outKey, outVal);
        }
    }
    public static class Join extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

        public void reduce(Text key,
                Iterator<Text> iter,
                OutputCollector<Text, Text> oc,
                Reporter reporter) throws IOException {
            // For each value, figure out which file it's from and
            // store it accordingly.
            List<String> first = new ArrayList<String>();
            List<String> second = new ArrayList<String>();

            while (iter.hasNext()) {
                Text t = iter.next();
                String value = t.toString();
                if (value.charAt(0) == '1')
                    first.add(value.substring(1));
                else second.add(value.substring(1));
                reporter.setStatus("OK");
            }

            // Do the cross product and collect the values
            for (String s1 : first) {
                for (String s2 : second) {
                    String outval = key + "," + s1 + "," + s2;
                    oc.collect(null, new Text(outval));
                    reporter.setStatus("OK");
                }
            }
        }
    }
    public static class LoadJoined extends MapReduceBase
        implements Mapper<Text, Text, Text, LongWritable> {

        public void map(
                Text k,
                Text val,
                OutputCollector<Text, LongWritable> oc,
                Reporter reporter) throws IOException {
            // Find the url
            String line = val.toString();
            int firstComma = line.indexOf(',');
            int secondComma = line.indexOf(',', firstComma);
            String key = line.substring(firstComma, secondComma);
            // drop the rest of the record, I don't need it anymore,
            // just pass a 1 for the combiner/reducer to sum instead.
            Text outKey = new Text(key);
            oc.collect(outKey, new LongWritable(1L));
        }
    }
    public static class ReduceUrls extends MapReduceBase
        implements Reducer<Text, LongWritable, WritableComparable, Writable> {

        public void reduce(
                Text key,
                Iterator<LongWritable> iter,
                OutputCollector<WritableComparable, Writable> oc,
                Reporter reporter) throws IOException {
            // Add up all the values we see
            long sum = 0;
            while (iter.hasNext()) {
                sum += iter.next().get();
                reporter.setStatus("OK");
            }
            oc.collect(key, new LongWritable(sum));
        }
    }
    public static class LoadClicks extends MapReduceBase
        implements Mapper<WritableComparable, Writable, LongWritable, Text> {

        public void map(
                WritableComparable key,
                Writable val,
                OutputCollector<LongWritable, Text> oc,
                Reporter reporter) throws IOException {
            oc.collect((LongWritable) val, (Text) key);
        }
    }
    public static class LimitClicks extends MapReduceBase
        implements Reducer<LongWritable, Text, LongWritable, Text> {

        int count = 0;
        public void reduce(
                LongWritable key,
                Iterator<Text> iter,
                OutputCollector<LongWritable, Text> oc,
                Reporter reporter) throws IOException {
            // Only output the first 100 records
            while (count < 100 && iter.hasNext()) {
                oc.collect(key, iter.next());
                count++;
            }
        }
    }
    public static void main(String[] args) throws IOException {
        JobConf lp = new JobConf(MRExample.class);
        lp.setJobName("Load Pages");
        lp.setInputFormat(TextInputFormat.class);
        lp.setOutputKeyClass(Text.class);
        lp.setOutputValueClass(Text.class);
        lp.setMapperClass(LoadPages.class);
        FileInputFormat.addInputPath(lp, new Path("/user/gates/pages"));
        FileOutputFormat.setOutputPath(lp,
            new Path("/user/gates/tmp/indexed_pages"));
        lp.setNumReduceTasks(0);
        Job loadPages = new Job(lp);

        JobConf lfu = new JobConf(MRExample.class);
        lfu.setJobName("Load and Filter Users");
        lfu.setInputFormat(TextInputFormat.class);
        lfu.setOutputKeyClass(Text.class);
        lfu.setOutputValueClass(Text.class);
        lfu.setMapperClass(LoadAndFilterUsers.class);
        FileInputFormat.addInputPath(lfu, new Path("/user/gates/users"));
        FileOutputFormat.setOutputPath(lfu,
            new Path("/user/gates/tmp/filtered_users"));
        lfu.setNumReduceTasks(0);
        Job loadUsers = new Job(lfu);

        JobConf join = new JobConf(MRExample.class);
        join.setJobName("Join Users and Pages");
        join.setInputFormat(KeyValueTextInputFormat.class);
        join.setOutputKeyClass(Text.class);
        join.setOutputValueClass(Text.class);
        join.setMapperClass(IdentityMapper.class);
        join.setReducerClass(Join.class);
        FileInputFormat.addInputPath(join, new Path("/user/gates/tmp/indexed_pages"));
        FileInputFormat.addInputPath(join, new Path("/user/gates/tmp/filtered_users"));
        FileOutputFormat.setOutputPath(join, new Path("/user/gates/tmp/joined"));
        join.setNumReduceTasks(50);
        Job joinJob = new Job(join);
        joinJob.addDependingJob(loadPages);
        joinJob.addDependingJob(loadUsers);

        JobConf group = new JobConf(MRExample.class);
        group.setJobName("Group URLs");
        group.setInputFormat(KeyValueTextInputFormat.class);
        group.setOutputKeyClass(Text.class);
        group.setOutputValueClass(LongWritable.class);
        group.setOutputFormat(SequenceFileOutputFormat.class);
        group.setMapperClass(LoadJoined.class);
        group.setCombinerClass(ReduceUrls.class);
        group.setReducerClass(ReduceUrls.class);
        FileInputFormat.addInputPath(group, new Path("/user/gates/tmp/joined"));
        FileOutputFormat.setOutputPath(group, new Path("/user/gates/tmp/grouped"));
        group.setNumReduceTasks(50);
        Job groupJob = new Job(group);
        groupJob.addDependingJob(joinJob);

        JobConf top100 = new JobConf(MRExample.class);
        top100.setJobName("Top 100 sites");
        top100.setInputFormat(SequenceFileInputFormat.class);
        top100.setOutputKeyClass(LongWritable.class);
        top100.setOutputValueClass(Text.class);
        top100.setOutputFormat(SequenceFileOutputFormat.class);
        top100.setMapperClass(LoadClicks.class);
        top100.setCombinerClass(LimitClicks.class);
        top100.setReducerClass(LimitClicks.class);
        FileInputFormat.addInputPath(top100, new Path("/user/gates/tmp/grouped"));
        FileOutputFormat.setOutputPath(top100,
            new Path("/user/gates/top100sitesforusers18to25"));
        top100.setNumReduceTasks(1);
        Job limit = new Job(top100);
        limit.addDependingJob(groupJob);

        JobControl jc = new JobControl("Find top 100 sites for users 18 to 25");
        jc.addJob(loadPages);
        jc.addJob(loadUsers);
        jc.addJob(joinJob);
        jc.addJob(groupJob);
        jc.addJob(limit);
        jc.run();
    }
}
124
Hive
124
Data Flows
 Moving HBase data
(Figure: data is read in parallel from the HBase prod cluster by a CopyTable MR job and imported in parallel into HBase.)
* HBase replication currently only works for a single slave cluster, in our case HBase
replicates to a backup cluster.
125
Hive Architecture
125
 The command-line interface, the Hive web interface, or the Thrift server is used to access Hive and fire queries
 If you want to access Hive from another machine, you can do so through the Thrift server from C, C++, Java, etc.; it is a
cross-language interface.
 Metadata about tables and other Hive metadata is stored in the metastore.
 Metastore types: embedded metastore (driver + metastore + Derby), local metastore (driver + MySQL),
remote metastore (driver + MySQL in a separate process)
 Hive is a data warehouse package built on top of Hadoop, used for data visualization and analysis.
 Users with an SQL background use Hive
 No need to be familiar with Java
 Limitations: do not use it for row-level updates; the latency of Hive queries is high; it is not designed for OLTP (insert/update/delete)
126
Hive VS Hadoop
126
127
Hive VS RDBMS
127
128
Hive Data Model
128
Database: a folder is created under the Hive warehouse path (by default /user/hive/warehouse)
Table: when a table such as employee is created, a folder for it is created inside the database folder
Partition: e.g., date-wise partition folders are created under the table folder, so searching becomes faster
Buckets or clusters: similar data is allocated together depending on the hash value
Types of Tables: Internal and external
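A small HQL sketch of partitioned and bucketed tables (table and column names are invented for illustration):

-- Orders table, partitioned by date and bucketed by user id
CREATE TABLE orders (
  order_id BIGINT,
  user_id  INT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)       -- one sub-folder per date under the table folder
CLUSTERED BY (user_id) INTO 32 BUCKETS   -- rows hashed on user_id into 32 bucket files
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- An external table only points at existing data; dropping it keeps the files
CREATE EXTERNAL TABLE raw_orders (line STRING)
LOCATION '/data/raw_orders';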
129
PIG
129
Developed by Yahoo! (now open source)
 An abstraction for processing large datasets.
Why Pig?
No need to write Java.
 Reduces the amount of code
Multi-query approach
Provides nested data types
130
PIG
130
Pig Latin Script
Visits = load '/data/visits' as (user, url, time);
Visits = foreach Visits generate user, Canonicalize(url), time;
Pages = load '/data/pages' as (url, pagerank);
VP = join Visits by url, Pages by url;
UserVisits = group VP by user;
UserPageranks = foreach UserVisits generate user, AVG(VP.pagerank) as avgpr;
GoodUsers = filter UserPageranks by avgpr > '0.5';
store GoodUsers into '/data/good_users';
131
PIG
131
Java vs. Pig Latin
(Bar charts comparing Hadoop and Pig: lines of code, and development time in minutes.)
1/20 the lines of code
1/16 the development time
Performance on par with raw Hadoop!
132
PIG
132
Pig takes care of…
 Schema and type checking
 Translating into efficient physical dataflow
 (i.e., sequence of one or more MapReduce jobs)
 Exploiting data reduction opportunities
 (e.g., early partial aggregation via a combiner)
 Executing the system-level dataflow
 (i.e., running the MapReduce jobs)
 Tracking progress, errors, etc.
133
PIG
133
Integration
 Reasons to use Hive on HBase:
 A lot of data sitting in HBase due to its usage in a real-time environment, but never used for analysis
 Give access to data in HBase, usually only queried through MapReduce, to people that don't code (business analysts)
 When needing a more flexible storage solution, so that rows can be updated live by either a Hive job or an application and the change is immediately visible to the other
 Reasons not to do it:
 Running SQL queries on HBase to answer live user requests (it's still a MapReduce job)
 Hoping to see interoperability with other SQL analytics systems
134
PIG
134
Integration
 How it works:
 Hive can use tables that already exist in HBase or manage
its own ones, but they still all reside in the same HBase
instance
[Diagram: Hive table definitions either point to an existing HBase table or manage a table created from Hive; all of them reside in the same HBase instance.]
135
PIG
135
Integration
 How it works:
 When using an already existing table, defined as EXTERNAL, you can create multiple
Hive tables that point to it
[Diagram: several Hive table definitions point to the same HBase table; each maps some of its columns, possibly under different names.]
136
PIG
136
Integration
 How it works:
 Columns are mapped however you want, changing names and giving types
Hive table definition (people)          HBase table (persons)
name STRING                         ->  d:fullname
age INT                             ->  d:age
siblings MAP<string, string>        ->  f: (entire column family)
(d:address is left unmapped)
137
PIG
137
Integration
 Drawbacks (that can be fixed with brain juice):
 Binary keys and values (like integers represented on 4 bytes) aren't supported since Hive prefers string representations, HIVE-1634
 Compound row keys aren't supported; there's no way of using multiple parts of a key as different "fields"
 This means that concatenated binary row keys are completely unusable, which is what people often use for HBase
 Filters are done at the Hive level instead of being pushed to the region servers
 Partitions aren't supported
138
PIG
138
Data Flows
 Data is being generated all over the place:
 Apache logs
 Application logs
 MySQL clusters
 HBase clusters
139
PIG
139
Data Flows
 Moving application log files
[Diagram: a raw ("wild") log file is read nightly, its format is transformed, and it is dumped into HDFS; it is also tail'ed continuously, parsed into HBase format, and inserted into HBase.]
140
PIG
140
Data Flows
 Moving MySQL data
[Diagram: MySQL is dumped nightly and imported as CSV into HDFS; it is also replicated continuously via the Tungsten replicator, parsed into HBase format, and inserted into HBase.]
141
PIG
141
Use Cases
 Front-end engineers
 They need some statistics regarding their latest product
 Research engineers
 Ad-hoc queries on user data to validate some assumptions
 Generating statistics about recommendation quality
 Business analysts
 Statistics on growth and activity
 Effectiveness of advertiser campaigns
 Users' behavior vs. past activities to determine, for example, why certain groups react better to email communications
 Ad-hoc queries on stumbling behaviors of slices of the user
base
142
PIG
142
Use Cases
 Using a simple table in HBase:
CREATE EXTERNAL TABLE blocked_users(
userid INT,
blockee INT,
blocker INT,
created BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key,f:blockee,f:blocker,f:created")
TBLPROPERTIES("hbase.table.name" = "m2h_repl-userdb.stumble.blocked_users");
The row key is a special case: it is mapped with :key
Not all the columns in the HBase table need to be mapped
A client-side sketch of writing such a row through the HBase API follows below.
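For context (a hypothetical sketch, not from the slides): the rows exposed by such an external table are written through the ordinary HBase client API, with the values stored as strings since Hive prefers string representations:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BlockedUsersWriter {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("m2h_repl-userdb.stumble.blocked_users"))) {
            Put put = new Put(Bytes.toBytes("42"));   // row key, read back through :key
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("blockee"), Bytes.toBytes("1001"));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("blocker"), Bytes.toBytes("42"));
            put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("created"), Bytes.toBytes("1400000000"));
            table.put(put);   // the Hive mapping above reads f:blockee, f:blocker, f:created
        }
    }
}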
143
PIG
143
Use Cases
 Using a complicated table in HBase:
CREATE EXTERNAL TABLE ratings_hbase(
userid INT,
created BIGINT,
urlid INT,
rating INT,
topic INT,
modified BIGINT)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" =
":key#b@0,:key#b@1,:key#b@2,default:rating#b,default:topic#b,default:modified#b")
TBLPROPERTIES("hbase.table.name" = "ratings_by_userid");
#b means binary, @ means position in composite key (SU-specific hack)
144
PIG Architecture
144
 Grunt shell / Pig Server: the Grunt shell is the interactive interface; if you want to access Pig from a program, you use Pig Server.
 The code goes to the parser for syntax checking. If it is error-free, the parser produces a logical plan, a DAG (directed acyclic graph) whose nodes are logical operators.
 This logical plan is forwarded to the optimizer for optimization.
 The optimized plan is then sent to the compiler.
 The compiler's output is a series of MapReduce jobs, which are handed to the execution engine.
 The execution engine takes care of executing the jobs on MapReduce/Hadoop (see the sketch below).
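A minimal sketch of driving this pipeline programmatically through Pig Server (the embedded PigServer Java API; the script, paths, and local execution mode are assumptions for illustration):

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigServerSketch {
    public static void main(String[] args) throws Exception {
        // ExecType.LOCAL for illustration; ExecType.MAPREDUCE runs on the Hadoop cluster.
        PigServer pig = new PigServer(ExecType.LOCAL);
        // Each statement is parsed into a logical plan (a DAG of logical operators),
        // optimized, compiled into MapReduce jobs, and run by the execution engine.
        pig.registerQuery("Visits = LOAD '/data/visits' AS (user, url, time);");
        pig.registerQuery("ByUser = GROUP Visits BY user;");
        pig.registerQuery("Counts = FOREACH ByUser GENERATE group, COUNT(Visits);");
        pig.store("Counts", "/data/visit_counts");   // triggers compilation and execution
    }
}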
145
Data Model
145
 A table in Bigtable is a sparse, distributed, persistent
multidimensional sorted map
 Map indexed by a row key, column key, and a timestamp
 (row:string, column:string, time:int64) → uninterpreted byte array
 Supports lookups, inserts, deletes
 Single row transactions only
Image Source: Chang et al., OSDI 2006
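As a rough mental model only (this is a conceptual sketch, not Bigtable's API), the sorted multidimensional map can be pictured as nested sorted maps keyed by row, then column, then timestamp:

import java.util.NavigableMap;
import java.util.TreeMap;

public class BigtableMapSketch {
    // row key -> ("family:qualifier" -> (timestamp -> uninterpreted bytes))
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> table =
            new TreeMap<>();   // TreeMap keeps rows in sorted lexicographic order

    public void put(String row, String column, long ts, byte[] value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>())
             .put(ts, value);
    }

    public byte[] get(String row, String column, long ts) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = table.get(row);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(column);
        return versions == null ? null : versions.get(ts);
    }
}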
146
Rows and Columns
146
 Rows maintained in sorted lexicographic order
 Applications can exploit this property for efficient row
scans
 Row ranges dynamically partitioned into tablets
 Columns grouped into column families
 Column key = family:qualifier
 Column families provide locality hints
 Unbounded number of columns
147
Bigtable Building Blocks
147
 GFS
 Chubby
 SSTable
148
SSTable
148
 Basic building block of Bigtable
 Persistent, ordered immutable map from keys to values
 Stored in GFS
 Sequence of blocks on disk plus an index for block lookup
 Can be completely mapped into memory
 Supported operations:
 Look up value associated with key
 Iterate key/value pairs within a key range
[Diagram: an SSTable is a sequence of 64K blocks on disk plus an index used to locate blocks.]
Source: Graphic from slides by Erik Paulson
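A purely conceptual sketch of the lookup path (not GFS/Bigtable code): the in-memory index maps the first key of each 64K block to its file offset, so a lookup finds the greatest index entry not larger than the search key and then scans only that block:

import java.util.Map;
import java.util.TreeMap;

public class SSTableIndexSketch {
    // First key stored in each 64K block -> byte offset of that block in the SSTable file.
    private final TreeMap<String, Long> blockIndex = new TreeMap<>();

    public void addBlock(String firstKeyOfBlock, long fileOffset) {
        blockIndex.put(firstKeyOfBlock, fileOffset);
    }

    // Returns the offset of the single block that could contain the key, or -1 if none can.
    public long locateBlock(String key) {
        Map.Entry<String, Long> entry = blockIndex.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }
}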
149
Tablet
149
 Dynamically partitioned range of rows
 Built from multiple SSTables
[Diagram: a tablet (Start: aardvark, End: apple) is built from multiple SSTables, each consisting of 64K blocks plus an index.]
Source: Graphic from slides by Erik Paulson
150
Table
150
 Multiple tablets make up the table
 SSTables can be shared
[Diagram: two tablets (aardvark–apple and apple_two_E–boat) make up the table; adjacent tablets can share SSTables.]
Source: Graphic from slides by Erik Paulson
151
Architecture
151
 Client library
 Single master server
 Tablet servers
152
Bigtable Master
152
 Assigns tablets to tablet servers
 Detects addition and expiration of tablet servers
 Balances tablet server load
 Handles garbage collection
 Handles schema changes
153
Bigtable Tablet Servers
153
 Each tablet server manages a set of tablets
 Typically between ten and a thousand tablets
 Each 100-200 MB by default
 Handles read and write requests to the tablets
 Splits tablets that have grown too large
154
Tablet Location
154
Upon discovery, clients cache tablet locations
Image Source: Chang et al., OSDI 2006
155
Tablet Assignment
155
 Master keeps track of:
 Set of live tablet servers
 Assignment of tablets to tablet servers
 Unassigned tablets
 Each tablet is assigned to one tablet server at a time
 Tablet server maintains an exclusive lock on a file in Chubby
 Master monitors tablet servers and handles assignment
 Changes to tablet structure
 Table creation/deletion (master initiated)
 Tablet merging (master initiated)
 Tablet splitting (tablet server initiated)
156
Tablet Serving
156
Image Source: Chang et al., OSDI 2006
“Log Structured Merge Trees”
157
Compactions
157
 Minor compaction
 Converts the memtable into an SSTable
 Reduces memory usage and log traffic on restart
 Merging compaction
 Reads the contents of a few SSTables and the memtable,
and writes out a new SSTable
 Reduces number of SSTables
 Major compaction
 Merging compaction that results in only one SSTable
 No deletion records, only live data
158
Bigtable Applications
158
 Data source and data sink for MapReduce
 Google's web crawl
 Google Earth
 Google Analytics
159
Cassandra
159
Why Cassandra?
 Lots of data
 Copies of messages, reverse indices of messages, per user
data.
 Many incoming requests resulting in a lot of random
reads and random writes.
 No existing production ready solutions in the market
meet these requirements.
160
Cassandra
160
Design Goals
 High availability
 Eventual consistency
 trade-off strong consistency in favor of high
availability
 Incremental scalability
 Optimistic Replication
 “Knobs” to tune tradeoffs between consistency,
durability and latency
 Low total cost of ownership
 Minimal administration
161
Cassandra
161
innovation at scale
 google bigtable (2006)
 consistency model: strong
 data model: sparse map
 clones: hbase, hypertable
 amazon dynamo (2007)
 O(1) dht
 consistency model: client tune-able
 clones: riak, voldemort
cassandra ~= bigtable + dynamo
162
Cassandra
162
proven
 Facebook stores 150 TB of data on 150 nodes
web 2.0
 used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo,
Ooyala, OpenX, others
163
Cassandra
163
Data Model
[Diagram: a row is addressed by a KEY and contains several column families.
ColumnFamily1 (Name: MailList, Type: Simple, Sort: Name) holds columns tid1..tid4, each with a binary value and a timestamp (t1..t4).
ColumnFamily2 (Name: WordList, Type: Super, Sort: Time) holds super columns such as "aloha" and "dude", each containing its own list of columns (C1/V1/T1, ...).
ColumnFamily3 (Name: System, Type: Super, Sort: Name) holds super columns hint1..hint4, each with a column list.]
Column families are declared upfront; columns and super columns are added and modified dynamically.
164
Cassandra
164
Write Operations
 A client issues a write request to a random node in the
Cassandra cluster.
 The “Partitioner” determines the nodes responsible for
the data.
 Locally, write operations are logged and then applied to
an in-memory version.
 Commit log is stored on a dedicated disk local to the
machine.
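A conceptual sketch of the local write path just described (illustrative only, not Cassandra's implementation): append to a commit log on disk for durability, then apply the mutation to the in-memory memtable:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ConcurrentSkipListMap;

public class WritePathSketch {
    private final ConcurrentSkipListMap<String, String> memtable = new ConcurrentSkipListMap<>();
    private final BufferedWriter commitLog;

    public WritePathSketch(String commitLogPath) throws IOException {
        // In the real system the commit log sits on a dedicated local disk.
        this.commitLog = new BufferedWriter(new FileWriter(commitLogPath, true));
    }

    public synchronized void write(String key, String value) throws IOException {
        commitLog.write(key + "\t" + value + "\n");   // 1. log the mutation for durability
        commitLog.flush();
        memtable.put(key, value);                     // 2. apply it to the in-memory version
        // When the memtable grows past a size/object-count/age threshold it is
        // flushed to an immutable, sorted data file on disk.
    }
}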
165
Cassandra
165
write op
166
Cassandra
166
Write cont'd
[Diagram: a write for a key touching CF1, CF2, CF3 is binary-serialized to the commit log on a dedicated disk and applied to a per-column-family memtable. Memtables are flushed based on data size, number of objects, and lifetime.
The data file on disk stores, per key: <key name><size of key data><index of columns/supercolumns><serialized column family>, followed by a block index of <key name, offset> pairs (K128 offset, K256 offset, K384 offset) and a bloom filter kept as an in-memory index.]
167
Cassandra
167
Write Properties
 No locks in the critical path
 Sequential disk access
 Behaves like a write back Cache
 Append support without read ahead
 Atomicity guarantee for a key
“Always Writable”
accept writes during failure
scenarios
168
Cassandra
168
Read
[Diagram: the client sends a query to the Cassandra cluster; the coordinator issues a full read to the closest replica (Replica A) and digest queries to Replicas B and C, returns the result to the client, and performs read repair if the digests differ.]
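A conceptual sketch of that read path (illustrative only): a full read goes to the closest replica, digest queries go to the others, and a repair is sent wherever a digest disagrees:

import java.util.List;
import java.util.Objects;

public class ReadPathSketch {
    interface Replica {
        String read(String key);                 // full read
        int digest(String key);                  // cheap hash of the locally stored value
        void repair(String key, String value);   // overwrite with the reconciled value
    }

    public String read(String key, Replica closest, List<Replica> others) {
        String result = closest.read(key);
        int expected = Objects.hashCode(result);
        for (Replica replica : others) {
            if (replica.digest(key) != expected) {
                replica.repair(key, result);     // read repair if digests differ
            }
        }
        return result;
    }
}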
169
Cassandra
169
Partitioning and Replication
[Diagram: nodes A–F are placed on a consistent-hashing ring (positions between 0 and 1); a key is hashed onto the ring (h(key1), h(key2)) and stored on the next node along the ring, with N=3 replicas on the following nodes.]
170
Cassandra
170
Cluster Membership and Failure
Detection
 Gossip protocol is used for cluster membership.
 Super lightweight with mathematically provable properties.
 State disseminated in O(logN) rounds where N is the number
of nodes in the cluster.
 Every T seconds each member increments its heartbeat
counter and selects one other member to send its list to.
 A member merges the received list with its own list.
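A minimal illustrative sketch of that heartbeat gossip (not Cassandra's Gossiper class): every T seconds a member increments its own counter and sends its membership list to one randomly selected peer, which merges it by keeping the highest heartbeat seen per node:

import java.util.HashMap;
import java.util.Map;

public class GossipSketch {
    private final String self;
    private final Map<String, Long> heartbeats = new HashMap<>();   // node id -> highest heartbeat seen

    public GossipSketch(String self) {
        this.self = self;
        heartbeats.put(self, 0L);
    }

    // Called every T seconds; the returned snapshot is sent to one randomly chosen member.
    public Map<String, Long> tick() {
        heartbeats.merge(self, 1L, Long::sum);
        return new HashMap<>(heartbeats);
    }

    // Merge a received list with our own, keeping the freshest counter for each node.
    public void merge(Map<String, Long> received) {
        received.forEach((node, hb) -> heartbeats.merge(node, hb, Math::max));
    }
}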
171
Cassandra
171
172
Cassandra
172
173
Cassandra
173
174
Cassandra
174
175
Cassandra
175
Accrual Failure Detector
 Valuable for system management, replication, load
balancing etc.
 Defined as a failure detector that outputs a value,
PHI, associated with each process.
 Also known as Adaptive Failure detectors - designed
to adapt to changing network conditions.
 The value output, PHI, represents a suspicion level.
 Applications set an appropriate threshold, trigger
suspicions and perform appropriate actions.
 In Cassandra the average time taken to detect a
failure is 10-15 seconds with the PHI threshold set at
5.
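A simplified sketch of how such a PHI value can be computed, assuming exponentially distributed heartbeat inter-arrival times (one common approximation; the real detector keeps a sliding window of observed intervals):

public class PhiAccrualSketch {
    private double meanIntervalMs = 1000.0;                     // running mean of inter-arrival times
    private long lastHeartbeatMs = System.currentTimeMillis();

    public void heartbeat(long nowMs) {
        double interval = nowMs - lastHeartbeatMs;
        meanIntervalMs = 0.9 * meanIntervalMs + 0.1 * interval; // exponential moving average
        lastHeartbeatMs = nowMs;
    }

    // PHI = -log10(probability that the heartbeat is merely late); grows while no heartbeat arrives.
    public double phi(long nowMs) {
        double elapsed = nowMs - lastHeartbeatMs;
        return elapsed / (meanIntervalMs * Math.log(10.0));
    }

    public boolean suspected(long nowMs, double threshold) {
        return phi(nowMs) > threshold;   // e.g., threshold = 5 as mentioned above
    }
}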
176
Cassandra
176
Information Flow in the Implementation
177
Cassandra
177
Performance Benchmark
 Loading of data - limited by network bandwidth.
 Read performance for Inbox Search in production:
Search Interactions Term Search
Min 7.69 ms 7.78 ms
Median 15.69 ms 18.27 ms
Average 26.13 ms 44.41 ms
178
Cassandra
178
MySQL Comparison
 MySQL > 50 GB Data
Writes Average : ~300 ms
Reads Average : ~350 ms
 Cassandra > 50 GB Data
Writes Average : 0.12 ms
Reads Average : 15 ms
179
Cassandra
179
Lessons Learnt
 Add fancy features only when absolutely required.
 Many types of failures are possible.
 Big systems need proper systems-level monitoring.
 Value simple designs
180
Graph Databases
180
NEO4J (Graphbase)
• A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes.
• Attach properties (key-value pairs) to nodes and relationships.
• Relationships connect two nodes, and both nodes and relationships can hold an arbitrary amount of key-value pairs.
• A graph database can be thought of as a key-value store with full support for relationships.
• http://neo4j.org/
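A minimal sketch using Neo4j's embedded Java API (class and method names follow the classic embedded interface of roughly the Neo4j 3.x era and are assumptions here, not taken from the slides):

import java.io.File;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class Neo4jSketch {
    enum RelTypes implements RelationshipType { KNOWS }

    public static void main(String[] args) {
        GraphDatabaseService graphDb =
                new GraphDatabaseFactory().newEmbeddedDatabase(new File("data/graph.db"));
        try (Transaction tx = graphDb.beginTx()) {
            Node alice = graphDb.createNode();
            alice.setProperty("name", "Alice");
            Node bob = graphDb.createNode();
            bob.setProperty("name", "Bob");
            Relationship knows = alice.createRelationshipTo(bob, RelTypes.KNOWS);
            knows.setProperty("since", 2014);   // relationships hold key-value pairs too
            tx.success();
        }
        graphDb.shutdown();
    }
}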
181
Graph Databases
181
NEO4J
182
Graph Databases
182
NEO4J
183
Graph Databases
183
NEO4J
184
Graph Databases
184
NEO4J
185
Graph Databases
185
NEO4J
186
Graph Databases
186
NEO4J
Properties
187
History of the World, Part 1
187
NEO4J Features
• Dual license: open source and commercial
• Well suited for many web use cases such as tagging, metadata annotations, social networks, wikis and other network-shaped or hierarchical data sets
• Intuitive graph-oriented model for data representation. Instead of static and rigid tables, rows and columns, you work with a flexible graph network consisting of nodes, relationships and properties.
• Neo4j offers performance improvements on the order of 1000x or more compared to relational DBs.
• A disk-based, native storage manager completely optimized for storing graph structures for maximum performance and scalability
• Massive scalability. Neo4j can handle graphs of several billion nodes/relationships/properties on a single machine and can be sharded to scale out across multiple machines
• Fully transactional like a real database
• Neo4j traverses depths of 1000 levels and beyond at millisecond speed (many orders of magnitude faster than relational systems).
Thank You

  • 1.
  • 2. Session Objectives Introduction Hadoop Distributed File System (HDFS) Scheduling in Hadoop (using YARN). Hadoop Ecosystem: Databases and Querying DFS and HDFS Summary
  • 7. File block and replication
  • 8. File block and replication
  • 9. 9 Why Scheduling?  Multiple “tasks” to schedule  The processes on a single-core OS  The tasks of a Hadoop job  The tasks of multiple Hadoop jobs  Limited resources that these tasks require  Processor(s)  Memory  (Less contentious) disk, network  Scheduling goals 1. Good throughput or response time for tasks (or jobs) 2. High utilization of resources
  • 10. 10 Single Processor Scheduling Task 1 10 Task 2 5 Task 3 3 Arrival Times  0 6 8 Processor Task Length Arrival 1 10 0 2 5 6 3 3 8 Which tasks run when? 10
  • 11. 11 FIFO Scheduling (First-In First-Out)/FCFS Task 1 Task 2 Task 3 Time  0 6 8 10 15 18 Processor Task Length Arrival 1 10 0 2 5 6 3 3 8 • Maintain tasks in a queue in order of arrival • When processor free, dequeue head and schedule it 11
  • 12. 12 FIFO/FCFS Performance  Average completion time may be high  For our example on previous slides,  Average completion time of FIFO/FCFS = (Task 1 + Task 2 + Task 3)/3 = (10+15+18)/3 = 43/3 = 14.33 12
  • 13. 13 STF Scheduling (Shortest Task First) Task 1Task 2Task 3 Time  0 3 8 18 Processor Task Length Arrival 1 10 0 2 5 0 3 3 0 • Maintain all tasks in a queue, in increasing order of running time • When processor free, dequeue head and schedule 13
  • 14. 14 STF Is Optimal!  Average completion of STF is the shortest among all scheduling approaches!  For our example on previous slides,  Average completion time of STF = (Task 1 + Task 2 + Task 3)/3 = (18+8+3)/3 = 29/3 = 9.66 (versus 14.33 for FIFO/FCFS)  In general, STF is a special case of priority scheduling  Instead of using time as priority, scheduler could use user-provided priority 14
  • 15. 15 Round-Robin Scheduling Time  0 6 8 Processor Task Length Arriv al 1 10 0 2 5 6 3 3 8 • Use a quantum (say 1 time unit) to run portion of task at queue head • Pre-empts processes by saving their state, and resuming later • After pre-empting, add to end of queue Task 1 15 (Task 3 done) … 15
  • 16. 16 Round-Robin vs. STF/FIFO  Round-Robin preferable for  Interactive applications  User needs quick responses from system  FIFO/STF preferable for Batch applications  User submits jobs, goes away, comes back to get result 16
  • 17. 17 Summary  Single processor scheduling algorithms  FIFO/FCFS  Shortest task first (optimal!)  Priority  Round-robin  Many other scheduling algorithms out there!  What about cloud scheduling?  Next! 17
  • 18. 18 Hadoop Scheduling  A Hadoop job consists of Map tasks and Reduce tasks  Only one job in entire cluster => it occupies cluster  Multiple customers with multiple jobs  Users/jobs = “tenants”  Multi-tenant system  => Need a way to schedule all these jobs (and their constituent tasks)  => Need to be fair across the different tenants  Hadoop YARN has two popular schedulers  Hadoop Capacity Scheduler  Hadoop Fair Scheduler 18
  • 19. 19 Hadoop Capacity Scheduler  Contains multiple queues  Each queue contains multiple jobs  Each queue guaranteed some portion of the cluster capacity E.g.,  Queue 1 is given 80% of cluster  Queue 2 is given 20% of cluster  Higher-priority jobs go to Queue 1  For jobs within same queue, FIFO typically used  Administrators can configure queues Source: http://hadoop.apache.org/docs/r2.3.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html 19
  • 20. 20 Elasticity in HCS  Administrators can configure each queue with limits  Soft limit: how much % of cluster is the queue guaranteed to occupy  (Optional) Hard limit: max % of cluster given to the queue  Elasticity  A queue allowed to occupy more of cluster if resources free  But if other queues below their capacity limit, now get full, need to give these other queues resources  Pre-emption not allowed!  Cannot stop a task part-way through  When reducing % cluster to a queue, wait until some tasks of that queue have finished 20
  • 21. 21 Other HCS Features  Queues can be hierarchical  May contain child sub-queues, which may contain child sub-queues, and so on  Child sub-queues can share resources equally  Scheduling can take memory requirements into account (memory specified by user) 21
  • 22. 22 Hadoop Fair Scheduler  Goal: all jobs get equal share of resources  When only one job present, occupies entire cluster  As other jobs arrive, each job given equal % of cluster  E.g., Each job might be given equal number of cluster-wide YARN containers  Each container == 1 task of job Source: http://hadoop.apache.org/docs/r1.2.1/fair_scheduler.html 22
  • 23. 23 Hadoop Fair Scheduler (2)  Divides cluster into pools  Typically one pool per user  Resources divided equally among pools  Gives each user fair share of cluster  Within each pool, can use either  Fair share scheduling, or  FIFO/FCFS  (Configurable) 23
  • 24. 24 Pre-emption in HFS  Some pools may have minimum shares  Minimum % of cluster that pool is guaranteed  When minimum share not met in a pool, for a while  Take resources away from other pools  By pre-empting jobs in those other pools  By killing the currently-running tasks of those jobs  Tasks can be re-started later  Ok since tasks are idempotent!  To kill, scheduler picks most-recently-started tasks  Minimizes wasted work 24
  • 25. 25 HFS Features  Can also set limits on  Number of concurrent jobs per user  Number of concurrent jobs per pool  Number of concurrent tasks per pool  Prevents cluster from being hogged by one user/job 25
  • 26. 26 Estimating Task Lengths  HCS/HFS use FIFO  May not be optimal (as we know!)  Why not use shortest-task-first instead? It‟s optimal (as we know!)  Challenge: Hard to know expected running time of task (before it‟s completed)  Solution: Estimate length of task  Some approaches  Within a job: Calculate running time of task as proportional to size of its input  Across tasks: Calculate running time of task in a given job as average of other tasks in that given job (weighted by input size)  Lots of recent research results in this area! 26
  • 27. 27 Summary  Hadoop Scheduling in YARN  Hadoop Capacity Scheduler  Hadoop Fair Scheduler  Yet, so far we‟ve talked of only one kind of resource  Either processor, or memory  How about multi-resource requirements?  Next! 27
  • 28. 28 Challenge  What about scheduling VMs in a cloud (cluster)?  Jobs may have multi-resource requirements  Job 1‟s tasks: 2 CPUs, 8 GB  Job 2‟s tasks: 6 CPUs, 2 GB  How do you schedule these jobs in a “fair” manner?  That is, how many tasks of each job do you allow the system to run concurrently?  What does fairness even mean? 28
  • 29. 29 Dominant Resource Fairness (DRF)  Proposed by researchers from U. California Berkeley  Proposes notion of fairness across jobs with multi- resource requirements  They showed that DRF is  Fair for multi-tenant systems  Strategy-proof: tenant can‟t benefit by lying  Envy-free: tenant can‟t envy another tenant‟s allocations 29
  • 30. 30 Where is DRF Useful?  DRF is  Usable in scheduling VMs in a cluster  Usable in scheduling Hadoop in a cluster  DRF used in Mesos, an OS intended for cloud environments  DRF-like strategies also used some cloud computing company‟s distributed OS‟s 30
  • 31. 31 How DRF Works  Our example  Job 1‟s tasks: 2 CPUs, 8 GB => Job 1‟s resource vector = <2 CPUs, 8 GB>  Job 2‟s tasks: 6 CPUs, 2 GB => Job 2‟s resource vector = <6 CPUs, 2 GB>  Consider a cloud with <18 CPUs, 36 GB RAM> 31
  • 32. 32 DRF Works (2)  Our example  Job 1‟s tasks: 2 CPUs, 8 GB => Job 1‟s resource vector = <2 CPUs, 8 GB>  Job 2‟s tasks: 6 CPUs, 2 GB => Job 2‟s resource vector = <6 CPUs, 2 GB>  Consider a cloud with <18 CPUs, 36 GB RAM>  Each Job 1‟s task consumes % of total CPUs = 2/18 = 1/9  Each Job 1‟s task consumes % of total RAM = 8/36 = 2/9  1/9 < 2/9  => Job 1’s dominant resource is RAM, i.e., Job 1 is more memory- intensive than it is CPU-intensive 32
  • 33. 33 How DRF Works (3)  Our example  Job 1‟s tasks: 2 CPUs, 8 GB => Job 1‟s resource vector = <2 CPUs, 8 GB>  Job 2‟s tasks: 6 CPUs, 2 GB => Job 2‟s resource vector = <6 CPUs, 2 GB>  Consider a cloud with <18 CPUs, 36 GB RAM>  Each Job 2‟s task consumes % of total CPUs = 6/18 = 6/18  Each Job 2‟s task consumes % of total RAM = 2/36 = 1/18  6/18 > 1/18  => Job 2’s dominant resource is CPU, i.e., Job 1 is more CPU- intensive than it is memory-intensive 33
  • 34. 34 DRF Fairness  For a given job, the % of its dominant resource type that it gets cluster-wide, is the same for all jobs  Job 1‟s % of RAM = Job 2‟s % of CPU  Can be written as linear equations, and solved 34
  • 35. 35 DRF Solution, For our Example  DRF Ensures  Job 1‟s % of RAM = Job 2‟s % of CPU  Solution for our example:  Job 1 gets 3 tasks each with <2 CPUs, 8 GB>  Job 2 gets 2 tasks each with <6 CPUs, 2 GB> • Job 1‟s % of RAM = Number of tasks * RAM per task / Total cluster RAM = 3*8/36 = 2/3 • Job 2‟s % of CPU = Number of tasks * CPU per task / Total cluster CPUs = 2*6/18 = 2/3 35
  • 36. 36 Other DRF Details  DRF generalizes to multiple jobs  DRF also generalizes to more than 2 resource types  CPU, RAM, Network, Disk, etc.  DRF ensures that each job gets a fair share of that type of resource which the job desires the most  Hence fairness 36
  • 37. 37 Summary: Scheduling  Scheduling very important problem in cloud computing  Limited resources, lots of jobs requiring access to these jobs  Single-processor scheduling  FIFO/FCFS, STF, Priority, Round-Robin  Hadoop scheduling  Capacity scheduler, Fair scheduler  Dominant-Resources Fairness 37
  • 38.
  • 39. Session Objectives Introduction, DBMS, Types HBASE Hive PIG Big table and Graph Database Summary
  • 40. 40 History of the World, Part 1 40  Relational Databases – mainstay of business  Web-based applications caused spikes  Especially true for public-facing e-Commerce sites  Developers begin to front RDBMS with memcache or integrate other caching mechanisms within the application (ie. Ehcache)
  • 41. 41 Scaling Up 41  Issues with scaling up when the dataset is just too big  RDBMS were not designed to be distributed  Began to look at multi-node database solutions  Known as „scaling out‟ or „horizontal scaling‟  Different approaches include:  Master-slave  Sharding
  • 42. 42 Scaling RDBMS – Master/Slave 42  Master-Slave  All writes are written to the master. All reads performed against the replicated slave databases  Critical reads may be incorrect as writes may not have been propagated down  Large data sets can pose problems as master needs to duplicate data to slaves
  • 43. 43 Scaling RDBMS - Sharding 43  Partition or sharding  Scales well for both reads and writes  Not transparent, application needs to be partition-aware  Can no longer have relationships/joins across partitions  Loss of referential integrity across shards
  • 44. 44 Other ways to scale RDBMS 44  Multi-Master replication  INSERT only, not UPDATES/DELETES  No JOINs, thereby reducing query time  This involves de-normalizing data  In-memory databases
  • 45. 45 What is NoSQL? 45  Stands for Not Only SQL  Class of non-relational data storage systems  Usually do not require a fixed table schema nor do they use the concept of joins  All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)
  • 46. 46 Why NoSQL? 46  For data storage, an RDBMS cannot be the be-all/end-all  Just as there are different programming languages, need to have other data storage tools in the toolbox  A NoSQL solution is more acceptable to a client now than even a year ago  Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago
  • 47. 47 How did we get here? 47  Explosion of social media sites (Facebook, Twitter) with large data needs  Rise of cloud-based solutions such as Amazon S3 (simple storage solution)  Just as moving to dynamically-typed languages (Ruby/Groovy), a shift to dynamically-typed data with frequent schema changes  Open-source community
  • 48. 48 Dynamo and BigTable 48  Three major papers were the seeds of the NoSQL movement  BigTable (Google)  Dynamo (Amazon)  Gossip protocol (discovery and error detection)  Distributed key-value data store  Eventual consistency  CAP Theorem (discuss in a sec ..)
  • 49. 49 The Perfect Storm 49  Large datasets, acceptance of alternatives, and dynamically-typed data has come together in a perfect storm  Not a backlash/rebellion against RDBMS  SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings
  • 50. 50 CAP Theorem 50  Three properties of a system: consistency, availability and partitions  You can have at most two of these three properties for any shared-data system  To scale out, you have to partition. That leaves either consistency or availability to choose from  In almost all cases, you would choose availability over consistency
  • 52. 52 CAP Theorem 52 Once a writer has written, all readers will see that write Consistency Partition tolerance Availability
  • 53. 53 Consistency 53  Two kinds of consistency:  strong consistency – ACID(Atomicity Consistency Isolation Durability)  weak consistency – BASE(Basically Available Soft- state Eventual consistency )
  • 54. 54 ACID Transactions 54 54  A DBMS is expected to support “ACID transactions,” processes that are:  Atomic : Either the whole process is done or none is.  Consistent : Database constraints are preserved.  Isolated : It appears to the user as if only one process executes at a time.  Durable : Effects of a process do not get lost if the system crashes.
  • 55. 55 Atomicity 55 55  A real-world event either happens or does not happen  Student either registers or does not register  Similarly, the system must ensure that either the corresponding transaction runs to completion or, if not, it has no effect at all  Not true of ordinary programs. A crash could leave files partially updated on recovery
  • 56. 56 Commit and Abort 56 56  If the transaction successfully completes it is said to commit  The system is responsible for ensuring that all changes to the database have been saved  If the transaction does not successfully complete, it is said to abort  The system is responsible for undoing, or rolling back, all changes the transaction has made
  • 57. 57 Database Consistency 57 57  Enterprise (Business) Rules limit the occurrence of certain real-world events  Student cannot register for a course if the current number of registrants equals the maximum allowed  Correspondingly, allowable database states are restricted cur_reg <= max_reg  These limitations are called (static) integrity constraints: assertions that must be satisfied by all database states (state invariants).
  • 58. 58 Database Consistency (state invariants) 58 58  Other static consistency requirements are related to the fact that the database might store the same information in different ways  cur_reg = |list_of_registered_students|  Such limitations are also expressed as integrity constraints  Database is consistent if all static integrity constraints are satisfied
  • 59. 59 Transaction Consistency 59 59  A consistent database state does not necessarily model the actual state of the enterprise  A deposit transaction that increments the balance by the wrong amount maintains the integrity constraint balance  0, but does not maintain the relation between the enterprise and database states  A consistent transaction maintains database consistency and the correspondence between the database state and the enterprise state (implements its specification)  Specification of deposit transaction includes balance = balance + amt_deposit , (balance is the next value of balance)
  • 60. 60 Dynamic Integrity Constraints (transition invariants) 60 60  Some constraints restrict allowable state transitions  A transaction might transform the database from one consistent state to another, but the transition might not be permissible  Example: A letter grade in a course (A, B, C, D, F) cannot be changed to an incomplete (I)  Dynamic constraints cannot be checked by examining the database state
  • 61. 61 Transaction Consistency 61 61  Consistent transaction: if DB is in consistent state initially, when the transaction completes:  All static integrity constraints are satisfied (but constraints might be violated in intermediate states) Can be checked by examining snapshot of database  New state satisfies specifications of transaction Cannot be checked from database snapshot  No dynamic constraints have been violated Cannot be checked from database snapshot
  • 62. 62 Isolation 62 62  Serial Execution: transactions execute in sequence  Each one starts after the previous one completes.  Execution of one transaction is not affected by the operations of another since they do not overlap in time  The execution of each transaction is isolated from all others.  If the initial database state and all transactions are consistent, then the final database state will be consistent and will accurately reflect the real-world state, but  Serial execution is inadequate from a performance perspective
  • 63. 63 Isolation 63 63  Concurrent execution offers performance benefits:  A computer system has multiple resources capable of executing independently (e.g., cpu’s, I/O devices), but  A transaction typically uses only one resource at a time  Hence, only concurrently executing transactions can make effective use of the system  Concurrently executing transactions yield interleaved schedules
  • 64. 64 Concurrent Execution 64 T1 T2 DBMS local computation local variables sequence of db operations output by T1op1,1 op1.2 op2,1 op2.2 op1,1 op2,1 op2.2 op1.2 interleaved sequence of db operations input to DBMS begin trans .. op1,1 .. op1,2 .. commit
  • 65. 65 Durability 65 65  The system must ensure that once a transaction commits, its effect on the database state is not lost in spite of subsequent failures  Not true of ordinary programs. A media failure after a program successfully terminates could cause the file system to be restored to a state that preceded the program’s execution
  • 66. 66 Implementing Durability 66  Database stored redundantly on mass storage devices to protect against media failure  Architecture of mass storage devices affects type of media failures that can be tolerated  Related to Availability: extent to which a (possibly distributed) system can provide service despite failure  Non-stop DBMS (mirrored disks)  Recovery based DBMS (log)
  • 67. 67 Consistency Model 67  A consistency model determines rules for visibility and apparent order of updates.  For example:  Row X is replicated on nodes M and N  Client A writes row X to node N  Some period of time t elapses.  Client B reads row X from node M  Does client B see the write from client A?  Consistency is a continuum with tradeoffs  For NoSQL, the answer would be: maybe  CAP Theorem states: Strict Consistency can't be achieved at the same time as availability and partition-tolerance.
  • 68. 68 Eventual Consistency 68  When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent  For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service  Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID
  • 69. 69 The CAP Theorem 69 Availability: the system remains available during software and hardware upgrades and node failures. (Diagram: Consistency, Availability, Partition tolerance)
  • 70. 70 Availability 70  Traditionally thought of as the server/process being available "five 9's" of the time (99.999%).  However, for a large multi-node system, at almost any point in time there's a good chance that some node is down or there is a network disruption among the nodes.  We want a system that is resilient in the face of network disruption
  • 71. 71 The CAP Theorem 71 Partition tolerance: a system can continue to operate in the presence of network partitions. (Diagram: Consistency, Availability, Partition tolerance)
  • 72. 72 The CAP Theorem 72 Theorem: you can have at most two of these properties for any shared-data system. (Diagram: Consistency, Availability, Partition tolerance)
  • 73. 73 What kinds of NoSQL 73  NoSQL solutions fall into two major areas:  Key/Value or 'the big hash table'.  Amazon S3 (Dynamo)  Voldemort  Scalaris  Memcached (in-memory key/value store)  Redis  Schema-less, which comes in multiple flavors: column-based, document-based or graph-based.  Cassandra (column-based)  CouchDB (document-based)  MongoDB (document-based)  Neo4J (graph-based)  HBase (column-based)
  • 74. 74 Key/Value 74 Pros:  very fast  very scalable  simple model  able to distribute horizontally Cons: - many data structures (objects) can't be easily modeled as key value pairs
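The key/value model is essentially a (possibly distributed) hash table. As a minimal, purely illustrative sketch of the client-side contract — the KeyValueStore interface below is hypothetical, not any particular product's API:

import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface: the whole contract is put/get/delete by key.
// Real stores (Redis, Voldemort, Dynamo) add expiry, versioning, etc.
interface KeyValueStore {
    void put(String key, byte[] value);
    Optional<byte[]> get(String key);
    void delete(String key);
}

// In-memory stand-in; a real store would partition keys across nodes
// (e.g., by consistent hashing) and replicate each entry.
class InMemoryStore implements KeyValueStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();
    public void put(String key, byte[] value) { data.put(key, value); }
    public Optional<byte[]> get(String key) { return Optional.ofNullable(data.get(key)); }
    public void delete(String key) { data.remove(key); }
}

Because the store only understands opaque keys and values, anything relational (joins, secondary indexes, complex objects) has to be handled by the application — which is exactly the "cons" point above.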
  • 75. 75 Schema-Less 75 Pros: - Schema-less data model is richer than key/value pairs - eventual consistency - many are distributed - still provide excellent performance and scalability Cons: - typically no ACID transactions or joins
  • 76. 76 Common Advantages 76  Cheap, easy to implement (open source)  Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned  Down nodes easily replaced  No single point of failure  Easy to distribute  Don't require a schema  Can scale up and down  Relax the data consistency requirement (CAP)
  • 77. 77 What am I giving up? 77  joins  group by  order by  ACID transactions  SQL as a sometimes frustrating but still powerful query language  easy integration with other applications that support SQL
  • 79. 79 Types of DBMS 79  Hierarchical database model: resembles a tree structure, similar to a folder hierarchy in your computer system. The relationships between records are pre-defined in a one-to-one manner between 'parent' and 'child' nodes, and users must traverse the hierarchy to access the data they need. Because of these limitations, such databases tend to be confined to specific uses.  Network database model: also has a hierarchical structure, but instead of a single-parent tree it supports many-to-many relationships, since child tables can have more than one parent.  NoSQL (non-relational) databases: a popular alternative to relational databases; they take a variety of forms and allow you to store and manipulate large amounts of unstructured and semi-structured data. Examples include key-value stores, document stores and graph databases.  Flat file database: stores data in a plain text file, with each line of text typically holding one record and delimiters such as commas or tabs separating fields. It uses a simple structure and, unlike a relational database, cannot contain multiple tables and relations.  Object-oriented databases: information is represented as objects, with different types of relationships possible between two or more objects. Such databases use an object-oriented programming language for development.
  • 80. 80 80  Relational databases – mainstay of business  Web-based applications caused spikes in load  Especially true for public-facing e-Commerce sites  Developers began to front the RDBMS with memcached or to integrate other caching mechanisms within the application (e.g., Ehcache) Types of DBMS
  • 81. 81 Scaling Up 81  Issues with scaling up when the dataset is just too big  RDBMS were not designed to be distributed  Began to look at multi-node database solutions  Known as 'scaling out' or 'horizontal scaling'  Different approaches include:  Master-slave  Sharding
  • 82. 82 Scaling RDBMS – Master/Slave 82  Master-Slave  All writes are written to the master. All reads performed against the replicated slave databases  Critical reads may be incorrect as writes may not have been propagated down  Large data sets can pose problems as master needs to duplicate data to slaves
  • 83. 83 Scaling RDBMS - Sharding 83  Partition or sharding  Scales well for both reads and writes  Not transparent, application needs to be partition-aware  Can no longer have relationships/joins across partitions  Loss of referential integrity across shards
  • 84. 84 Other ways to scale RDBMS 84  Multi-Master replication  INSERT only, not UPDATES/DELETES  No JOINs, thereby reducing query time  This involves de-normalizing data  In-memory databases
  • 85. 85 What is NoSQL? 85  Stands for Not Only SQL  An RDBMS searches tuple by tuple over tables  When to use SQL: fixed schema, strong consistency, and transactions  When to use NoSQL: speed, scalability, flexibility  Types of NoSQL: column-oriented, document, key-value store, graph-oriented  Class of non-relational data storage systems  Usually do not require a fixed table schema nor do they use the concept of joins  All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)  Example: users of Samsung and iPhone devices — if you want individual, specific records then SQL is preferred, but if you want data in bulk then NoSQL is the better fit. HBase: column-oriented
  • 86. 86 Why NoSQL? 86  For data storage, an RDBMS cannot be the be-all/end-all  Just as there are different programming languages, need to have other data storage tools in the toolbox  A NoSQL solution is more acceptable to a client now than even a year ago  Think about proposing a Ruby/Rails or Groovy/Grails solution now versus a couple of years ago
  • 87. 87 How did we get here? 87  Explosion of social media sites (Facebook, Twitter) with large data needs  Rise of cloud-based solutions such as Amazon S3 (simple storage solution)  Just as moving to dynamically-typed languages (Ruby/Groovy), a shift to dynamically-typed data with frequent schema changes  Open-source community
  • 88. 88 Dynamo and BigTable 88  Three major papers were the seeds of the NoSQL movement  BigTable (Google)  Dynamo (Amazon)  Gossip protocol (discovery and error detection)  Distributed key-value data store  Eventual consistency  CAP Theorem (discuss in a sec ..)
  • 89. 89 The Perfect Storm 89  Large datasets, acceptance of alternatives, and dynamically-typed data have come together in a perfect storm  Not a backlash/rebellion against RDBMS  SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings
  • 90. 90 CAP Theorem 90  Three properties of a system: consistency, availability and partitions  You can have at most two of these three properties for any shared-data system  To scale out, you have to partition. That leaves either consistency or availability to choose from  In almost all cases, you would choose availability over consistency
  • 92. 92 Consistency 92  Two kinds of consistency:  strong consistency – ACID (Atomicity, Consistency, Isolation, Durability)  weak consistency – BASE (Basically Available, Soft-state, Eventual consistency)
  • 93. 93 93 Databases and Querying (HBASE, Pig, and Hive)
  • 94. 94 HBASE 94  If Hadoop can already process the dataset, why do we need HBase? Hadoop uses batch processing and sequential data access; to look up a small, specific piece of information we cannot afford to scan a trillion tuples at once.  It is a NoSQL, column-oriented (column-family-oriented) store.  HBase provides random data access; there is no need to scan the dataset in a batch job the way Hadoop does.  HA, replication and fault tolerance come for free because it is installed on top of Hadoop.  When to use HBase? For small databases (MBs or GBs), an RDBMS with SQL is fine; when the database is in TBs or petabytes, use HBase. Use it when you do not require transactions, a rigid schema, big ad-hoc queries, or complex joins, and you do need speed, scalability and flexibility. Who is using HBase? Pinterest, Facebook, Adobe, Yahoo
  • 95. 95 HBASE 95  A distributed data store that can scale horizontally to 1,000s of commodity servers and petabytes of indexed storage.  Designed to operate on top of the Hadoop distributed file system (HDFS) or Kosmos File System (KFS, aka Cloudstore) for scalability, fault tolerance, and high availability.  Distributed storage  Table-like in data structure  multi-dimensional map  High scalability  High availability  High performance
  • 96. 96 HBASE 96  Started by Chad Walters and Jim Kellerman  2006.11  Google releases paper on BigTable  2007.2  Initial HBase prototype created as a Hadoop contrib module  2007.10  First usable HBase  2008.1  Hadoop becomes an Apache top-level project and HBase becomes a subproject  2008.10~  HBase 0.18, 0.19 released
  • 97. 97 HBASE is not a… 97  Tables have one primary index, the row key.  No join operators.  Limited atomicity and transaction support.  HBase supports batched mutations of single rows only.  Data is unstructured and untyped.  Not accessed or manipulated via SQL.  Programmatic access via Java, REST, or Thrift APIs.  Scripting via JRuby.  Scans and queries can select a subset of available columns, perhaps by using a wildcard.  There are three types of lookups:  Fast lookup using row key and optional timestamp.  Full table scan.  Range scan from region start to end.
  • 98. 98 HBASE Advantages 98  No real indexes  Automatic partitioning  Scale linearly and automatically with new nodes  Commodity hardware  Fault tolerance  Batch processing
  • 99. 99 HBASE Data model 99  Tables are sorted by row  The table schema only defines its column families  Each family consists of any number of columns  Each column consists of any number of versions  Columns only exist when inserted; NULLs are free  Columns within a family are sorted and stored together  Everything except table names are byte[]  (Row, Family:Column, Timestamp) → Value
  • 100. 100 Members  Master  Responsible for monitoring region servers  Load balancing for regions  Redirects clients to the correct region servers  Currently a SPOF  Region servers (slaves)  Serve client requests (write/read/scan)  Send heartbeats to the Master  Throughput and region count scale with the number of region servers 100
  • 102. 102 102  Region: default size 256 MB; once a region is full, a new region is created. Why not have one region store all the data? It would degrade performance.  Each region has write memory and read memory HBASE Architecture
  • 103. 103 103 HBASE Architecture  A region server handles multiple regions; each region has column families, and different regions can hold data for different tables such as employee, students, products.  Region: default size 256 MB; once a region is full, a new region is created (a single region for all data would degrade performance).  Writes go to the Memstore and the write-ahead log (WAL), a file every region server maintains for recovery in case in-memory data is lost.  The Memstore is the write buffer (default size 100 MB in this description). Once full, it flushes its data to disk as small HFiles.  Merging all HFiles together is called a major compaction; it is typically run by the admin during off-peak hours.  Merging only a few HFiles together is called a minor compaction.  A region thus has write memory (Memstore) and read memory (block cache)
  • 104. 104 104 HBASE Architecture HMaster functions  Create, delete, update operations  Region assignment to region servers  Reassigning regions after load balancing  Managing region server failure (recovery after a region server fails is also done by the HMaster)
  • 105. 105 HBASE 105 ZooKeeper functions  Active and inactive HMasters and region servers send heartbeat signals to ZooKeeper.  If the active HMaster crashes and stops sending heartbeats, ZooKeeper activates an inactive (standby) HMaster.  The root and meta tables are handled through ZooKeeper.  Overall cluster management tasks sit with ZooKeeper.  There is only one root table, while there can be several meta tables (tracking which data lives where: which region, Memstore, block cache).
  • 106. 106 HBASE:ZooKeeper 106  HBase depends on ZooKeeper and by default it manages a ZooKeeper instance as the authority on cluster state
  • 107. 107 HBASE:ZooKeeper:Operation 107 The -ROOT- table holds the list of .META. table regions. The .META. table holds the list of all user-space regions.
  • 108. 108 HBASE:ZooKeeper 108 Installation (1) $ wget http://ftp.twaren.net/Unix/Web/apache/hadoop/hbase/hbase-0.20.2/hbase-0.20.2.tar.gz $ sudo tar -zxvf hbase-*.tar.gz -C /opt/ $ sudo ln -sf /opt/hbase-0.20.2 /opt/hbase $ sudo chown -R $USER:$USER /opt/hbase $ sudo mkdir /var/hadoop/ $ sudo chmod 777 /var/hadoop START Hadoop…
  • 109. 109 HBASE:ZooKeeper 109 Setup (1) $ vim /opt/hbase/conf/hbase-env.sh export JAVA_HOME=/usr/lib/jvm/java-6-sun export HADOOP_CONF_DIR=/opt/hadoop/conf export HBASE_HOME=/opt/hbase export HBASE_LOG_DIR=/var/hadoop/hbase-logs export HBASE_PID_DIR=/var/hadoop/hbase-pids export HBASE_MANAGES_ZK=true export HBASE_CLASSPATH=$HBASE_CLASSPATH:/opt/hadoop/conf $ cd /opt/hbase/conf $ cp /opt/hadoop/conf/core-site.xml ./ $ cp /opt/hadoop/conf/hdfs-site.xml ./ $ cp /opt/hadoop/conf/mapred-site.xml ./
  • 110. 110 HBASE:ZooKeeper 110 Setup (2) <configuration> <property> <name> name </name> <value> value </value> </property> </configuration> Name value hbase.rootdir hdfs://secuse.nchc.org.tw:9000/hbase hbase.tmp.dir /var/hadoop/hbase-${user.name} hbase.cluster.distributed true hbase.zookeeper.property.clientPort 2222 hbase.zookeeper.quorum Host1, Host2 hbase.zookeeper.property.dataDir /var/hadoop/hbase-data
  • 111. 111 HBASE 111 Startup & Stop $ start-hbase.sh $ stop-hbase.sh
  • 112. 112 HBASE 112 Testing (4) $ hbase shell > create 'test', 'data' 0 row(s) in 4.3066 seconds > list test 1 row(s) in 0.1485 seconds > put 'test', 'row1', 'data:1', 'value1' 0 row(s) in 0.0454 seconds > put 'test', 'row2', 'data:2', 'value2' 0 row(s) in 0.0035 seconds > put 'test', 'row3', 'data:3', 'value3' 0 row(s) in 0.0090 seconds > scan 'test' ROW COLUMN+CELL row1 column=data:1, timestamp=1240148026198, value=value1 row2 column=data:2, timestamp=1240148040035, value=value2 row3 column=data:3, timestamp=1240148047497, value=value3 3 row(s) in 0.0825 seconds > disable 'test' 09/04/19 06:40:13 INFO client.HBaseAdmin: Disabled test 0 row(s) in 6.0426 seconds > drop 'test' 09/04/19 06:40:17 INFO client.HBaseAdmin: Deleted test 0 row(s) in 0.0210 seconds > list 0 row(s) in 2.0645 seconds
  • 113. 113 HBASE 113 Connecting to HBase  Java client  get(byte [] row, byte [] column, long timestamp, int versions);  Non-Java clients  Thrift server hosting HBase client instance  Sample ruby, c++, & java (via thrift) clients  REST server hosts HBase client  TableInput/OutputFormat for MapReduce  HBase as MR source or sink  HBase Shell  JRuby IRB with “DSL” to add get, scan, and admin  ./bin/hbase shell YOUR_SCRIPT
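For illustration, here is a minimal sketch of a Java client doing a put, a get, and a short range scan against the 'test' table created in the shell example above. It uses the Connection/Table client API of the HBase 1.x/2.x era (withStartRow/withStopRow are the 2.x names); the 0.20-era HTable calls shown on the previous slide differ in detail.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("test"))) {

            // Equivalent of: put 'test', 'row1', 'data:1', 'value1'
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("data"), Bytes.toBytes("1"), Bytes.toBytes("value1"));
            table.put(put);

            // Fast lookup by row key
            Result r = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("data"), Bytes.toBytes("1"))));

            // Range scan from 'row1' up to (but not including) 'row3'
            Scan scan = new Scan().withStartRow(Bytes.toBytes("row1"))
                                  .withStopRow(Bytes.toBytes("row3"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}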
  • 114. 114 HBASE 114 Thrift  A software framework for scalable cross-language services development.  Developed by Facebook  Works seamlessly between C++, Java, Python, PHP, and Ruby.  This will start the server instance, by default on port 9090  A similar project exists for REST $ hbase-daemon.sh start thrift $ hbase-daemon.sh stop thrift
  • 115. 115 Hive 115  What is Hive?  A data warehouse package built on top of Hadoop, used for data summarization and analysis.  Users with a SQL background can use Hive  No need for Java familiarity  History?  Facebook generated roughly 78 TB of data per day, about 150K queries per day, and 300 M images per day  Facebook was using a backup strategy with imports run as scheduled (cron) jobs  ETL (extract, transform and load) was done using Python  Oracle DBMS and MS SQL Server were being used, which caused a lot of problems  Since Facebook had SQL programmers, they developed Hive with a SQL-compatible language called HQL  Features  Tables can be created  JDBC/ODBC drivers are available  Data is stored only on Hadoop  Relies on Hadoop for fault tolerance, as Hadoop provides fault tolerance for everything built on it (Pig, Hive, HBase)  Use cases  Data mining  Document indexing (e.g., Facebook image indexing)  Video indexing  Predictive modeling
  • 116. 116 Hive 116 Need for High-Level Languages  Hadoop is great for large-data processing!  But writing Java programs for everything is verbose and slow  Not everyone wants to (or can) write Java code  Solution: develop higher-level data processing languages  Hive: HQL is like SQL  Pig: Pig Latin is a bit like Perl
  • 117. 117 Hive 117 Hive: data warehousing application in Hadoop Query language is HQL, variant of SQL Tables stored on HDFS as flat files Developed by Facebook, now open source Pig: large-scale data processing system Scripts are written in Pig Latin, a dataflow language Developed by Yahoo!, now open source Roughly 1/3 of all Yahoo! internal jobs Common idea: Provide higher-level language to facilitate large-data processing Higher-level language “compiles down” to Hadoop jobs
  • 118. 118 Hive 118 Hive: Example  Hive looks similar to an SQL database  Relational join on two tables:  Table of word counts from Shakespeare collection  Table of word counts from the bible Source: Material drawn from Cloudera training VM SELECT s.word, s.freq, k.freq FROM shakespeare s JOIN bible k ON (s.word = k.word) WHERE s.freq >= 1 AND k.freq >= 1 ORDER BY s.freq DESC LIMIT 10; the 25848 62394 I 23031 8854 and 19671 38985 to 18038 13526 of 16700 34654 a 14170 8057 you 12702 2720 my 11297 4135 in 10797 12445 is 8882 6884
  • 119. 119 Hive 119 Hive: Behind the Scenes SELECT s.word, s.freq, k.freq FROM shakespeare s JOIN bible k ON (s.word = k.word) WHERE s.freq >= 1 AND k.freq >= 1 ORDER BY s.freq DESC LIMIT 10; (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF shakespeare s) (TOK_TABREF bible k) (= (. (TOK_TABLE_OR_COL s) word) (. (TOK_TABLE_OR_COL k) word)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL s) word)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL s) freq)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL k) freq))) (TOK_WHERE (AND (>= (. (TOK_TABLE_OR_COL s) freq) 1) (>= (. (TOK_TABLE_OR_COL k) freq) 1))) (TOK_ORDERBY (TOK_TABSORTCOLNAMEDESC (. (TOK_TABLE_OR_COL s) freq))) (TOK_LIMIT 10))) (one or more of MapReduce jobs) (Abstract Syntax Tree)
  • 120. 120 Hive: Behind the Scenes 120 STAGE DEPENDENCIES: Stage-1 is a root stage Stage-2 depends on stages: Stage-1 Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: s TableScan alias: s Filter Operator predicate: expr: (freq >= 1) type: boolean Reduce Output Operator key expressions: expr: word type: string sort order: + Map-reduce partition columns: expr: word type: string tag: 0 value expressions: expr: freq type: int expr: word type: string k TableScan alias: k Filter Operator predicate: expr: (freq >= 1) type: boolean Reduce Output Operator key expressions: expr: word type: string sort order: + Map-reduce partition columns: expr: word type: string tag: 1 value expressions: expr: freq type: int Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} {VALUE._col1} 1 {VALUE._col0} outputColumnNames: _col0, _col1, _col2 Filter Operator predicate: expr: ((_col0 >= 1) and (_col2 >= 1)) type: boolean Select Operator expressions: expr: _col1 type: string expr: _col0 type: int expr: _col2 type: int outputColumnNames: _col0, _col1, _col2 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.SequenceFileInputFormat output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat Stage: Stage-2 Map Reduce Alias -> Map Operator Tree: hdfs://localhost:8022/tmp/hive-training/364214370/10002 Reduce Output Operator key expressions: expr: _col1 type: int sort order: - tag: -1 value expressions: expr: _col0 type: string expr: _col1 type: int expr: _col2 type: int Reduce Operator Tree: Extract Limit File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: 10
  • 121. 121 Hive 121 Example Data Analysis Task user url time Amy www.cnn.com 8:00 Amy www.crap.com 8:05 Amy www.myblog.com 10:00 Amy www.flickr.com 10:05 Fred cnn.com/index.htm 12:00 url pagerank www.cnn.com 0.9 www.flickr.com 0.9 www.myblog.com 0.7 www.crap.com 0.2 Find users who tend to visit “good” pages. PagesVisits ... ... Pig Slides adapted from Olston et al.
  • 122. 122 Hive 122 System-Level Dataflow . . . . . . Visits Pages ... ... join by url the answer loadload canonicalize compute average pagerank filter group by user Pig Slides adapted from Olston et al.
  • 123. 123 Hive:MapReduce Code 123 (The original slide shows the full Java MapReduce implementation of the same dataflow: mapper classes LoadPages and LoadAndFilterUsers, a Join reducer, LoadJoined and ReduceUrls for grouping by URL, LoadClicks and LimitClicks for the top-100 step, and a main method that chains five JobConf jobs with JobControl under the name "Find top 100 sites for users 18 to 25" — roughly 170 lines of code in total, shown to contrast with the short Pig Latin script.)
  • 124. 124 Hive 124 Data Flows  Moving HBase data (Diagram): data is read in parallel from the production HBase cluster by a CopyTable MR job and imported in parallel into HBase. * HBase replication currently only works for a single slave cluster; in our case HBase replicates to a backup cluster.
  • 125. 125 Hive Architecture 125  The command line interface, Hive web interface, and Thrift server are used to access Hive and fire queries.  To access Hive from another machine, use the Thrift server, which provides a cross-language interface (C, C++, Java, …).  Metadata about tables, and Hive's own metadata, is stored in the metastore.  Metastore types: embedded metastore (Driver – metastore – Derby), local metastore (Driver – MySQL), remote metastore (Driver – MySQL)  Hive is a data warehouse package built on top of Hadoop, used for data summarization and analysis.  Users with a SQL background can use Hive; no need for Java familiarity.  Limitations: do not use it for row-level updates; latency of Hive queries is high; not designed for OLTP (insert/update/delete)
  • 128. 128 Hive Data Model 128 Database: a folder is created under the warehouse path (e.g., /user/hive/warehouse). Table: when a table such as employee is created, a folder for it is created inside the database folder. Partition: date-wise partitions are created as subfolders under the table folder, so searching becomes faster (see the sketch below). Buckets or clusters: similar data is allocated together depending on the hash value. Types of tables: internal (managed) and external
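As a sketch of how partitioning looks in practice, the HQL below creates a date-partitioned table over JDBC. The connection details (HiveServer2 on localhost:10000), table name, column names, and partition value are illustrative assumptions, not taken from the slides.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // HiveServer2 JDBC endpoint; adjust host/port/database for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
             Statement stmt = conn.createStatement()) {

            // Each distinct dt value becomes a subdirectory under the table's folder,
            // e.g. .../warehouse/employee/dt=2024-01-01/, so queries filtering on dt
            // only need to scan the matching directories.
            stmt.execute("CREATE TABLE IF NOT EXISTS employee (id INT, name STRING) " +
                         "PARTITIONED BY (dt STRING) " +
                         "STORED AS TEXTFILE");

            stmt.execute("ALTER TABLE employee ADD IF NOT EXISTS PARTITION (dt='2024-01-01')");
        }
    }
}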
  • 129. 129 PIG 129 Developed by Yahoo!  An abstraction for processing large datasets. Why Pig? No need for Java.  Much less code  Multi-query approach  Provides nested data types
  • 130. 130 PIG 130 Pig Latin Script Visits = load '/data/visits' as (user, url, time); Visits = foreach Visits generate user, Canonicalize(url), time; Pages = load '/data/pages' as (url, pagerank); VP = join Visits by url, Pages by url; UserVisits = group VP by user; UserPageranks = foreach UserVisits generate user, AVG(VP.pagerank) as avgpr; GoodUsers = filter UserPageranks by avgpr > '0.5'; store GoodUsers into '/data/good_users';
  • 131. 131 PIG 131 Java vs. Pig Latin (bar charts comparing Hadoop and Pig): about 1/20 the lines of code and about 1/16 the development time, with performance on par with raw Hadoop!
  • 132. 132 PIG 132 Pig takes care of…  Schema and type checking  Translating into efficient physical dataflow  (i.e., sequence of one or more MapReduce jobs)  Exploiting data reduction opportunities  (e.g., early partial aggregation via a combiner)  Executing the system-level dataflow  (i.e., running the MapReduce jobs)  Tracking progress, errors, etc.
  • 133. 133 PIG 133 Integration  Reasons to use Hive on HBase:  A lot of data sitting in HBase due to its usage in a real-time environment, but never used for analysis  Give access to data in HBase usually only queried through MapReduce to people that don't code (business analysts)  When needing a more flexible storage solution, so that rows can be updated live by either a Hive job or an application and the changes are seen immediately by the other  Reasons not to do it:  Running SQL queries on HBase to answer live user requests (it's still an MR job)  Hoping to see interoperability with other SQL analytics systems
  • 134. 134 PIG 134 Integration  How it works:  Hive can use tables that already exist in HBase or manage its own, but they still all reside in the same HBase instance (Diagram: Hive table definitions either point to an existing HBase table or manage a table from Hive)
  • 135. 135 PIG 135 Integration  How it works:  When using an already existing table, defined as EXTERNAL, you can create multiple Hive tables that point to it (Diagram: several Hive table definitions point to some columns or to other columns, under different names)
  • 136. 136 PIG 136 Integration  How it works:  Columns are mapped however you want, changing names and giving types (Diagram: a Hive table definition with name STRING, age INT, siblings MAP<string, string> mapped to the HBase columns d:fullname, d:age, d:address and family f: of table 'people'/'persons')
  • 137. 137 PIG 137 Integration  Drawbacks (that can be fixed with brain juice):  Binary keys and values (like integers represented on 4 bytes) aren't supported since Hive prefers string representations, HIVE-1634  Compound row keys aren't supported; there's no way of using multiple parts of a key as different "fields"  This means that concatenated binary row keys are completely unusable, which is what people often use for HBase  Filters are done at Hive level instead of being pushed to the region servers  Partitions aren't supported
  • 138. 138 PIG 138 Data Flows  Data is being generated all over the place:  Apache logs  Application logs  MySQL clusters  HBase clusters
  • 139. 139 PIG 139 Data Flows  Moving application log files (Diagram): a raw log file is either read nightly, transformed in format and dumped into HDFS, or tail'ed continuously, parsed into HBase format and inserted into HBase
  • 140. 140 PIG 140 Data Flows  Moving MySQL data (Diagram): MySQL is either dumped nightly and loaded into HDFS with a CSV import, or streamed through the Tungsten replicator, parsed into HBase format and inserted into HBase
  • 141. 141 PIG 141 Use Cases  Front-end engineers  They need some statistics regarding their latest product  Research engineers  Ad-hoc queries on user data to validate some assumptions  Generating statistics about recommendation quality  Business analysts  Statistics on growth and activity  Effectiveness of advertiser campaigns  Users' behavior vs. past activities to determine, for example, why certain groups react better to email communications  Ad-hoc queries on stumbling behaviors of slices of the user base
  • 142. 142 PIG 142 Use Cases  Using a simple table in HBase: CREATE EXTERNAL TABLE blocked_users( userid INT, blockee INT, blocker INT, created BIGINT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:blockee,f:blocker,f:created") TBLPROPERTIES("hbase.table.name" = "m2h_repl-userdb.stumble.blocked_users"); HBase is a special case here: it has a unique row key, mapped with :key Not all the columns in the table need to be mapped
  • 143. 143 PIG 143 Use Cases  Using a complicated table in HBase: CREATE EXTERNAL TABLE ratings_hbase( userid INT, created BIGINT, urlid INT, rating INT, topic INT, modified BIGINT) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b@0,:key#b@1,:key#b@2,default:rating#b,default:topic#b,default:modified#b") TBLPROPERTIES("hbase.table.name" = "ratings_by_userid"); #b means binary, @ means position in composite key (SU-specific hack)
  • 144. 144 PIG Architecture 144  Grunt shell / Pig server: if you want to access Pig from a program, you use the Pig server.  The code then goes to the parser for syntax checking. If it is error-free, the parser produces a logical plan, a DAG (directed acyclic graph).  The DAG contains logical operators. This logical plan is forwarded to the optimizer.  The optimized plan is then sent to the compiler.  The compiler's output is a series of MapReduce jobs, which is handed to the execution engine.  The execution engine takes care of running the jobs on MapReduce
  • 145. 145 Data Model 145  A table in Bigtable is a sparse, distributed, persistent multidimensional sorted map  Map indexed by a row key, column key, and a timestamp  (row:string, column:string, time:int64)  uninterpreted byte array  Supports lookups, inserts, deletes  Single row transactions only Image Source: Chang et al., OSDI 2006
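The three-dimensional sorted map can be sketched directly in code. This is purely illustrative of the (row, column, timestamp) → value indexing described above, not Bigtable's actual storage layout.

import java.util.NavigableMap;
import java.util.TreeMap;

// (row:string, column:string, time:int64) -> uninterpreted byte array,
// with rows kept in sorted (lexicographic) order and newest timestamps first.
class SparseSortedMap {
    private final NavigableMap<String,                      // row key, sorted
            NavigableMap<String,                            // column key ("family:qualifier")
                    NavigableMap<Long, byte[]>>> cells =    // timestamp -> value
            new TreeMap<>();

    void put(String row, String column, long ts, byte[] value) {
        cells.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>((a, b) -> Long.compare(b, a)))
             .put(ts, value);
    }

    byte[] getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, byte[]>> r = cells.get(row);
        if (r == null) return null;
        NavigableMap<Long, byte[]> versions = r.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }
}

Because rows are kept in sorted order, a range of adjacent rows (a tablet) is simply a contiguous slice of the outer map, which is what makes efficient row scans and dynamic partitioning possible.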
  • 146. 146 Rows and Columns 146  Rows maintained in sorted lexicographic order  Applications can exploit this property for efficient row scans  Row ranges dynamically partitioned into tablets  Columns grouped into column families  Column key = family:qualifier  Column families provide locality hints  Unbounded number of columns
  • 147. 147 Bigtable Building Blocks 147  GFS  Chubby  SSTable
  • 148. 148 SSTable 148  Basic building block of Bigtable  Persistent, ordered immutable map from keys to values  Stored in GFS  Sequence of blocks on disk plus an index for block lookup  Can be completely mapped into memory  Supported operations:  Look up value associated with key  Iterate key/value pairs within a key range Index 64K block 64K block 64K block SSTable Source: Graphic from slides by Erik Paulson
  • 149. 149 Tablet 149  Dynamically partitioned range of rows  Built from multiple SSTables Index 64K block 64K block 64K block SSTable Index 64K block 64K block 64K block SSTable Tablet Start:aardvark End:apple Source: Graphic from slides by Erik Paulson
  • 150. 150 Table 150  Multiple tablets make up the table  SSTables can be shared SSTable SSTable SSTable SSTable Tablet aardvark apple Tablet apple_two_E boat Source: Graphic from slides by Erik Paulson
  • 151. 151 Architecture 151  Client library  Single master server  Tablet servers
  • 152. 152 Bigtable Master 152  Assigns tablets to tablet servers  Detects addition and expiration of tablet servers  Balances tablet server load  Handles garbage collection  Handles schema changes
  • 153. 153 Bigtable Tablet Servers 153  Each tablet server manages a set of tablets  Typically between ten and a thousand tablets  Each 100-200 MB by default  Handles read and write requests to the tablets  Splits tablets that have grown too large
  • 154. 154 Tablet Location 154 Upon discovery, clients cache tablet locations Image Source: Chang et al., OSDI 2006
  • 155. 155 Tablet Assignment 155  Master keeps track of:  Set of live tablet servers  Assignment of tablets to tablet servers  Unassigned tablets  Each tablet is assigned to one tablet server at a time  Tablet server maintains an exclusive lock on a file in Chubby  Master monitors tablet servers and handles assignment  Changes to tablet structure  Table creation/deletion (master initiated)  Tablet merging (master initiated)  Tablet splitting (tablet server initiated)
  • 156. 156 Tablet Serving 156 Image Source: Chang et al., OSDI 2006 “Log Structured Merge Trees”
  • 157. 157 Compactions 157  Minor compaction  Converts the memtable into an SSTable  Reduces memory usage and log traffic on restart  Merging compaction  Reads the contents of a few SSTables and the memtable, and writes out a new SSTable  Reduces number of SSTables  Major compaction  Merging compaction that results in only one SSTable  No deletion records, only live data
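A merging compaction is essentially a merge of several sorted runs into one new sorted run. This toy sketch merges in-memory maps standing in for the memtable and a few SSTables, keeping only the newest value per key; there is no real file format or deletion-marker handling here, it only illustrates the idea.

import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class CompactionSketch {
    // Inputs are ordered newest-first (memtable, then SSTables from newest to oldest);
    // the first occurrence of a key wins, so older versions are dropped.
    static NavigableMap<String, byte[]> merge(List<NavigableMap<String, byte[]>> runsNewestFirst) {
        NavigableMap<String, byte[]> merged = new TreeMap<>();
        for (NavigableMap<String, byte[]> run : runsNewestFirst) {
            run.forEach(merged::putIfAbsent);   // keep only the newest version of each key
        }
        return merged;                          // would be written out as one new SSTable
    }
}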
  • 158. 158 Bigtable Applications 158  Data source and data sink for MapReduce  Google's web crawl  Google Earth  Google Analytics
  • 159. 159 Cassandra 159 Why Cassandra?  Lots of data  Copies of messages, reverse indices of messages, per-user data.  Many incoming requests resulting in a lot of random reads and random writes.  No existing production-ready solutions in the market meet these requirements.
  • 160. 160 Cassandra 160 Design Goals  High availability  Eventual consistency  trade-off strong consistency in favor of high availability  Incremental scalability  Optimistic Replication  “Knobs” to tune tradeoffs between consistency, durability and latency  Low total cost of ownership  Minimal administration
  • 161. 161 Cassandra 161 innovation at scale  google bigtable (2006)  consistency model: strong  data model: sparse map  clones: hbase, hypertable  amazon dynamo (2007)  O(1) dht  consistency model: client tune-able  clones: riak, voldemort cassandra ~= bigtable + dynamo
  • 162. 162 Cassandra 162 proven  Facebook stores 150 TB of data on 150 nodes web 2.0  used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, others
  • 163. 163 Cassandra 163 Data Model (Diagram) A key maps to several column families: ColumnFamily1 "MailList" (Type: Simple, sorted by name) holds columns tid1..tid4, each with a binary value and a timestamp; ColumnFamily2 "WordList" (Type: Super, sorted by time) holds supercolumns such as "aloha" and "dude", each containing its own columns (C1 V1 T1, C2 V2 T2, …); ColumnFamily3 "System" (Type: Super, sorted by name) holds hint supercolumns with column lists. Column families are declared upfront; columns are added and modified dynamically; SuperColumns are added and modified dynamically
  • 164. 164 Cassandra 164 Write Operations  A client issues a write request to a random node in the Cassandra cluster.  The “Partitioner” determines the nodes responsible for the data.  Locally, write operations are logged and then applied to an in-memory version.  Commit log is stored on a dedicated disk local to the machine.
  • 166. 166 Cassandra 166 Write cont'd (Diagram) A write for a key touching column families CF1, CF2, CF3 is binary-serialized and appended to the commit log on a dedicated disk, then applied to the per-column-family memtables. Memtables are flushed based on data size, number of objects, or lifetime. The data file on disk consists of entries of the form <key name><size of key data><index of columns/supercolumns><serialized column family>, a block index (<key name> → offset, e.g. K128, K256, K384 offsets), and a Bloom filter kept as an index in memory.
  • 167. 167 Cassandra 167 Write Properties  No locks in the critical path  Sequential disk access  Behaves like a write back Cache  Append support without read ahead  Atomicity guarantee for a key “Always Writable” accept writes during failure scenarios
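The write path described above — append to the commit log, update the in-memory memtable, flush when full — can be sketched as follows. The class names and flush threshold are illustrative assumptions, not Cassandra's actual implementation.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.NavigableMap;
import java.util.TreeMap;

class WritePathSketch {
    private final FileOutputStream commitLog;                        // sequential, append-only
    private final NavigableMap<String, String> memtable = new TreeMap<>();
    private static final int FLUSH_THRESHOLD = 1_000;                // illustrative size trigger

    WritePathSketch(String logPath) throws IOException {
        this.commitLog = new FileOutputStream(logPath, true);
    }

    // Log first for durability, then apply the mutation to the in-memory memtable.
    synchronized void write(String key, String value) throws IOException {
        commitLog.write((key + "," + value + "\n").getBytes(StandardCharsets.UTF_8));
        commitLog.getFD().sync();                                    // survive a crash
        memtable.put(key, value);
        if (memtable.size() >= FLUSH_THRESHOLD) {
            flushToSSTable();                                        // sorted, immutable file on disk
        }
    }

    private void flushToSSTable() {
        // In the real system: write the sorted memtable to a data file with a block
        // index and Bloom filter, then the corresponding commit log segment can be reclaimed.
        memtable.clear();
    }
}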
  • 168. 168 Cassandra 168 Read (Diagram) The client sends a read query to the Cassandra cluster; the closest replica returns the full result while digest queries are sent to the other replicas (A, B, C); their digest responses are compared with the result, and a read repair is performed if the digests differ.
  • 170. 170 Cassandra 170 Cluster Membership and Failure Detection  Gossip protocol is used for cluster membership (see the sketch below).  Super lightweight with mathematically provable properties.  State disseminated in O(logN) rounds where N is the number of nodes in the cluster.  Every T seconds each member increments its heartbeat counter and selects one other member to send its list to.  A member merges the received list with its own list.
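One round of the heartbeat gossip described above might look like this sketch. It is purely illustrative; real implementations also carry generation numbers, application state, and version vectors.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class GossipSketch {
    private final String self;
    private final Map<String, Long> heartbeats = new HashMap<>();   // member -> latest heartbeat seen
    private final Random random = new Random();

    GossipSketch(String self) {
        this.self = self;
        heartbeats.put(self, 0L);
    }

    // Every T seconds: bump our own heartbeat and gossip our view to one random peer.
    void tick(List<GossipSketch> peers) {
        heartbeats.merge(self, 1L, Long::sum);
        GossipSketch peer = peers.get(random.nextInt(peers.size()));
        peer.receive(heartbeats);
    }

    // Merge a received list with our own, keeping the larger heartbeat per member.
    void receive(Map<String, Long> remote) {
        remote.forEach((member, hb) -> heartbeats.merge(member, hb, Math::max));
    }
}

A member whose heartbeat stops advancing (as judged by the accrual failure detector on the next slide) is suspected of having failed.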
  • 175. 175 Cassandra 175 Accrual Failure Detector  Valuable for system management, replication, load balancing etc.  Defined as a failure detector that outputs a value, PHI, associated with each process.  Also known as Adaptive Failure detectors - designed to adapt to changing network conditions.  The value output, PHI, represents a suspicion level.  Applications set an appropriate threshold, trigger suspicions and perform appropriate actions.  In Cassandra the average time taken to detect a failure is 10-15 seconds with the PHI threshold set at 5.
  • 177. 177 Cassandra 177 Performance Benchmark  Loading of data - limited by network bandwidth.  Read performance for Inbox Search in production (Search Interactions / Term Search): Min 7.69 ms / 7.78 ms; Median 15.69 ms / 18.27 ms; Average 26.13 ms / 44.41 ms
  • 178. 178 Cassandra 178 MySQL Comparison  MySQL > 50 GB Data Writes Average : ~300 ms Reads Average : ~350 ms  Cassandra > 50 GB Data Writes Average : 0.12 ms Reads Average : 15 ms
  • 179. 179 Cassandra 179 Lessons Learnt  Add fancy features only when absolutely required.  Many types of failures are possible.  Big systems need proper systems-level monitoring.  Value simple designs
  • 180. 180 Graph Databases 180 NEO4J (Graphbase) • A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. • Attach properties (key-value pairs) to nodes and relationships • Relationships connect two nodes, and both nodes and relationships can hold an arbitrary number of key-value pairs. • A graph database can be thought of as a key-value store with full support for relationships. • http://neo4j.org/
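As a sketch of the property-graph model, here is a small example using the embedded Java API of the Neo4j 3.x era; class names and transaction handling vary across Neo4j versions, so treat this as illustrative rather than a definitive recipe.

import java.io.File;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class Neo4jSketch {
    public static void main(String[] args) {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase(new File("data/graph.db"));
        try (Transaction tx = db.beginTx()) {
            // Nodes and relationships both carry arbitrary key-value properties.
            Node alice = db.createNode(Label.label("Person"));
            alice.setProperty("name", "Alice");
            Node bob = db.createNode(Label.label("Person"));
            bob.setProperty("name", "Bob");
            alice.createRelationshipTo(bob, RelationshipType.withName("KNOWS"))
                 .setProperty("since", 2010);
            tx.success();
        } finally {
            db.shutdown();
        }
    }
}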
  • 187. 187 History of the World, Part 1 187 NEO4J Features • Dual license: open source and commercial • Well suited for many web use cases such as tagging, metadata annotations, social networks, wikis and other network-shaped or hierarchical data sets • Intuitive graph-oriented model for data representation. Instead of static and rigid tables, rows and columns, you work with a flexible graph network consisting of nodes, relationships and properties. • Neo4j offers performance improvements on the order of 1000x or more compared to relational DBs for graph traversals. • A disk-based, native storage manager completely optimized for storing graph structures for maximum performance and scalability • Massive scalability: Neo4j can handle graphs of several billion nodes/relationships/properties on a single machine and can be sharded to scale out across multiple machines • Fully transactional like a real database • Neo4j traverses depths of 1000 levels and beyond at millisecond speed (many orders of magnitude faster than relational systems)