Mobile devices are becoming a major new source of data. Managing data on mobile devices has become easier with Couchbase Lite, a NoSQL mobile database. Making sense of that data, analyzing it, and scaling to exabytes has also become easier with HPCC Systems, the LexisNexis Big Data platform.
9. How Couchbase Lite Tackles the Mobile Myths
Local data is always faster, but I need to save the data non-locally and/or send data to other mobile devices.
10. EZ Data Syncing with Couchbase Sync Gateway
https://github.com/couchbase/sync_gateway
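As a rough illustration of how a client can follow what Sync Gateway has synced, here is a minimal sketch that reads its CouchDB-style `_changes` feed. It assumes the public REST port 4984 and a database named `db` (both placeholders), and the helper names `changes_url`, `parse_changes` and `poll_once` are hypothetical. Only `poll_once` needs a running Sync Gateway; the URL building and response parsing work offline.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Sync Gateway's public REST API on localhost:4984 with a
# database named "db"; adjust both for a real deployment.
SYNC_GATEWAY_URL = "http://localhost:4984/db"


def changes_url(base_url, since=0, include_docs=True):
    """Build a _changes feed URL that resumes after sequence `since`."""
    query = urllib.parse.urlencode(
        {"since": since, "include_docs": str(include_docs).lower()}
    )
    return f"{base_url}/_changes?{query}"


def parse_changes(body):
    """Return ([(doc_id, rev), ...], last_seq) from a _changes response body."""
    feed = json.loads(body)
    updates = []
    for row in feed.get("results", []):
        revs = row.get("changes", [])
        # Keep the first revision listed for each document (None if absent).
        updates.append((row["id"], revs[0]["rev"] if revs else None))
    return updates, feed.get("last_seq")


def poll_once(base_url=SYNC_GATEWAY_URL, since=0):
    """Fetch one batch of changes (requires a running Sync Gateway)."""
    with urllib.request.urlopen(changes_url(base_url, since)) as resp:
        return parse_changes(resp.read().decode("utf-8"))
```

Passing the returned `last_seq` back as `since` on the next call picks the feed up where the previous batch ended.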
15. Who Is LexisNexis? What Is HPCC Systems?
LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to the legal, corporate, government, accounting, and academic markets. LexisNexis has been in business since 1977 and has over 30,000 employees worldwide.
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise. It supports HPCC Systems as an open source project under the Apache 2.0 License.
16. Comparison
Java: Petabytes; 1-80,000 jobs/day; in-memory: 30-40 jobs/min*; block based; since 2005
C++ (HPCC Systems): Exabytes; file based; since 2000
  Thor: non-indexed: 4-1,040,000 jobs/day*
  Roxie: indexed: 2K-3K jobs/sec*
*Based on the job (size / result set / complexity)
17. Architecture
Thor: "I can query all or part of your data."
  Single-threaded; hard disk; index (optional)
Roxie: "I'm sub-second fast."
  Multi-threaded; hard disk; index (optional); in-memory, SSD, or both
19. How Is Data Stored on HPCC Systems?
Example: Customer Data, May 2010 (300 GB file)
Name  State  Age
Kevin CA     45
Mark  MI     27
Sara  FL     64
20. Store Data
Thor Master and Thor Slaves: the rows (Kevin CA 45, Mark MI 27, Sara FL 64) are spread across the slaves.
Data is distributed evenly in the cluster with replica copies and is seen as a single file, e.g. File Name: ~/customers_2010-05
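The idea of spreading one file evenly over the slaves, with a replica of each part elsewhere, can be sketched in a few lines. This is a toy model, not Thor's actual placement logic: the rule "replica of part i goes to slave i + 1" is a hypothetical policy chosen for illustration, and `distribute` is an invented helper.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (name, state, age)


def distribute(rows: List[Row], num_slaves: int) -> Dict[int, Dict[str, List[Row]]]:
    """Split rows evenly across slaves, with a replica of each part on a neighbour.

    Hypothetical placement policy: part i lives on slave i and its replica
    on slave (i + 1) % num_slaves.
    """
    nodes = {n: {"primary": [], "replica": []} for n in range(num_slaves)}
    # Contiguous, nearly equal chunks: part sizes differ by at most one row.
    base, extra = divmod(len(rows), num_slaves)
    start = 0
    for part in range(num_slaves):
        size = base + (1 if part < extra else 0)
        chunk = rows[start:start + size]
        start += size
        nodes[part]["primary"].extend(chunk)
        nodes[(part + 1) % num_slaves]["replica"].extend(chunk)
    return nodes
```

With the three customer rows from slide 19 and three slaves, each slave holds one row as its primary part and a copy of its neighbour's row, so losing any single slave leaves every row readable.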
21. Dali: File Location & Job Scheduler
Dali tracks where files such as ~/customers_2010-05 are stored; the file locations are stored on disk.
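To make Dali's file-location role concrete, here is a minimal sketch of a registry that maps a logical file name to its part locations and persists that map on disk. `FileRegistry`, the JSON storage format, and the part fields (`part`, `node`, `replica`) are all hypothetical; this does not reflect Dali's real data model or API.

```python
import json
from pathlib import Path
from typing import Dict, List


class FileRegistry:
    """Hypothetical stand-in for Dali's role: logical file name -> part
    locations, persisted to a JSON file on disk."""

    def __init__(self, path):
        self.path = Path(path)
        self.files: Dict[str, List[dict]] = {}
        if self.path.exists():
            self.files = json.loads(self.path.read_text())

    def register(self, logical_name: str, parts: List[dict]) -> None:
        """Record where each part of a file lives, then persist immediately."""
        self.files[logical_name] = parts
        self.path.write_text(json.dumps(self.files))

    def locate(self, logical_name: str) -> List[dict]:
        """Answer 'where is this file?' for a query about to run."""
        if logical_name not in self.files:
            raise KeyError(f"unknown logical file: {logical_name}")
        return self.files[logical_name]
```

Because every `register` call rewrites the JSON file, a fresh `FileRegistry` opened on the same path sees the same locations, which is the point of keeping them on disk.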
22. "What state do most people live in?"
Components: ESP, Dali (File Location & Job Scheduler), Thor Master, Thor Slaves
1a. A pre-compiled query is triggered (mostly used in Roxie).
1b. Or an ad-hoc query is submitted.
2. The query is sent to Dali to get the file locations.
23. Step 3: The job is placed in a queue to be sent to the Thor Master. The Thor Master coordinates job execution on the Thor Slave nodes.
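The queue-then-coordinate step can be modelled with Python's standard `queue.Queue` and a thread pool standing in for the slaves. This is a minimal sketch under stated assumptions: `run_master` is a hypothetical name, a job is just a function applied to each slave's data part, and `None` is used as an end-of-queue sentinel.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_master(jobs: queue.Queue, slave_parts: List[list], results: List) -> None:
    """Drain the job queue in FIFO order; run each job on every slave's part."""
    with ThreadPoolExecutor(max_workers=len(slave_parts)) as slaves:
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more jobs queued
                break
            # Fan out: each simulated slave applies the job to its own part;
            # map() returns the partial results in slave order.
            partials = list(slaves.map(job, slave_parts))
            results.append(partials)
```

Jobs run one at a time in arrival order, while each job is spread across all slaves at once.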
24. Jobs are done locally on the slaves and/or coordinated globally by the master.
25. Step 4: The result is returned, optionally grouped and sorted at run time:
MI 500
CA 120
FL 7
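The question "What state do most people live in?" splits naturally into local and global work. A minimal sketch, assuming rows are `(name, state, age)` tuples already split into per-slave parts: each slave counts its own part, and the master merges the partial counts and sorts them. The function names are hypothetical and the rows in the usage line are toy data.

```python
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, str, int]  # (name, state, age)


def local_count(part: List[Row]) -> Counter:
    """Runs on one slave: count rows per state in its own part only."""
    return Counter(state for _, state, _ in part)


def global_merge(partials: List[Counter]) -> List[Tuple[str, int]]:
    """Runs on the master: add up per-slave counts, largest first
    (ties broken alphabetically by state)."""
    total = Counter()
    for counts in partials:
        total.update(counts)
    return sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
```

For toy parts `[[("Kevin", "CA", 45), ("Ann", "MI", 30)], [("Mark", "MI", 27)], [("Sara", "FL", 64)]]`, `global_merge([local_count(p) for p in parts])` gives `[("MI", 2), ("CA", 1), ("FL", 1)]`. Only small per-state counts travel to the master, not the rows themselves.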
26. Multiple other actions can be done on the data in a single job:
SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, and more.
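Chaining several of these actions in one job looks roughly like the following. These are plain Python analogues of filter, DEDUP, SORT, GROUP and COUNT, not ECL; `single_job` and its `min_age` parameter are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, str, int]  # (name, state, age)


def single_job(rows: List[Row], min_age: int) -> List[Tuple[str, int]]:
    """Chain several operations in one job: filter -> dedup -> sort -> group -> count."""
    filtered = [r for r in rows if r[2] >= min_age]   # filter on age
    deduped = list(dict.fromkeys(filtered))           # DEDUP: keep first occurrence
    ordered = sorted(deduped, key=lambda r: r[1])     # SORT by state (stable)
    # GROUP consecutive rows with the same state, then COUNT each group.
    return [(state, sum(1 for _ in grp))
            for state, grp in groupby(ordered, key=lambda r: r[1])]
```

`groupby` only merges adjacent rows, which is why the sort on the grouping key comes first.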
27. Query Is Completed in a Single Job!
Filter ~/facebook_2013 (or an index of it) and ~/twitter_2013 (optional) on Country = 'US' asynchronously, join the results, then sort, count, group, and classify.
Run time: from 0.27 seconds (Roxie) to a few hours (Thor).
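A minimal sketch of the filter-then-join flow, with the two Country = 'US' filters running concurrently. The record fields `user`, `country` and `posts` are hypothetical (the slide names only the files and the filter), and the join assumes one Twitter row per user.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def filter_us(rows: List[dict]) -> List[dict]:
    """Keep only rows where Country = 'US'."""
    return [r for r in rows if r["country"] == "US"]


def us_join(facebook: List[dict], twitter: List[dict]) -> List[dict]:
    """Filter both sources concurrently, then inner-join them on 'user'."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fb_future = pool.submit(filter_us, facebook)
        tw_future = pool.submit(filter_us, twitter)
        fb_us, tw_us = fb_future.result(), tw_future.result()
    # Hash join: a lookup table keyed by user on the Twitter side
    # (assumes one Twitter row per user; later duplicates overwrite earlier ones).
    tw_by_user: Dict[str, dict] = {r["user"]: r for r in tw_us}
    return [
        {"user": r["user"], "fb_posts": r["posts"], "tw_posts": tw_by_user[r["user"]]["posts"]}
        for r in fb_us
        if r["user"] in tw_by_user
    ]
```

Filtering before the join keeps the lookup table small, which matters far more at scale than it does here.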
28. Speed - Part 1: Indexing
• Index per file
• Customize by field(s)
Example: the file ~/customers_2010-05 (Kevin CA 45, Mark MI 27, Sara FL 64) gets an index file ~/customers_2010-05_index with entries such as CA row #3, MI row #17, MI row #4, FL row #5.
30. From Non-Indexed (1 - 40) to Indexed (1 - 200)
Example index on gender: male row #345, female row #4, male row #97, female row #267
Example index on state: CA row #3, MI row #17, MI row #4, FL row #5
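The example indexes above map a field value to row numbers. A minimal sketch of such a per-field index, next to the non-indexed full scan it replaces; `build_index` and `scan` are hypothetical helpers, and row numbers here are simply 1-based positions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (name, state, age)


def build_index(rows: List[Row], field: int) -> Dict[object, List[int]]:
    """Map each value of one field to the row numbers that contain it."""
    index: Dict[object, List[int]] = defaultdict(list)
    for row_no, row in enumerate(rows, start=1):  # 1-based row numbers
        index[row[field]].append(row_no)
    return dict(index)


def scan(rows: List[Row], field: int, value) -> List[int]:
    """Non-indexed lookup: examine every row."""
    return [n for n, row in enumerate(rows, start=1) if row[field] == value]
```

Both return the same row numbers, but the index answers with one dictionary lookup while the scan touches every row, and that gap widens as the file grows. The index is built once per file and per chosen field(s), matching the "index per file, customize by field(s)" bullets.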
31. Speed - Part 2: Roxie
Roxie Master and Roxie Slaves, each slave with its own index.
• Index in-memory & part or all of the data, or index in-memory only
• Roxie is multi-threaded
• SSDs are OK: write few / read many
2004
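"Write few / read many" is what lets many threads share one in-memory index safely. A minimal sketch, assuming the index is built once and never modified afterwards; `ReadOnlyIndex` and `serve` are hypothetical names, and Python's GIL means this shows the structure of multi-threaded lookups rather than Roxie-like speed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str, int]  # (name, state, age)


class ReadOnlyIndex:
    """Index built once in memory, then only read ("write few / read many")."""

    def __init__(self, rows: Sequence[Row]):
        self._rows = tuple(rows)  # part or all of the data kept in memory
        positions: Dict[str, List[int]] = {}
        for i, (_, state, _) in enumerate(self._rows):
            positions.setdefault(state, []).append(i)
        # Freeze into tuples: nothing is written after construction,
        # so concurrent readers need no locks.
        self._index = {state: tuple(ids) for state, ids in positions.items()}

    def query(self, state: str) -> List[Row]:
        return [self._rows[i] for i in self._index.get(state, ())]


def serve(index: ReadOnlyIndex, states: List[str], threads: int = 4) -> List[List[Row]]:
    """Answer many lookups concurrently; results come back in request order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(index.query, states))
```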
36. Common Cluster
Thor Master, Thor Slaves, Dali, ESP, Roxie Master, Roxie Slaves
Data is a mix of structured and unstructured. Use Thor to do ETL and send the results to Roxie for user queries.
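The Thor-then-Roxie split can be pictured as two functions: a batch step that normalises structured and unstructured input into one record shape, and a load step that keys the cleaned records for fast lookups. This is a toy sketch, not ECL. The free-text format ("Kevin lives in CA and is 45") and the names `extract` and `load_for_queries` are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (name, state, age)

# Hypothetical free-text format for the unstructured input.
PATTERN = re.compile(r"^(?P<name>\w+) lives in (?P<state>[A-Za-z]{2}) and is (?P<age>\d+)$")


def extract(structured: Iterable[tuple], unstructured: Iterable[str]) -> List[Row]:
    """Extract + transform (the batch, Thor-style step): one (name, state, age) shape."""
    rows = [(n.strip(), s.strip().upper(), int(a)) for n, s, a in structured]
    for line in unstructured:
        m = PATTERN.match(line.strip())
        if m:  # lines that do not match the pattern are skipped
            rows.append((m["name"], m["state"].upper(), int(m["age"])))
    return rows


def load_for_queries(rows: List[Row]) -> Dict[str, List[Row]]:
    """Load (hand-off to the query side): key the cleaned rows by state."""
    published: Dict[str, List[Row]] = {}
    for row in rows:
        published.setdefault(row[1], []).append(row)
    return published
```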
40. Couchbase Lite to HPCC Systems: Transport
{"data":"yes"}
A simple Python web server that catches all the HTTP POSTs from Sync Gateway and writes them to a file for HPCC Systems to store.
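A minimal sketch of such a server using only the standard library's `http.server`. It assumes Sync Gateway has been configured (for example via its webhook event handler, configuration not shown) to POST JSON documents to this server. The output file name, port, and the `CatchAllPost` / `make_server` names are placeholders. Each document is appended as one line of newline-delimited JSON, a simple format for HPCC Systems to pick up afterwards.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder output path; point the HPCC Systems import step at this file.
OUTPUT_FILE = "sync_gateway_docs.json"


class CatchAllPost(BaseHTTPRequestHandler):
    """Append the JSON body of every HTTP POST, one document per line."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            doc = json.loads(body)
        except json.JSONDecodeError:
            self.send_response(400)  # reject bodies that are not valid JSON
            self.end_headers()
            return
        with open(self.server.output_file, "a", encoding="utf-8") as out:
            out.write(json.dumps(doc) + "\n")  # newline-delimited JSON
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def make_server(port=8000, output_file=OUTPUT_FILE):
    """Create the server; call serve_forever() on the result to start it."""
    server = HTTPServer(("", port), CatchAllPost)
    server.output_file = output_file
    return server
```

Running `make_server().serve_forever()` listens on port 8000. Appending one document per line means the file can keep growing while HPCC Systems reads batches of it.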
42. Learning More - Couchbase Lite
INSTALL in 5 Minutes
Download: http://couchbase.com/download
Source Code: https://github.com/couchbase
Get started: http://developer.couchbase.com/mobile/get-started/get-started-mobile/index.html
Mountain View, CA
San Francisco, CA
43. Learning More - HPCC Systems
INSTALL in 5 Minutes
Download: http://hpccsystems.com/download/
or Source Code: https://github.com/hpcc-systems
Atlanta, GA
Mountain View, CA
https://youtu.be/8SV43DCUqJg