4. Customers trusting in Open Source Business Intelligence
Private Sector
Public Sector
5. Open Source Big Data - Stratebi
Understanding information…
6. Open Source Big Data - Stratebi
Data was not stored
Beginning of the use of DBs
and basic reports
Business Intelligence.
Great variety of visual resources
to analyze data
7. Open Source Big Data - Stratebi
Data analysis benefits:
Competitive advantages
Customer satisfaction evaluation
Business process improvement
Increase sales
…
8. Open Source Big Data - Stratebi
New data analysis techniques and processes
New BI solutions
New visual resources
New data sources
Cloud solutions
Latest trends
Social Intelligence
Mailing intelligence
…
9. Open Source Big Data - Stratebi
Corporations and organizations notice that…
16. Open Source Big Data - Stratebi
Big Data Architecture
17. Open Source Big Data - Stratebi
Scalability
Vertical
+ CPU
+ RAM
Data types
Structured
Unstructured
Current challenges
Horizontal
More nodes
18. Open Source Big Data - Stratebi
Unstructured
Structured
Data types
A data structure is a particular way of storing and organizing
data in a computer so that it can be used efficiently.
List: http://en.wikipedia.org/wiki/List_of_data_structures
Primitive data types: Boolean, char, float, double …
Unstructured information refers to information that either does
not have a pre-defined data model or is not organized in a
pre-defined manner.
19. Open Source Big Data - Stratebi
Data read
High data read cost in
JOINS
Massive Joins
Relational model
Current challenges
Transactional
Are transactions and
consistency required?
Can it be represented as a relational
model?
20. Open Source Big Data - Stratebi
Types of Big Data DBs. Not Only SQL (NoSQL)
In response to these problems a NoSQL paradigm appeared.
NoSQL is not a substitute for relational databases
Instead it is used in other specific scenarios
Not all problems can be solved using a RDBMS
Developer has a range of possibilities and can select the best to deal
with a specific problem
There are several NoSQL systems focusing on typical issues (scaling,
increasing performance…) in a different way
21. Open Source Big Data - Stratebi
Types of Big Data DBs. Not Only SQL (NoSQL)
Key-Value data stores
Columnar
databases
Document-oriented
databases
Graph databases
Object-oriented
databases
Do not replace the relational model. Used in specific scenarios.
22. Open Source Big Data - Stratebi
Key-Value stores
Easy to use
Value stored in a collection of binary
data (BLOB)
Content is not relevant to database,
only the key and its associated value
are important
No schema required (columns, data
types) to store information
Scalability: from key X to X+100 in Server 1, from X+101 to X+200 in Server2
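The key-range scaling idea above can be sketched as follows. This is a minimal illustration and not the API of any particular product: RANGE_SIZE, server_for and RangeShardedStore are hypothetical names, keys are assumed to be integers, and ranges are assumed to wrap around once every server owns one.

```python
# Hypothetical range-partitioned key-value store (illustrative names only).
RANGE_SIZE = 100  # assumed number of consecutive keys owned by one server


def server_for(key: int, num_servers: int) -> int:
    """Return the index of the server that owns an integer key.

    Keys 0-99 go to server 0, 100-199 to server 1, and so on.
    Assumption: ranges wrap around when there are more ranges than servers.
    """
    return (key // RANGE_SIZE) % num_servers


class RangeShardedStore:
    """Stores opaque values (BLOBs); the store never inspects their content."""

    def __init__(self, num_servers: int) -> None:
        # One dict per simulated server.
        self.servers = [dict() for _ in range(num_servers)]

    def put(self, key: int, value: bytes) -> None:
        self.servers[server_for(key, len(self.servers))][key] = value

    def get(self, key: int) -> bytes:
        return self.servers[server_for(key, len(self.servers))][key]


store = RangeShardedStore(num_servers=2)
store.put(42, b"opaque blob")      # lands on the first server
store.put(150, b"another blob")    # lands on the second server
```

Because only the key decides where a value lives, no schema is needed and servers can be added by assigning them new key ranges.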
23. Open Source Big Data - Stratebi
Document-oriented databases
Key-value store with the special feature that the value is stored
as a document in a known format and not as a binary field.
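A small sketch of what this difference means in practice, under the assumption that documents are JSON-like records. The stores, keys and the find_by_field helper are hypothetical and only illustrate the idea; real document databases expose their own query languages.

```python
import json

# In a plain key-value store the value is an opaque binary field:
# the database cannot look inside it.
blob_store = {
    "user:1": json.dumps({"name": "Ana", "city": "Madrid"}).encode("utf-8"),
}

# In a document-oriented store the value has a known format,
# so the database itself can query individual fields.
document_store = {
    "user:1": {"name": "Ana", "city": "Madrid"},
    "user:2": {"name": "Luis", "city": "Barcelona"},
}


def find_by_field(store: dict, field: str, expected: str) -> list:
    """Return the sorted keys of all documents whose field equals expected."""
    return sorted(
        key for key, doc in store.items() if doc.get(field) == expected
    )


madrid_users = find_by_field(document_store, "city", "Madrid")
```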
24. Open Source Big Data - Stratebi
Object oriented databases
Systems in which information is represented in the form
of objects
Based on object identifiers (OIDs) and not on primary keys
Hierarchical relations can be represented
Object-oriented database management systems never had the expected
impact, but have several market niches such as some scientific applications
25. Open Source Big Data - Stratebi
Graph databases
Graph structures with nodes, edges, and properties used
to represent and store data
Compared with relational databases, graph databases
are often faster for associative data sets
Only useful if your data can be represented using a
network
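As a rough illustration of nodes, edges and properties, the sketch below keeps a tiny graph in memory and follows its edges directly instead of joining tables. The sample data and the reachable function are assumptions made for this example, not the interface of any graph database.

```python
from collections import deque

# Hypothetical graph: nodes carry properties, edges are adjacency lists.
nodes = {
    "ana": {"city": "Madrid"},
    "luis": {"city": "Barcelona"},
    "eva": {"city": "Sao Paulo"},
}
edges = {
    "ana": ["luis"],
    "luis": ["eva"],
    "eva": [],
}


def reachable(start: str, max_hops: int) -> set:
    """Breadth-first search: nodes reachable from start within max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in edges.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)
    return seen


friends_of_friends = reachable("ana", max_hops=2)
```

Each hop is a direct lookup of a node's neighbours, which is why associative queries of this kind tend to suit graph stores better than repeated relational joins.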
26. Open Source Big Data - Stratebi
Columnar databases
Column databases store data tables as sections of columns of
data rather than as rows of data.
Reduce read time
Inefficient on writing operations
Used in data warehouses and
Business Intelligence systems
Ideal for calculating indicators
over aggregated data
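The trade-off between reads and writes can be sketched with plain Python structures. The table contents and function names are made up for illustration; real columnar engines add compression and on-disk layouts that are not modelled here.

```python
# Row-oriented layout: one record per row.
rows = [
    {"region": "north", "product": "A", "sales": 120.0},
    {"region": "south", "product": "B", "sales": 80.0},
    {"region": "north", "product": "B", "sales": 50.0},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_sales_row_store(table: list) -> float:
    """Every full row is visited even though only one field is needed."""
    return sum(row["sales"] for row in table)


def total_sales_column_store(table: dict) -> float:
    """Only the 'sales' column is read."""
    return sum(table["sales"])


def append_row(table: dict, row: dict) -> None:
    """A single insert has to touch every column list."""
    for name, value in row.items():
        table[name].append(value)
```

An aggregate such as total sales reads a single column, while inserting one record has to update all of them, which matches the read and write behaviour listed above.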
27. Open Source Big Data - Stratebi
Are these DBs?
28. Open Source Big Data - Stratebi
A brief historical review…
The first Google implementations needed to multiply
huge matrices to calculate PageRank
To manage big data sets, algorithms and frameworks
capable of processing terabytes were created
An early open source implementation
of the MapReduce
data processing paradigm
was Hadoop,
initially designed
by Doug Cutting
29. Open Source Big Data - Stratebi
Software framework that supports distributed applications,
licensed under the Apache v2 license.
Hadoop was derived from Google's MapReduce and
Google File System papers
is the largest contributor to the project
Written in the Java programming language
Hadoop is based on a file system and is not a database
About Apache Hadoop
30. Open Source Big Data - Stratebi
About Apache Hadoop
31. Open Source Big Data - Stratebi
Why use Hadoop?
Need to compress data
Nodes fail every day
Common infrastructure
Efficient
Easy to use
Open Source
32. Open Source Big Data - Stratebi
Why use Hadoop?
33. Open Source Big Data - Stratebi
Common uses
Searches
Log processing
Recommendation systems
Analytics (Facebook, Linkedin)
Image and video processing (NASA)
Data retention
34. Open Source Big Data - Stratebi
Hadoop Components
35. Open Source Big Data - Stratebi
HDFS file system
36. Open Source Big Data - Stratebi
HDFS file system
Hadoop Distributed File System (HDFS) is a distributed file
system
Each node in a Hadoop instance typically has a single
data node
Uses the TCP/IP layer for communication
Achieves reliability by replicating the data across
multiple hosts
Data nodes can talk to each other to rebalance data,
to move copies around, and to keep the replication of
data high
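The replication idea can be illustrated with a toy placement function. This is a simplified sketch, not HDFS's actual placement policy: the round-robin rule, the node names and both function names are assumptions made for the example.

```python
def place_blocks(num_blocks: int, data_nodes: list, replication: int = 3) -> dict:
    """Assign each block to `replication` distinct data nodes (round-robin).

    Toy policy for illustration only; real HDFS also considers racks and load.
    """
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            data_nodes[(block + i) % len(data_nodes)] for i in range(replication)
        ]
    return placement


def surviving_copies(placement: dict, failed_node: str) -> dict:
    """Return the replicas of each block that remain after one node fails."""
    return {
        block: [node for node in holders if node != failed_node]
        for block, holders in placement.items()
    }


placement = place_blocks(num_blocks=4, data_nodes=["n1", "n2", "n3", "n4"])
after_failure = surviving_copies(placement, failed_node="n1")
```

With three copies of every block, the loss of one node still leaves at least two replicas of each block, which the remaining data nodes could copy again to restore the replication level.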
37. Open Source Big Data - Stratebi
MapReduce
Consists of a Job
Tracker
The Job Tracker assigns
tasks to idle Task Tracker
nodes in the cluster
38. Open Source Big Data - Stratebi
How to do MapReduce?
Map
The Map function is applied in parallel to every pair
in the input dataset and produces a list of pairs for
each call
Map (key1, value1) –> list (key2, value2)
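For word counting, a Map function with this signature could look like the sketch below. The function name map_words and the sample input are hypothetical; key1 is assumed to be a line offset and value1 the text of the line.

```python
from typing import Iterator, Tuple


def map_words(key1: int, value1: str) -> Iterator[Tuple[str, int]]:
    """Map(key1, value1) -> list(key2, value2).

    key1 is ignored here; each word of the line becomes a (word, 1) pair.
    """
    for word in value1.lower().split():
        yield (word, 1)


# Each input pair is processed independently, so calls can run in parallel.
lines = [(0, "Big data big"), (13, "data tools")]
pairs = [pair for offset, line in lines for pair in map_words(offset, line)]
```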
39. Open Source Big Data - Stratebi
How to do MapReduce?
Reduce
Reduce phase collects all pairs with the same key
from all lists and groups them together, creating
one group for each key
The Reduce function is then applied in parallel to each
group of pairs emitted by the Map() function and produces a
collection of values in the same domain
Thus the MapReduce framework converts a list of
(key, value) pairs into a list of values
Reduce (key2, list(value2)) –> list(value3)
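The whole flow (map, group by key, reduce) can be simulated in a single process. This is a sketch of the paradigm only and not Hadoop's API: run_mapreduce, map_words and reduce_counts are hypothetical names, and everything runs sequentially instead of on a cluster.

```python
from collections import defaultdict


def map_words(key1, value1):
    """Map(key1, value1) -> list(key2, value2): emit (word, 1) per word."""
    return [(word, 1) for word in value1.lower().split()]


def reduce_counts(key2, values):
    """Reduce(key2, list(value2)) -> list(value3): sum the counts of one word."""
    return [sum(values)]


def run_mapreduce(records, mapper, reducer):
    """Apply mapper to every record, group pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key1, value1 in records:
        for key2, value2 in mapper(key1, value1):
            groups[key2].append(value2)  # grouping step between Map and Reduce
    return {key2: reducer(key2, values) for key2, values in groups.items()}


records = [(0, "big data big"), (1, "data tools")]
word_counts = run_mapreduce(records, map_words, reduce_counts)
```

In a real cluster the mapper calls and the reducer calls would be distributed over many nodes; the grouping step is what guarantees that all values for one key reach the same reducer.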
42. Open Source Big Data - Stratebi
MapReduce WordCount example
43. Open Source Big Data - Stratebi
MapReduce WordCount example
bin/hadoop jar hadoop-*-examples.jar wordcount [-m <#maps>] [-r <#reducers>] <in-dir> <out-dir>
44. Open Source Big Data - Stratebi
MapReduce WordCount example
45. Open Source Big Data - Stratebi
Sounds difficult
Are there any tools to help us?
46. Open Source Big Data - Stratebi
What is HBase?
HBase is an open source distributed database modeled
after Google's BigTable
HBase allows linear scaling by adding more servers to
the system
Runs on top of HDFS, providing BigTable-like capabilities
for Hadoop
HBase is written in Java
47. Open Source Big Data - Stratebi
What is HBase?
HBase is suitable when you require high read/write
speeds in a Big Data infrastructure.
HBase is able to store enormous tables (billions of rows
and millions of columns) in a cluster composed of commodity
nodes
Working modes
48. Open Source Big Data - Stratebi
What is HBase?
HBase commands
49. Open Source Big Data - Stratebi
What is Hive?
Hive is a data warehouse infrastructure built on top of
Hadoop for providing data summarization, query, and
analysis
Provides an SQL-like language called HiveQL while
maintaining full support for map/reduce
Built-in user defined functions (UDFs) to manipulate
dates, strings, and other data-mining tools.
Hive supports extending the UDF set to handle
use cases not supported by built-in functions
50. Open Source Big Data - Stratebi
I am a complete Java noob and need help…
What can I do?
51. Open Source Big Data - Stratebi
Graphical ETL tool
included in Pentaho suite
Built to help in processes
of Extracting, Transporting,
Transforming and Loading
data.
Supports deployment on
single node computers as
well as on a cloud, or
cluster.
What is Kettle?
52. Open Source Big Data - Stratebi
• View perspective:
• Database connections
• Steps
• Hops
• Slave server
• Kettle cluster schemas
• Design perspective:
• Inputs
• Outputs
• Lookups
• Transform
• Joins
• Scripting
• Data Warehouse
• Mapping
• Job
• Inline
• Experimental
53. Open Source Big Data - Stratebi
Main Big Data steps in Kettle
54. Open Source Big Data - Stratebi
Word Count example
55. Open Source Big Data - Stratebi
Word Count example
Configuring MapReduce
56. Open Source Big Data - Stratebi
Word Count example
Configuring MapReduce
57. Open Source Big Data - Stratebi
Word Count example
58. Open Source Big Data - Stratebi
Word Count example
59. Open Source Big Data - Stratebi
Configuring MapReduce with HBase
60. Open Source Big Data - Stratebi
Configuring MapReduce with HBase
61. Open Source Big Data - Stratebi
Using Hive as data source
62. Open Source Big Data - Stratebi
Big Data project and Business Intelligence
63. Open Source Big Data - Stratebi
Big Data project and Business Intelligence
64. Open Source Big Data - Stratebi
Big Data project and Business Intelligence.
Smart City Case Study
65. Open Source Big Data - Stratebi
Visualization – Social Media dashboards
66. Open Source Big Data - Stratebi
Visualization – Operational dashboard
67. Open Source Big Data - Stratebi
Visualization – Operational dashboard
68. Open Source Big Data - Stratebi
Visualization – Geographic dashboard
69. Open Source Big Data - Stratebi
Visualization – Advanced charts (Treemap, Sunburst ...)
71. Open Source Big Data - Stratebi
Stratebi is a Spanish company with offices in Madrid and
Barcelona and a branch office in Sao Paulo. We are
a group of professionals with wide experience in
information systems and technology solutions related
to the field of open source software and Business
Intelligence.
Contact details:
info@stratebi.com
www.stratebi.com
Phones: (+34) 917883410 - (+34) 931844325
About us