Best Practices for Developing Apps for Big Data. Exadata, Exalytics, Big Data Appliance. Hadoop, HDFS, Using R with Oracle Database and Hadoop. Fast Data for Gathering Information.
2. Developing a Successful Big Data Strategy
Best Practices for Development
Raul Goycoolea S.
Solution Architect Manager
Oracle Latin America
Architecture Team
Mexico Developer Day, Apr 2014
12. Big Data Analysis Characteristics
• Integrate
– Traditional and New data
• Explore
– More data, More sources
• Discover
– Plan, Visualize, Model, Act
13. Big Data Analysis In Retail: The Problem
• Fashion retailer sees flat and declining sales
• No apparent differences by geography or standard demographics
• New marketing program didn’t help
14. Step 1: New Segmentation
• Analyze weblog files
– Response rates
– Frequency and duration of visits
– Shopping cart activity
– Devices used to access
• Cross reference with demographics
– Affinity program
– Online profiles
• New insight: younger, affluent women are not buying
15. Step 2: Sentiment Analysis
• Analyze all comments
– Social media, forums
• Cross reference with customer information
– Affinity programs
– Online activity
– Sales records
• New insight: the new segment frequently mentions “out of stock”
16. Step 3: Inventory Analysis
• Analyze promoted products
– No stocking problems
• Cross-reference with all shopper activities
– Online shopping cart activity
– Affinity program
– Shopper location information
– “Out of stock” comments
• Key insight: matching accessories are out of stock
17. Big Data Analysis In Retail: The Answer
Young women with higher disposable income (and smart phones) did not buy a designer sweater when the matching sleeveless top was out of stock.
19. Oracle Exadata Database Machine
• Fastest Data Warehouse & OLTP
• Best Cost/Performance Data Warehouse & OLTP
• Optimized Hardware (per rack)
• Processor: up to 128 Intel Cores and 2 TB DRAM
• Network: 880 Gb/Sec Throughput
• Storage: 5 TB Flash and up to 336 TB Disk
• Software Breakthroughs
• Exadata Smart Storage Grid
• Smart Flash Cache
• Hybrid Columnar Compression
• Parallel Scale-Out Database and Storage
• Scales from ¼ Rack to 8 Full Racks
Data Warehousing, Transaction Processing, Consolidation
20. Oracle In-Database Analytics Platform
• Data Layer: Relational, XML, OLAP, Spatial, RDF, Media
• Parallel Processing Engine
• Analytics: Oracle R Enterprise, Oracle Data Mining, Text and Search, Spatial Analytics, SQL Analytics, Oracle MapReduce
22. Oracle Exalytics In-Memory Machine
• First engineered system for analytics
• Visual Analysis without limits
• Smarter analytic applications
23. End-user Experience with Exalytics
• Speed of Thought Interactive Analysis
• Free Exploration
• Dense Visualizations
• Fully Mobile
24. Over 80 Analytic Applications Run on Exalytics
• No application changes required
• Financials, HR
• Sales, marketing
• Planning, forecasting
• Many industries
25. Analyzing Big Data
• Comprehensive
• Enterprise ready
• Engineered to work together
• Optimized for extreme analytics
Start by introducing you to the platform. We’ll talk about use cases and then zero in on the use case that you will be working with as part of your hands-on labs (HOLs). Frankly, across these use cases you’ll find similar data processing flows. We’ll review the Oracle MoviePlex design pattern/architecture.
In the rest of the presentation we’ll walk through the lifecycle of big data. Big data is all about making better business decisions to grow revenue and lower costs. The lifecycle of big data is: acquire, organize, analyze, decide.
The platform consists of: Big Data Appliance to source unstructured/semi-structured data; Exadata to combine that data, once structured, with traditional schema-based data and run in-database analytics on it; and Exalytics for in-memory extreme analytics. All connected by InfiniBand.
Added standalone software components. So, to summarize: I think we have the industry’s most complete and integrated solution for acquiring, organizing, and analyzing big data. If someone comes up to you and needs you to deploy big data in a few weeks, we can help you do this: fastest time to value. We have the software: NoSQL Database, Enterprise Manager Cloud Control, Hadoop, Oracle Data Integrator for Hadoop, Oracle Loader for Hadoop, R, and OBIEE. Plus we have the Big Data Appliance, Exadata, and Exalytics to provide engineered solutions for running the software. In closing, I hope this session has been informative and you can now all go back to your organizations and tell them what big data is (high volume, low value per record), how it can be acquired, organized, loaded into your existing data warehouse, and analyzed to bring new value to your business.
So you have BIG data. You’re running MapReduce on that data. You want to load or access some of that data in Oracle Database for further analytics. This is what the Oracle Big Data Connectors are for. Note that the data is transformed into a structured form before it is loaded or accessed by the connectors.
You have seen a similar slide in other big data presentations from Oracle, outlining the different stages in a big data application. The potential treasure trove of less structured data such as weblogs, social media, email, sensors, and location data can provide a wealth of useful information for business applications. Hadoop provides a massively parallel architecture to distill desired information from huge volumes of unstructured and semi-structured content. Frequently, this data needs to be analyzed together with existing data in relational databases, the platform for most commercial applications. The two sets of data need to be combined so that users can derive greater insights from the less structured data that is processed and stored on Hadoop clusters, using the data in relational databases. A set of technologies and utilities referred to as “connectors” makes the data on Hadoop available to the database for analysis with the data in the database. Oracle Loader for Hadoop and Oracle SQL Connector for HDFS are two high-performance connectors to load and access very large volumes of data on Hadoop.

We see here the different stages in a big data solution. Oracle has engineered solutions for each of these stages: Oracle Big Data Appliance, Oracle Exadata (an engineered system for running Oracle Database), and Oracle Exalytics (an engineered system for BI applications), all connected by InfiniBand, the super highway that integrates Oracle’s engineered systems. Note that the Big Data Connectors work both with the engineered systems and with generic Hadoop and database installations (I will be discussing specific versions later in the presentation).

Set-up for today’s conversation: you know a lot about Exadata and Exalytics (Oracle BI). We’ve been hard at work developing a key component of the big data platform, the BDA, and I’m excited to speak to you about that today. We are leveraging Oracle’s appliance expertise and, importantly, the advice and technology of industry experts to create an open platform. Although it’s new, it offers a solid foundation, using technology that is well tested by the biggest players in the market. We then took this open system and optimized it for Oracle, delivering unique capabilities that simplify connections to the rest of your Oracle ecosystem and deliver outstanding performance. We’ll introduce the system and then step through a use case that illustrates the flow of information across it, highlighting the optimizations along the way that are unique to Oracle.

The platform consists of: Big Data Appliance to source unstructured/semi-structured data; Exadata to combine that data, once structured, with traditional schema-based data and run in-database analytics on it; and Exalytics for in-memory extreme analytics. All connected by InfiniBand, a key enabler and an example of Oracle’s superior technology. Without InfiniBand (without the super highway that integrates Oracle’s engineered solutions), customers will try to squeeze all these capabilities into one box for either a performance or a price advantage, and they will fail at both. With InfiniBand, customers have the right tool optimized for the right job: the value of integrated Oracle solutions is greater than the sum of the parts.
The connectors work with Oracle’s engineered systems and also with other Hadoop distributions and Oracle databases (as long as the versions are supported).
Parallelism: PQ (parallel query) slaves in the database read the data in parallel. If you have 64 PQ slaves, 64 files will be read in parallel. The number of PQ slaves that can be used is limited by the number of location files.
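As a rough sketch of what this looks like from the database side (the external table name and degree of parallelism below are placeholders, not taken from this deck), the degree of parallelism is requested the same way as for any Oracle table:

    -- Enable parallel query in the session, then request a DOP on the external table.
    -- With 64 PQ slaves, up to 64 location files are read concurrently.
    ALTER SESSION ENABLE PARALLEL QUERY;
    SELECT /*+ PARALLEL(movie_fact_ext, 64) */ COUNT(*)
    FROM movie_fact_ext;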
When OSCH is invoked with the -createTable option, the external table definition is generated, the external table is created, and the location files are populated. You can examine the location files if you like. Their contents were also displayed on screen, along with the external table definition.
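For reference, a representative -createTable invocation looks roughly like the following; the paths and the configuration file name are placeholders for whatever your environment uses:

    # Generate the external table definition, create the table, and populate its location files
    hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
        oracle.hadoop.exttab.ExternalTable \
        -conf /home/oracle/osch_moviefact_conf.xml \
        -createTable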
Interesting properties: tableName (name of the external table), sourceType, hive.tableName, hive.databaseName
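A minimal configuration sketch for a Hive source might look like this; the connection details, directory object, and database/table names are illustrative, not taken from the deck:

    <!-- Sketch only: values are placeholders -->
    <configuration>
      <property>
        <name>oracle.hadoop.exttab.tableName</name>
        <value>MOVIE_FACT_EXT</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.sourceType</name>
        <value>hive</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.hive.databaseName</name>
        <value>moviedemo</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.hive.tableName</name>
        <value>movie_fact</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.defaultDirectory</name>
        <value>MOVIE_DIR</value>
      </property>
      <property>
        <name>oracle.hadoop.connection.url</name>
        <value>jdbc:oracle:thin:@//dbhost:1521/orcl</value>
      </property>
    </configuration>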
Let us try some queries on this external table
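For example (table and column names here are illustrative, in the spirit of the MoviePlex schema rather than copied from it):

    -- Count the rows that live in HDFS, read through the external table
    SELECT COUNT(*) FROM movie_fact_ext;

    -- Join HDFS data with a dimension table already in the database
    SELECT c.segment, SUM(f.sales) AS total_sales
    FROM   movie_fact_ext f
    JOIN   customer c ON c.cust_id = f.cust_id
    GROUP  BY c.segment;

    -- Or materialize the data into a regular table for repeated analysis
    INSERT /*+ APPEND */ INTO movie_fact
    SELECT * FROM movie_fact_ext;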
You will see that the external table has two location files, because of the value we specified in the locationFileCount property. You can see that the URIs of the smaller data files have been grouped into one location file. OSCH does this to load balance the reading of data as much as possible. URIs in the location files are read in parallel. You can examine the location files if you like.
Interesting properties: tableName (the external table that will be created), sourceType, dataPaths, locationFileCount
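A configuration sketch for delimited text files in HDFS, again with placeholder paths and names, could look like this (locationFileCount is set to 2 to match the walkthrough above):

    <!-- Sketch only: values are placeholders -->
    <configuration>
      <property>
        <name>oracle.hadoop.exttab.tableName</name>
        <value>MOVIE_LOG_EXT</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.sourceType</name>
        <value>text</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.dataPaths</name>
        <value>/user/oracle/moviework/data/part-*</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.locationFileCount</name>
        <value>2</value>
      </property>
      <property>
        <name>oracle.hadoop.exttab.defaultDirectory</name>
        <value>MOVIE_DIR</value>
      </property>
      <property>
        <name>oracle.hadoop.connection.url</name>
        <value>jdbc:oracle:thin:@//dbhost:1521/orcl</value>
      </property>
    </configuration>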
How does this perform? The alternative to OSCH is to use Fuse-dfs. We are 5 times faster than Fuse-dfs, while using 75% less CPU. The test was performed on a BDA (18 Sun X4270 M2 servers, 216 cores, 48 GB memory per server, 864 GB total) and an Exadata X2-8 single instance (8 Intel Xeon X7560 servers, 64 cores, 1 TB memory). The data size used in the CPU usage graph is 0.25 TB.
OLH is a MapReduce job that runs on the Hadoop cluster. The job is submitted to the cluster like any MapReduce job. Data is read through input formats, and database table partitions are loaded in parallel by reducer tasks. There are online and offline modes. Online: pre-process and load in the same job. Offline: write out data files on HDFS (text or Oracle Data Pump) for loading later. The data pre-processing performs partitioning, sorting, and data conversion on Hadoop.
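A representative job submission, with placeholder paths and configuration file names, looks roughly like this:

    # Submit the loader job to the Hadoop cluster like any other MapReduce job
    hadoop jar $OLH_HOME/jlib/oraloader.jar \
        oracle.hadoop.loader.OraLoader \
        -conf /home/oracle/olh_moviefact_conf.xml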
Now let us look at Oracle Loader for Hadoop. In addition to the file containing the configuration parameters, we have a loader map file that describes the columns in the target table we are loading into. If all columns in the table are loaded and the data columns have the default date format, this file is not needed. Here the date format in the data is different from the default and is specified in the loader map file.
We first create the target table in the database that we want to load data into.
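For instance, a hypothetical MoviePlex-style target table (the column names and partitioning scheme are illustrative only) might be created as:

    -- Hash partitioning lets OLH reducer tasks load partitions in parallel
    CREATE TABLE movie_fact (
      cust_id     NUMBER,
      movie_id    NUMBER,
      genre_id    NUMBER,
      time_id     TIMESTAMP,
      activity_id NUMBER,
      rating      NUMBER,
      sales       NUMBER
    )
    PARTITION BY HASH (cust_id) PARTITIONS 8;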
The mapreduce.outputformat.class property specifies OCIOutputFormat. This specifies that the online load option with direct path load will be used. mapred.input.dir specifies the data path for the data files. mapreduce.inputformat.class specifies that the data is in delimited text format (DelimitedTextInputFormat).
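Put together, the relevant part of the job configuration might look like the sketch below; the class names follow the OLH conventions described above, while the paths, loader map location, and connection details are placeholders:

    <!-- Sketch only: values are placeholders -->
    <configuration>
      <property>
        <name>mapreduce.inputformat.class</name>
        <value>oracle.hadoop.loader.lib.input.DelimitedTextInputFormat</value>
      </property>
      <property>
        <name>mapred.input.dir</name>
        <value>/user/oracle/moviework/data</value>
      </property>
      <property>
        <name>mapreduce.outputformat.class</name>
        <value>oracle.hadoop.loader.lib.output.OCIOutputFormat</value>
      </property>
      <property>
        <name>oracle.hadoop.loader.loaderMapFile</name>
        <value>file:///home/oracle/loaderMap_moviefact.xml</value>
      </property>
      <property>
        <name>oracle.hadoop.loader.connection.url</name>
        <value>jdbc:oracle:thin:@//dbhost:1521/orcl</value>
      </property>
    </configuration>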
Loader Map file. Note the specification of the date format.
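A sketch of such a loader map file is shown below, using the older element-based syntax with placeholder field and column names (check the OLH documentation for the exact syntax of your release); note the explicit date format on the TIME_ID column:

    <!-- Sketch only: schema, table, fields, and format are placeholders -->
    <LOADER_MAP>
      <SCHEMA>MOVIEDEMO</SCHEMA>
      <TABLE>MOVIE_FACT</TABLE>
      <COLUMN field="F0">CUST_ID</COLUMN>
      <COLUMN field="F1">MOVIE_ID</COLUMN>
      <COLUMN field="F2">GENRE_ID</COLUMN>
      <COLUMN field="F3" format="yyyy-MM-dd:HH:mm:ss">TIME_ID</COLUMN>
    </LOADER_MAP>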
We use 85% less CPU, and are more than ten times as fast. The data size used in the CPU usage graph is 0.25 TB.
This is a big deal. We spend significant time and effort keeping up with the versions. This saves you the time of making a connector work with the Hadoop distribution you are working with.
The connectors can be used together. Oracle Data Pump files can be created by Oracle Loader for Hadoop and then accessed or loaded into Oracle Database using Oracle SQL Connector for HDFS. So if the data is not in delimited text files, Oracle Loader for Hadoop can first be used to transform it into Data Pump files (or delimited text files), which are then loaded or accessed by Oracle SQL Connector for HDFS. This is also a good time to highlight the offline load option of OLH.
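In configuration terms (again a sketch with placeholder values), the hand-off comes down to two settings: OLH's offline mode writes Data Pump files with its Data Pump output format, and OSCH then reads them by declaring a datapump source type:

    <!-- OLH job configuration: write Oracle Data Pump files to HDFS (offline mode) -->
    <property>
      <name>mapreduce.outputformat.class</name>
      <value>oracle.hadoop.loader.lib.output.DataPumpOutputFormat</value>
    </property>

    <!-- OSCH configuration: point the external table at those Data Pump files -->
    <property>
      <name>oracle.hadoop.exttab.sourceType</name>
      <value>datapump</value>
    </property>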