The document discusses big data and Oracle technologies. It provides an overview of big data, describing what it is and examples of big data in different industries. It then discusses several Oracle technologies for working with big data, including Oracle NoSQL Database for scalable key-value storage, Oracle R for statistical analysis and connecting to Hadoop, and Oracle Endeca for information discovery.
4. What Is Big Data?
Big Data is data that has grown so large that it cannot be processed using conventional methods.
Big Data is also the new generation of data warehousing and business analysis systems.
5. A Wider Variety of Data
• Internet Data
– Clickstream
– Social media
– Social media stream
– Web site logs
• Research Data
– Experiments
– Observations
– Surveys
– Marketplace data
• Healthcare Data
– Treatment data
– Telehealth
– National Electronic Health Records
– Procedures
• Image Data
– Image
– Video
– Satellite image
– Surveillance
• Device Data
– RF Devices
– Sensors
– EDI
– Telemetry
6. Why Is Big Data Important?
Big Data - Just another buzzword
or powerful business & science enabler?
• SQL Analytics
– Count
– Mean
– OLAP
• Descriptive Analytics
– Univariate distribution
– Central tendency
– Dispersion
• Data Mining
– Association rules
– Clustering
– Feature extraction
• Predictive Analytics
– Classification
– Regression
– Forecasting
– Spatial
– Machine Learning
– Text Analytics
• Simulation
– Monte Carlo
– Agent-based modeling
– Discrete event modeling
• Optimization
– Linear Optimization
– Non-Linear Optimization
Range covered: Business Intelligence through Advanced Analytics
8. Marketing & Sales + Big Data
Advertising platform: http://www.dataxu.com/
• Data: clickstream, behavior
• Time to deliver an answer: 100 milliseconds
• Count of ads: 100,000 per second
9. Retail + Big Data
Wal-Mart online marketing: http://www.walmart.com/
• Data: social media
• Capture: 1,000 tweets per second
• Increase of data: +10 TB per day
10. Health Care + Big Data
Cancer Genomics Hub: https://cghub.ucsc.edu/index.html/
• Data: DNA and RNA data
• Increase of data each month: +10 TB
• Patients involved: 10,000
11. Science + Big Data
The catalog of the Universe: http://www.skatelescope.org/
• Data: data from telescopes
• Seven telescopes capture: 2 MB per second
• In the next 10-15 years all telescopes will receive: 30 TB per second
13. Oracle NoSQL
Big Data Storage Choices
• Hadoop Distributed File System (HDFS)
– File system
– Parallel scanning
– No inherent structure
– High volume writes
– Batch oriented
• Oracle NoSQL Database
– Database
– Indexed storage
– Simple data structure
– High volume random reads and writes
– Real-time
14. Oracle NoSQL
• RDBMS
– High value, high density,
complex data
– Complex data relationships
– Schema-centric
– Designed to scale up & out
– Lots of general purpose
features/functionality
High overhead ($ per
operation)
• NoSQL architectures
– Low value, low density, simple
data
– Very simple relationships
– Schema-free, unstructured or
semi-structured data
– Distributed storage and
processing
– Stripped down, special
purpose data store
Lower overhead ($ per
operation)
15. Oracle NoSQL
A Distributed, Scalable Key-Value Database
• Simple data model
• Small, distributed footprint
• Highly scalable, available
• Transparent load balancing
• Integrates with Oracle Stack
[Diagram labels: Application with NoSQL Database Driver (shown twice); Storage Nodes, Datacenter A; Storage Nodes, Datacenter B]
16. Oracle NoSQL
Key-value pairs
• Simple data model – key-value pair (major+minor-key paradigm)
• Simple operations – read/insert/update/delete, read-modify-write (RMW) support
• Scope of transaction – records within a major key, single API call
• Unordered scan of all data (non-transactional)
Example record layout:
– Major key: userid (string)
– Sub key: address, subscriptions (strings)
– Value: email id, phone #, expiration date (byte array)
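The major/minor key model on this slide can be illustrated with a small in-memory sketch. This is not the Oracle NoSQL Database API (which is Java); the class `MajorMinorStore` and its method names are hypothetical and only mimic the ideas listed above: string keys, byte-array values, single-call access to all records under one major key, and an unordered scan.

```python
from collections import defaultdict


class MajorMinorStore:
    """Toy in-memory store keyed by (major key, minor key). Hypothetical API."""

    def __init__(self):
        # major key -> {minor key -> value bytes}
        self._data = defaultdict(dict)

    def put(self, major, minor, value: bytes):
        """Insert or update one record."""
        self._data[major][minor] = value

    def get(self, major, minor):
        """Return the value for one record, or None if it is absent."""
        return self._data.get(major, {}).get(minor)

    def delete(self, major, minor):
        """Delete one record; return True if it existed."""
        return self._data.get(major, {}).pop(minor, None) is not None

    def multi_get(self, major):
        """Return every record under one major key in a single call."""
        return dict(self._data.get(major, {}))

    def scan(self):
        """Yield all records; callers should not rely on any ordering."""
        for major, minors in self._data.items():
            for minor, value in minors.items():
                yield major, minor, value


store = MajorMinorStore()
store.put("user42", "address", b"email=a@example.com;phone=555-0100")
store.put("user42", "subscriptions", b"expires=2014-01-01")
store.put("user7", "address", b"email=b@example.com")

user42_records = store.multi_get("user42")
```

Grouping records under a major key is what makes the "records within a major key, single API call" transaction scope natural: everything returned by `multi_get` lives together, so a real store can place it on one node.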
18. Oracle NoSQL
Getting Started with Oracle NoSQL DB
1. Download from OTN:
www.oracle.com/technetwork/products/nosqldb/downloads/index.html
2. Review Quick Start & Getting Started
Guide
3. Review Programmatic API Guide
4. Start writing Java code
19. What is R?
• R is an Open Source language and
environment for statistical computing
and graphics
http://www.R-project.org/
• Started in 1994 as an alternative to
SAS, SPSS and other proprietary
statistical environments
• The R environment
– R is an integrated suite of software facilities for data
manipulation, calculation and graphical display
• Around 2 million R users worldwide
– Widely taught in Universities
– Many Corporate Analysts know and use R
• Thousands of open source R
packages to enhance productivity, in
areas such as:
– Bioinformatics
– Spatial Statistics
– Financial Market Analysis
20. Why do statisticians/data analysts use R?
The R environment is:
• Powerful
• Extensible
• Graphical
• Extensive statistics
• OOTB functionality with
many ‘knobs’ but
smart defaults
• Ease of installation and use
• Free
21. Limitations of R
• R is a client and server bundled together as 1 executable
– Single user tool, like Excel
– Single-threaded
– Cannot leverage multi-CPU capacity without use of special
packages and coding
• R requires data to be loaded into memory first
– Loading data may not be a limitation given RAM available on
laptops/desktops
– R’s call-by-value semantics mean that a function which modifies its
argument works on a complete copy of the data
– As a result, a chain of such calls can quickly run into memory limits
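The memory limit described above can be estimated with simple arithmetic. The sketch below assumes purely numeric data stored as 8-byte doubles and ignores object overhead; the function name and the example sizes are illustrative, not taken from the slides.

```python
def estimated_peak_bytes(rows, numeric_columns, extra_copies, bytes_per_value=8):
    """Rough peak memory: the original data plus each full copy made of it.

    Assumes every value is an 8-byte double and ignores per-object overhead.
    """
    one_copy = rows * numeric_columns * bytes_per_value
    return one_copy * (1 + extra_copies)


# 50 million rows x 20 numeric columns is 1e9 values, about 8 GB in memory.
base = estimated_peak_bytes(50_000_000, 20, extra_copies=0)

# Two full copies made along the way triple the footprint to about 24 GB.
with_copies = estimated_peak_bytes(50_000_000, 20, extra_copies=2)

print(base / 1e9, with_copies / 1e9)
```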
22. Oracle R Connector for Hadoop
• Provides transparent access to Hadoop Cluster, which
consists of MapReduce and HDFS-resident data
• R users not required to learn new language or interface to
work with Hadoop
• R users can execute jobs on a Hadoop cluster without
requiring knowledge of Hadoop internals, Hadoop CLI, or
IT infrastructure
• Ability to leverage open source contributed R packages to
work on HDFS-resident data
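For readers unfamiliar with what a MapReduce job does, the pattern can be sketched in a few lines of plain Python. This is not the connector's API and runs in a single process; the helper names (`run_job`, `view_mapper`, `sum_reducer`) and the sample clickstream lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to each record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to that key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_job(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# Example job: count page views per URL from "user url" clickstream lines.
def view_mapper(line):
    _user, url = line.split()
    yield url, 1


def sum_reducer(_key, values):
    return sum(values)


clicks = ["u1 /home", "u2 /home", "u1 /cart"]
counts = run_job(clicks, view_mapper, sum_reducer)
```

On a real cluster the map and reduce phases run in parallel across many nodes over HDFS-resident data; the analyst supplies only the mapper and reducer logic.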
23. Oracle R Enterprise
• Provides familiar R environment to operate on database-
resident data
• Overloads base R functions for scalable execution in
Oracle Database
– Automatically generates SQL from R and submits query to
database
– Leverages table parallelism where applicable
• Enables embedded execution of R scripts at Oracle
Database server
– Provides database-controlled data-parallel execution framework
– Enables leveraging CRAN open source R packages
• Enables integration of structured results and graphics with
OBIEE dashboards and BI Publisher documents
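The idea of overloading familiar functions so that they generate SQL instead of computing in memory can be sketched as follows. This is a minimal illustration of the technique in Python, not how Oracle R Enterprise is implemented; the `Table` and `Column` classes, the `mean_sql` method, and the `ORDERS`/`AMOUNT` names are all hypothetical.

```python
class Column:
    """A column expression that records operations as SQL text."""

    def __init__(self, expr):
        self.expr = expr

    def _binary(self, op, other):
        # repr() is a naive way to render literals; real code must escape them.
        rhs = other.expr if isinstance(other, Column) else repr(other)
        return Column(f"({self.expr} {op} {rhs})")

    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __mul__(self, other):
        return self._binary("*", other)


class Table:
    """A proxy for a database table; no rows are ever loaded locally."""

    def __init__(self, name, where=None):
        self.name = name
        self.where = where

    def __getitem__(self, item):
        if isinstance(item, str):
            return Column(item)                    # table["COL"] selects a column
        return Table(self.name, where=item.expr)   # table[condition] filters rows

    def mean_sql(self, column):
        """Build the query a database would run to compute the mean."""
        sql = f"SELECT AVG({column}) FROM {self.name}"
        if self.where:
            sql += f" WHERE {self.where}"
        return sql


orders = Table("ORDERS")
big_orders = orders[orders["AMOUNT"] > 100]
sql = big_orders.mean_sql("AMOUNT")
print(sql)
```

The user writes what looks like ordinary filtering and averaging, while the work is expressed as a single query that the database can execute next to the data.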
24. Oracle R Links
• Blog: https://blogs.oracle.com/R/
• Forum: https://forums.oracle.com/forums/forum.jspa?forumID=1397
• Oracle R Distribution:
http://www.oracle.com/technetwork/indexes/downloads/r-distribution-1532464.html
• ROracle:
http://cran.r-project.org/web/packages/ROracle
• Oracle R Enterprise:
http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise
• Oracle R Connector for Hadoop:
http://www.oracle.com/us/products/database/big-data-connectors/overview
25. Other Oracle Big Data Products
Oracle Endeca Information Discovery
http://www.oracle.com/us/solutions/business-analytics/business-intelligence/endeca/overview/index.html
Oracle Data Integrator Application Adapter for Hadoop
http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html
Oracle Loader for Hadoop
http://www.oracle.com/technetwork/bdc/hadoop-loader/learnmore/index.html
26. The End
The best way to predict the future is to
create it!
- Peter F. Drucker