The document outlines Oracle's Big Data Appliance product. It discusses how businesses can use big data to gain insights and make better decisions. It then provides an overview of big data technologies like Hadoop and NoSQL databases. The rest of the document details the hardware, software, and applications that come pre-installed on Oracle's Big Data Appliance - including Hadoop, Oracle NoSQL Database, Oracle Data Integrator, and tools for loading and analyzing data. The summary states that the Big Data Appliance provides a complete, optimized solution for storing and analyzing less structured data, and integrates with Oracle Exadata for combined analysis of all data sources.
2. The following is intended to outline our general product
direction. It is intended for information purposes only, and
may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing
decisions.
The development, release, and timing of any features or
functionality described for Oracle’s products remain at the
sole discretion of Oracle.
3. Agenda
• The Business of Big Data
• Big Data Technology
• Inside the Big Data Appliance
• Overview
• Applications
• Summary
• Q&A
5. Big Data: Acting on New Data
Diagram: Retail Decisions
• Channels: Stores, Web, Search, Social Networks, Catalog/Call Center
• Customer signals: “I think”, “I want”, “I found it”
• Looking back (“PAST”) and looking ahead (“FUTURE”)
60%
Potential increase in
retailers’ operating margins
possible with Big Data
McKinsey Global Institute:
Big Data: The next frontier for innovation, competition and productivity (May 2011)
6. Tapping into Diverse Data Sets
Transactions
Information
Architectures
Today:
Decisions based
on database data
Big Data:
Decisions based
on all your data
Video and Images
Machine-Generated Data
Social Data
Documents
7. Case: On-line Ads and Content
Low Latency (real-time):
• Lookup user profile in NoSQL DB; add user if not present
• Input into Expert System
• Determine best ad to place on page for this user
Batch:
• Web logs and actual ads served stored in HDFS
• High scale data reductions
• Predictions on browsing feed the Profiles in NoSQL DB
• BI and Analytics, Billing
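The low-latency path on this slide (look up the user profile, add the user if not present, determine the best ad) can be sketched roughly as below. This is an illustration only: the in-memory dict stands in for the NoSQL DB profile store, and `lookup_or_add_profile`, `choose_ad`, `serve_ad`, the profile fields, and the ad inventory are hypothetical names, not part of any Oracle product.

```python
from typing import Dict

# Stand-in for the NoSQL DB profile store (hypothetical; a real deployment
# would call a key-value store client here).
profiles: Dict[str, Dict[str, int]] = {}

# Hypothetical ad inventory: ad id -> topic the ad targets.
ADS: Dict[str, str] = {"ad-shoes": "sports", "ad-laptop": "tech", "ad-generic": ""}


def lookup_or_add_profile(user_id: str) -> Dict[str, int]:
    """Look up the user profile; add an empty one if the user is not present."""
    return profiles.setdefault(user_id, {})


def choose_ad(profile: Dict[str, int]) -> str:
    """Toy 'expert system': pick the ad whose topic has the highest interest score."""
    best_ad, best_score = "ad-generic", 0
    for ad_id, topic in ADS.items():
        score = profile.get(topic, 0)
        if score > best_score:
            best_ad, best_score = ad_id, score
    return best_ad


def serve_ad(user_id: str) -> str:
    """Real-time path: profile lookup followed by ad selection."""
    return choose_ad(lookup_or_add_profile(user_id))
```

In this sketch the batch side would be whatever process periodically rewrites the interest scores in `profiles`; it is deliberately left out.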
8. Case: On-line Ads and Content
NoSQL DB
• Dynamic and rapidly changing schema
• Scalable single record lookup
HDFS
• Low cost, high scale storage
• Write once, read many times
Hadoop
• High scale batch processing
• Highly customizable infrastructure
RDBMS
• Deep analytics and BI value add
• Reporting for large user community
10. Big Data: Infrastructure Requirements
Acquire
• Low, predictable Latency
• High Transaction Volume
• Flexible Data Structures
Organize
• High Throughput
• In-Place Preparation
• All Data Sources/Structures
Analyze
• Deep Analytics
• Agile Development
• Massive Scalability
• Real Time Results
11. Divided Solution Spectrum
Acquire Organize Analyze
MapReduce
Solutions
DBMS
(DW)
DBMS
(OLTP)
Advanced
Analytics
Distributed
File Systems
Transaction
(Key-Value)
Stores
ETL
NoSQL
Flexible
Specialized
Developer
Centric
SQL
Trusted
Secure
Administered
Schema-less
Unstructured
Data
Variety
Schema
Information
Density
16. Oracle Big Data Appliance Hardware
Oracle Engineered Systems
• 18 Sun X4270 M2 Servers
– 48 GB memory per node = 864 GB memory
– 12 Intel cores per node = 216 cores
– 24 TB storage per node = 432 TB storage
• 40 Gb/sec InfiniBand
• 10 Gb/sec Ethernet
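The rack totals on this slide are the per-node figures multiplied by 18 nodes. A quick check of that arithmetic, with `NodeSpec` and `rack_totals` as illustrative names only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures as listed on the slide."""
    memory_gb: int = 48
    cores: int = 12
    storage_tb: int = 24


def rack_totals(nodes: int = 18, spec: NodeSpec = NodeSpec()) -> dict:
    """Multiply each per-node figure by the node count."""
    return {
        "memory_gb": nodes * spec.memory_gb,
        "cores": nodes * spec.cores,
        "storage_tb": nodes * spec.storage_tb,
    }


print(rack_totals())  # {'memory_gb': 864, 'cores': 216, 'storage_tb': 432}
```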
17. Big Data Appliance
Cluster of industry standard servers for Hadoop and NoSQL Database
• Focus on Scalability and Availability at low cost
Compute and Storage
• 18 High-performance low-cost
servers acting as Hadoop
nodes
• 24 TB Capacity per node
• 2 6-core CPUs per node
• Hadoop triple replication
• NoSQL Database triple
replication
10GigE Network
• 8 10GigE ports
• Datacenter connectivity
InfiniBand Network
• Redundant 40Gb/s switches
• IB connectivity to Exadata
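Triple replication means each block is stored three times, so the space available for unique data is roughly a third of the raw capacity. A rough estimate, assuming for simplicity that all raw disk is replicated three ways (this ignores operating system, temporary, and other overhead, which the slide does not quantify); `usable_capacity_tb` is an illustrative name:

```python
def usable_capacity_tb(raw_tb: float, replication_factor: int = 3) -> float:
    """Estimate space for unique data when every block is stored N times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


# 18 nodes x 24 TB per node = 432 TB raw
print(usable_capacity_tb(18 * 24))  # 144.0
```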
18. Big Data Appliance Building Block
• High-performance storage server built from
industry standard components
• 12 disks - 2TB 7200 RPM
High Capacity SAS
• 2 Six-Core Intel Xeon Processors (L5640)
• Dual ported 40 Gb/sec InfiniBand
• Optimized software layout:
• Hadoop HDFS
• HBase and Hive
• NoSQL Database and Replicas
• Hardware by Sun
• Software by Oracle
19. Scale Out to Infinity
Scale out by connecting racks
to each other using InfiniBand
•72 Nodes
•864 Cores
•1.7 PB Storage
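The core and storage figures on this slide correspond to four racks of 18 nodes each. The check below assumes the per-node figures from the hardware slide (12 cores, 24 TB) and decimal petabytes; the rack count of four is inferred from those totals and is not stated on the slide.

```python
CORES_PER_NODE = 12        # from the hardware slide
STORAGE_TB_PER_NODE = 24   # from the hardware slide
NODES_PER_RACK = 18        # from the hardware slide


def nodes_for_cores(total_cores: int) -> int:
    """Work backwards from a core count to the number of nodes."""
    nodes, remainder = divmod(total_cores, CORES_PER_NODE)
    if remainder:
        raise ValueError("core count is not a whole number of nodes")
    return nodes


def cluster_totals(racks: int) -> dict:
    """Totals for a cluster built from identical full racks."""
    nodes = racks * NODES_PER_RACK
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "storage_pb": nodes * STORAGE_TB_PER_NODE / 1000,
    }


print(nodes_for_cores(864))  # 72
print(cluster_totals(4))     # {'nodes': 72, 'cores': 864, 'storage_pb': 1.728}
```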
20. Oracle Big Data Appliance Software
• Oracle Linux 5.6
• Java HotSpot VM
• Apache Hadoop Distribution v0.20.x
• R Distribution
• Oracle NoSQL Database Enterprise Edition
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop
21. Why Open-Source Apache Hadoop?
• Fast evolution in critical features
• Built by the Hadoop experts in the community
• Practical instead of esoteric
• Focus on what is needed for large clusters
• Proven at very large scale
• In production at all the large consumers of Hadoop
• Extremely stable in those environments
• Well-understood by practitioners
22. Software Layout
• Node 1:
• M: Name Node, Balancer & HBase Master
• S: HDFS Data Node, NoSQL DB Storage Node
• Node 2:
• M: Secondary Name Node, Management,
Zookeeper, MySQL Slave
• S: HDFS Data Node, NoSQL DB Storage Node
• Node 3:
• M: JobTracker, MySQL Master, ODI Agent,
Hive Server
• S: HDFS Data Node, NoSQL DB Storage Node
• Node 4 – 18:
• S: HDFS Data Nodes, Task Tracker, HBase
Region Server, NoSQL DB Storage Nodes
• Your MapReduce runs here!
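The layout above can be written down as a small lookup table, which makes it easy to ask which nodes run a given service. This is only a transcription of the slide into a data structure; `LAYOUT` and `nodes_running` are illustrative names, and the "M" and "S" keys simply mirror the slide's labels.

```python
from typing import Dict, List

# Services every node runs in its "S" role on nodes 1-3.
COMMON_S = ["HDFS Data Node", "NoSQL DB Storage Node"]

LAYOUT: Dict[int, Dict[str, List[str]]] = {
    1: {"M": ["Name Node", "Balancer", "HBase Master"], "S": list(COMMON_S)},
    2: {"M": ["Secondary Name Node", "Management", "Zookeeper", "MySQL Slave"],
        "S": list(COMMON_S)},
    3: {"M": ["JobTracker", "MySQL Master", "ODI Agent", "Hive Server"],
        "S": list(COMMON_S)},
}

# Nodes 4-18 carry no "M" services and run the MapReduce workers.
for node in range(4, 19):
    LAYOUT[node] = {
        "M": [],
        "S": ["HDFS Data Node", "Task Tracker", "HBase Region Server",
              "NoSQL DB Storage Node"],
    }


def nodes_running(service: str) -> List[int]:
    """Return the sorted node numbers on which a service appears."""
    return sorted(n for n, roles in LAYOUT.items()
                  if service in roles["M"] or service in roles["S"])


print(nodes_running("JobTracker"))         # [3]
print(len(nodes_running("Task Tracker")))  # 15
```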
23. Big Data Appliance
Usage Model
Oracle
Big Data Appliance
Oracle
Exadata
InfiniBand
Stream Acquire Organize Analyze & Visualize
Oracle
Exalytics
InfiniBand
24. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
• Everything you need to store and integrate
your lower information density data
• Integrated with Oracle Exadata
• Analyze all your data
• Easy to Deploy
• Risk Free, Quick Installation and Setup
• Single Vendor Support
• Full Oracle support for the entire system and
software set
26. Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
• Key-value pair with major+sub-key paradigm
• Read/insert/update/delete operations
• Scalability
• Dynamic data partitioning and distribution
• Optimized data access via intelligent driver
• High availability
• One or more replicas
• Disaster recovery through location of replicas
• Resilient to partition master failures
• No single point of failure
• Transparent load balancing
• Reads from master or replicas
• Driver is network topology & latency aware
• Elastic (Planned for Release 2)
• Online addition/removal of Storage Nodes
• Automatic data redistribution
Diagram: two Applications, each with a NoSQLDB Driver, accessing Storage Nodes in Data Center A and Data Center B
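The major + sub-key data model and the read/insert/update/delete operations can be illustrated with a toy in-memory store. This is not the Oracle NoSQL Database API: `ToyKVStore` and its method names are hypothetical, and hashing the major key to choose a partition is shown here as an assumption about how records that share a major key could be kept together.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


class ToyKVStore:
    """In-memory stand-in: values are addressed by (major path, sub-key path)."""

    def __init__(self, partitions: int = 4) -> None:
        self._parts: List[Dict[Tuple[Path, Path], bytes]] = [
            {} for _ in range(partitions)
        ]

    def partition_of(self, major: Path) -> int:
        """Hash only the major key, so all its sub-keys share a partition."""
        digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self._parts)

    def put(self, major: Path, minor: Path, value: bytes) -> None:
        """Insert or update one record."""
        self._parts[self.partition_of(major)][(major, minor)] = value

    def get(self, major: Path, minor: Path) -> Optional[bytes]:
        """Read one record, or None if it does not exist."""
        return self._parts[self.partition_of(major)].get((major, minor))

    def delete(self, major: Path, minor: Path) -> bool:
        """Delete one record; report whether it existed."""
        part = self._parts[self.partition_of(major)]
        return part.pop((major, minor), None) is not None

    def multi_get(self, major: Path) -> Dict[Path, bytes]:
        """Fetch every sub-key stored under one major key."""
        part = self._parts[self.partition_of(major)]
        return {mn: v for (mj, mn), v in part.items() if mj == major}


store = ToyKVStore()
store.put(("user", "42"), ("profile",), b"alice")
store.put(("user", "42"), ("cart",), b"3 items")
print(store.multi_get(("user", "42")))
```

Because only the major key is hashed, fetching all sub-keys of one user touches a single partition in this sketch.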
27. NoSQL DB
Big Data Appliance
System Layout
Master Node
Replicas
Note: For illustration purposes only!
28. Oracle NoSQL DB Differentiation
• Commercial Grade Software and Support
• General-purpose
• Reliable – Based on proven Berkeley DB JE HA
• Easy to install and configure
• Scalable throughput, bounded latency
• Simple Programming and Operational Model
• Simple Major + Sub key and Value data structure
• ACID transactions
• Configurable consistency & durability
• Easy Management
• Web-based console, API accessible
• Manages and Monitors: Topology; Load; Performance; Events; Alerts
• Completes Oracle large scale data storage offerings
30. Streaming Access to HDFS
Diagram labels:
• HDFS data files: Datafile_part_1, Datafile_part_2, Datafile_part_m, Datafile_part_n, Datafile_part_x
• FUSE
• Oracle Database: External Table; View or Table Function
• Map, Reduce, Query
31. Oracle Data Integrator
Easily integrate data from any source
Expanded functionality:
=> Construct Hadoop jobs to transform and load
data into Oracle
=> Leverage Oracle Loader for Hadoop and/or
Hive
33. Big Data
Opportunity
• Big data can improve your top line today!
• Big data can make you much more agile
• Provides an edge over your competitors
Threat
• Big data is here – now
• Your competitors will not miss out on the opportunity
• Act now! Start building a big data platform for your organization
34. Big Data Appliance and Exadata
Big Data for the Enterprise
NoSQL DB
HDFS
Hadoop
RDBMS
35. Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
• Everything you need to store and integrate your lower
information density data
• Integrated with Oracle Exadata
• Analyze all your data
• Easy to Deploy
• Risk Free, Quick Installation and Setup
• Single Vendor Support
• Full Oracle support for the entire system and software
set