1. Big Data
& its impact on SOA
Demed L’Her
Sr Director, Product Management, Oracle
demed.lher@oracle.com (twitter: @demed)
Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
2. Demed L’Her
• Senior Director, Product Management at Oracle –
Engineering team
• Based in Redwood Shores, California
• Team in charge of Oracle SOA Suite: Adapters, Service
Bus, BPEL, Event Processing, SOA Suite for Healthcare
(Java CAPS and WebLogic Integration)
• Responsible for product roadmap, execution
• With Oracle since 2006
• Co-author http://snipurl.com/soa11gbook
• Twitter: @demed | email: demed.lher@oracle.com
3. Program Agenda
1. Big Data Trends
2. Big Data and SOA
3. Integration Patterns for Big Data
4. Fast Data
4. Introduction to Big Data:
Problems, Trends
& Technology
5. Data Explosion
Web & social networks experienced it first…
Infographic by Go-gulf.com
6. … but enterprises are now facing it too
• Retail and web transaction data (to refine recommendations, detect trends, etc.)
• “Sensor” data:
  • GPS in mobile phones
  • RFIDs
  • NFC
  • SmartMeters
  • Etc.
• Log file monitoring and analysis
• Security monitoring
Utilities deploying smart meters?
200x information flowing to the data center!
7. 4 V’s of Big Data
Defining Big Data
Volume: large
Velocity: high
Variety: complex (txn, files, media, machine data)
Value: variable signal-to-noise ratio
8. Storage was the obvious problem, but Analysis is the important one
Storage is the first obvious problem.
Analysis is next.
“Big Data Is Not the Created Content, nor Is It Even Its Consumption — It Is the Analysis of All the Data Surrounding or Swirling Around It”
Source: IDC's Digital Universe Study, sponsored by EMC, June 2011
http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
9. Companies have realized that there is competitive
advantage in this information and that now is the time
to put this data to work.
An Architect’s Guide to Big Data
An Oracle White Paper in Enterprise Architecture
http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf
10. Emergence of Hadoop
To address Big Data challenges – storage and processing
Licensed under the Apache v2 license
Created by Doug Cutting and Michael J. Cafarella
Based on Google papers: GFS (2003) and MapReduce (2004)
Key advances around distributed processing and distributed storage
First Apache release: 2007
Yahoo! contributed all its code in 2009
Current release (May 2012): 1.0.3
11. Hadoop: commercial offerings rapidly ramping up to respond to demand
Market Growth
“New research from International Data Corporation (IDC) shows that revenues for the worldwide Hadoop-MapReduce ecosystem software market are considered to be $77 million in 2011 and are expected to grow to $812.8 million in 2016 for a compound annual growth rate (CAGR) of 60.2%.”
Vendors: Hortonworks, Datameer, Cloudera, Platfora, Oracle, IBM, MapR, etc.
IDC Releases First Worldwide Hadoop-MapReduce Ecosystem Software Forecast, Strong Growth Will Continue to Accelerate as Talent and Tools Develop
07 May 2012, http://www.idc.com/getdoc.jsp?containerId=prUS23471212
12. Kernel of Hadoop
Storage: HDFS (Hadoop Distributed File System)
Runs on clusters of commodity hardware (cheap, readily available, direct-attached storage)
Fault tolerant, easy to expand
Designed for very large files (default block size = 64 MB)
Write-once/read-many-times, simple semantics
Flat file model accommodates both structured and unstructured data
[Diagram: a client talks to the name node, which coordinates datanodes spread across racks]
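As a rough illustration of the block model above (not actual Hadoop code), the sketch below shows how a large file maps onto fixed-size 64 MB blocks; the helper name is made up for this example:

```python
# Hedged sketch: how a large file maps onto HDFS-style fixed-size blocks
# (default 64 MB in Hadoop 1.x). split_into_blocks is a hypothetical helper.
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the Hadoop 1.x default

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (number of blocks, size of the last block) for a file."""
    full, remainder = divmod(file_size, block_size)
    if remainder:
        return full + 1, remainder  # final block is partial
    return full, block_size if full else 0

blocks, last = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
print(blocks, last // (1024 * 1024))  # 4 blocks, last one 8 MB
```

Each of those blocks would be replicated across datanodes on different racks, which is what makes the commodity-hardware cluster fault tolerant.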
13. Kernel of Hadoop
Analysis: MapReduce
Defined by Google in 2004
Breaks a problem up into smaller sub-problems
Able to distribute data workloads across thousands of nodes
Programmed via Java/scripting/C++ or higher-level languages such as Pig or Hive
[Diagram: input data → parallel Map tasks → shuffle/sort → Reduce tasks → output data]
14. Map/Reduce Example
Compute re-tweet counts on Twitter data – a simple measure of social influence
Input Data → Map → Shuffle/Sort → Reduce → Output
Input Data: raw tweets, e.g. “RT @oracle: #CIO's: How are you going to act on all that data you have? Turn it into insight w/our #BigData Guide”, “RT @oracle_biee: Register to access the OBIEE Live Mobile Demo server”, “RT @oracleretail: Oracle Upgrades Analytics in Oracle Retail Data Model (News Release)”, etc.
Map: execute parallel copies of a user-provided “Map” function, transforming segments of input into key/value pairs (each retweet maps to a pair such as @oracle, 1)
Shuffle/Sort: the system groups all mapped key/value pairs with the same key together
Reduce: execute parallel copies of a user-provided “Reduce” function to distill the groups of data to output
Output: @oracle, 3 | @oracle_biee, 2 | @oracleretail, 1
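The three phases above can be sketched in plain Python (no cluster needed); the sample tweets paraphrase the ones on the slide, and the function names are made up for illustration:

```python
from collections import defaultdict
from itertools import chain

# Sample input paraphrasing the slide; real input would come from HDFS splits.
TWEETS = [
    "RT @oracle: #CIO's: how are you going to act on all that data?",
    "RT @oracle_biee: Register to access the OBIEE Live Mobile Demo server",
    "RT @oracle: 10 Amazing Scenes From Oracle's @AmericasCup World Series",
    "RT @oracleretail: Oracle Upgrades Analytics in Oracle Retail Data Model",
    "RT @oracle_biee: The Oracle Exalytics v1 Patch Set 1 is now GA",
    "RT @oracle: Transform your data, transform your business!",
]

def map_phase(tweet):
    """Map: emit (handle, 1) for the retweeted account."""
    if tweet.startswith("RT @"):
        handle = tweet.split()[1].rstrip(":")
        yield (handle, 1)

def shuffle_sort(pairs):
    """Shuffle/sort: group all mapped values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each handle."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle_sort(chain.from_iterable(map_phase(t) for t in TWEETS)))
print(counts)  # {'@oracle': 3, '@oracle_biee': 2, '@oracleretail': 1}
```

In real Hadoop the map and reduce functions run as parallel tasks across the cluster, and the shuffle/sort is performed by the framework between the two phases.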
15. Hadoop Ecosystem
Rich and evolving:
Pig: scripting for exploring large datasets
Hive: SQL-like (HiveQL) query language
ZooKeeper: configuration management & coordination
Oozie: workflow & coordination
Cassandra: column-oriented database
HDFS / MapReduce: storage & analysis
Plus components for bulk data transfers between Hadoop and structured datastores, data serialization, machine learning and data mining, collecting/aggregating/streaming log data into HDFS, and text search
16. What does SOA have to do
with Big Data?
17. SOA Deployments Generate Big Data
Big Data is not just in Social Networks or Science Projects
SOA infrastructures are (quietly) handling increasingly massive amounts of transactions
Transactions contain key business information:
purchases, inventory levels, package tracking information,
profile updates, etc.
Multi-tenancy, private and public clouds are accelerating
data growth
18. SOA Big Data Example
Logistics Company
Oracle SOA Suite customer
Millions of BPEL processes/day
Transaction systems involved
Specific process data captured in a star schema for analytics
Analytics limited by a-priori decisions
Duplication of data
5 terabytes of database
Purge job every 4 hours
19. Typical Usage of Datastores by SOA Platforms
Today
Four kinds of data, ranging from structured to unstructured and from small to large:
Process state (XA, XML): many read/write
Metadata (headers, timestamps, etc.)
Full Payloads (CSV, JSON, XML, MTOM)
User Data (BLOB): write once, read many
20. Typical Usage of Datastores by SOA Platforms
Tomorrow
Process state (XA, XML) and Metadata (headers, timestamps, etc.) stay in the RDBMS
Full Payloads (CSV, JSON, XML, MTOM) and User Data (BLOB): offload to Hadoop or NoSQL
21. “Finding answers where there are yet to be questions” *
Today: the SOA infra runtime copies data into an OLAP database; (pre-determined) analytics are constrained by the available dataset
Tomorrow: the SOA infra runtime feeds a SOA audit big data store; for intelligence, the universe is the limit!
22. Impact of Big Data:
New Integration Patterns
23. Pattern 1: Usage of MapReduce data
Async BPEL process
Data Query
Synchronous interaction is not an option due to Hadoop's typical latencies (minutes to hours)
Getting data is not as simple as a sync “select” SQL statement
Split the query: 1. start the MapReduce job, 2. wait for the job_done notification, 3. get the data
Complex to implement for the process developer
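The split-query interaction can be sketched as follows; FakeJobClient and its methods are hypothetical stand-ins for a real Hadoop job-submission API, and polling stands in for the job_done notification:

```python
import time

class FakeJobClient:
    """Simulates a long-running MapReduce job for illustration only."""
    def __init__(self):
        self._done_at = time.monotonic() + 0.2  # pretend the job takes 200 ms
    def submit(self, job_spec):
        return "job-001"  # asynchronous submit returns a job id immediately
    def is_done(self, job_id):
        return time.monotonic() >= self._done_at
    def fetch_result(self, job_id):
        return {"rows": 42}

def split_query(client, job_spec, poll_interval=0.05):
    # 1. Start the MapReduce job (asynchronous submit).
    job_id = client.submit(job_spec)
    # 2. Wait for the job-done notification (here: polling).
    while not client.is_done(job_id):
        time.sleep(poll_interval)
    # 3. Get the data once the job has completed.
    return client.fetch_result(job_id)

result = split_query(FakeJobClient(), {"query": "daily_totals"})
print(result)  # {'rows': 42}
```

In a BPEL process, step 2 would typically be a receive activity waiting for a callback rather than a polling loop, which is part of what makes the pattern complex for the process developer.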
24. Pattern 2: Query data (NoSQL or HBase)
Data Query
Synchronous query against NoSQL or HBase
1. A scheduled job initiates the MapReduce run
2. The result set is loaded into NoSQL
3. The process issues a sync query against NoSQL
Getting data from batch-processed Hadoop output
Not operating on the absolute latest dataset
Familiar pattern, easy to implement for the process designer
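A minimal sketch of this pattern, with a plain Python dict standing in for the NoSQL store and hypothetical function names:

```python
# The dict stands in for a NoSQL store populated by a scheduled batch job.
nosql_store = {}

def load_batch_output(result_set):
    """Step 2: the scheduled job loads MapReduce output into the store."""
    nosql_store.update(result_set)

def get_recommendations(customer_id):
    """Step 3: synchronous key lookup from the process.
    Note the data reflects the last batch run, not the absolute latest state."""
    return nosql_store.get(customer_id, [])

load_batch_output({"cust-17": ["router", "cable"]})
print(get_recommendations("cust-17"))  # ['router', 'cable']
```

The appeal of the pattern is exactly this shape: the process designer sees an ordinary synchronous lookup, while the Hadoop latency is hidden behind the scheduled load.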
25. Pattern 3: Initiate process on data availability
Initiate process
1. A scheduled job initiates the MapReduce run, which creates a dataset and drops it on the filesystem (ex: in JSON format)
2. The result set appears as a file in a given location
3. A file adapter watching the directory detects the result set and initiates a new process; the BPEL process kicks in, parses the JSON and executes
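A minimal sketch of the pattern, with plain Python polling standing in for the file adapter and a function standing in for the BPEL process; all names here are made up for illustration:

```python
import json
import tempfile
from pathlib import Path

def poll_directory(watch_dir, seen):
    """Return newly appeared JSON result files, as a file adapter would."""
    new_files = [p for p in sorted(Path(watch_dir).glob("*.json")) if p not in seen]
    seen.update(new_files)
    return new_files

def initiate_process(result_file):
    """Parse the JSON result set and kick off a 'process instance'."""
    dataset = json.loads(result_file.read_text())
    return f"processed {len(dataset)} records from {result_file.name}"

watch_dir = tempfile.mkdtemp()
seen = set()
# 2. The MapReduce job drops its result set as a file in the watched location.
Path(watch_dir, "resultset-001.json").write_text(json.dumps([{"id": 1}, {"id": 2}]))
# 3. The adapter detects the file and initiates a new process instance.
outcomes = [initiate_process(f) for f in poll_directory(watch_dir, seen)]
print(outcomes)  # ['processed 2 records from resultset-001.json']
```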
26. Fast Data
Get Ahead of the Curve
27. Working with Big Data: some challenges
1. Big data ≠ infinite storage
Yes, storage is cheap, but it helps to have clean data, with context and less redundancy
2. Hadoop is batch-oriented and there is inherent latency
“With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes […] it will never be true real-time.” *
Raymie Stata, Yahoo! CTO (June 2011)
*: http://www.theregister.co.uk/2011/06/30/yahoo_hadoop_and_realtime/
28. Get ahead of the curve
Use Event Processing techniques: filter out, correlate
1. Filter out noise (ex: data ticks with no change), add context (by correlating multiple sources), increase relevance
2. Identify critical conditions as you insert data in the warehouse (not after)
Move time-critical analysis to the front of the process
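The first technique above (filtering out no-change ticks) can be sketched as a small stream filter; the sensor names and readings here are invented for illustration:

```python
def filter_unchanged(ticks):
    """Drop 'ticks' whose value is unchanged since the last reading for the
    same sensor, so only meaningful events reach downstream analysis."""
    last_seen = {}
    for sensor, value in ticks:
        if last_seen.get(sensor) != value:
            last_seen[sensor] = value
            yield (sensor, value)

readings = [("meter-1", 10), ("meter-1", 10), ("meter-1", 12),
            ("meter-2", 7), ("meter-2", 7), ("meter-1", 12)]
events = list(filter_unchanged(readings))
print(events)  # [('meter-1', 10), ('meter-1', 12), ('meter-2', 7)]
```

An event processing engine applies this kind of filtering and correlation continuously, before the data lands in the warehouse, which is what moves time-critical analysis to the front of the process.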
29. Fast Data
Get Ahead of the Curve
Fast Data: latency in ms, shallow historical depth. Example: monitoring of traffic cameras to ensure a given license plate is not in use on multiple vehicles
Big Data: latency in minutes, deep historical depth. Example: analysis of traffic patterns and congestion times for urban planning
Add “depth” to your fast data by merging the output of MapReduce into stream processing
30. How Fast is Fast?
Fast enough to support the explosion of smartphones in the largest markets
Mobile provider: billing smartphone data based on usage
Using OEP to correlate users to packets through dynamically allocated IP addresses (DPI equipment provides usage <-> IP@; IP allocation servers provide IP@ <-> user; billing consumes usage <-> user)
Coherence as a fast in-memory grid of user <-> IP addresses
Processes over 800,000 records/s
31. Putting it all together
Big Data, Fast Data & SOA
32. Oracle’s solution: Big Data, Fast Data & SOA
Acquire → Organize → Analyze → Decide
Acquire: Oracle Big Data Appliance, with Oracle Event Processing on the fast path
Organize: Oracle Exadata, fed over InfiniBand via Oracle Big Data Connectors
Analyze: Oracle Exalytics (over InfiniBand) and Endeca Information Discovery
Decide: Oracle Real-Time Decisions
Act, orchestrate response: Oracle SOA Suite
33. Oracle’s solution: Big Data, Fast Data & SOA
The same architecture, annotated with the traffic-management examples:
Acquire (Oracle Event Processing): monitoring of traffic cameras to ensure a given license plate is not in use on multiple vehicles
Organize (Oracle Big Data Appliance, Oracle Exadata): analysis of traffic patterns and congestion times for urban planning
Analyze (Endeca Information Discovery, Oracle Exalytics): search for the last sighting of specific vehicles
Decide (Oracle Real-Time Decisions): traffic rerouting suggestions
Act, orchestrate response (Oracle SOA Suite): display the real-time situation using BAM; coordinate Police and Emergency response using BPEL & Human Workflow
34. Conclusion
Big Data has reached the enterprise
SOA platforms are evolving to leverage Big Data technology
Service developers need to understand how to insert and access
data in Hadoop
Time-critical conditions can be detected as data is inserted in
Hadoop using event processing techniques – Fast Data
Expect Big Data and Fast Data to become ubiquitous in SOA environments – much like RDBMSs already are
Editor's notes:
All kinds of data. Large volumes. Valuable insight, but difficult to extract (structured and unstructured data). Often extremely time-sensitive.
Most of the vast data types portrayed here are consumer data, and while the business will want to leverage Oracle Event Processing for business and application data, it is also impacted by this consumer data and by information from the vast array of sensors: stream events showing temperatures in a container mid-Pacific may destroy high-cost food goods unless immediate action is taken. For Starbucks: immediately analyzing tweets after launching a new coffee, seeing spikes of negative comments, and very quickly figuring out that the negative reactions came from stores serving a particular warmed cheese sandwich, whose aroma did not go with the new coffee smell. Huge ROI due to quick analysis and a specific, targeted response. And as you can see from the Spanish (La Caxia) bank solution, a customer's tweets are also being analyzed by Oracle Event Processing and stored in Big Data to augment his preferences and influence his/her real-time targeted campaigns.
Scripting languages supported via Hadoop Streaming, equivalent to Unix streaming.
Facebook, Google, Netflix, etc.; Hadron Collider, NSF, etc.
Being able to preserve info over the long term (without copying/filtering) could be very interesting for historical analysis, shipping & process optimization.
SmartMeter example: want all the data to do in-depth energy usage analysis, but also want real-time analysis for things like leak detection.
Technologist & citizen.