Big Data
& its impact on SOA


Demed L’Her
Sr Director, Product Management, Oracle
demed.lher@oracle.com (twitter: @demed)


1   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.   Insert Information Protection Policy Classification from Slide 13
Demed L’Her

• Senior Director, Product Management at Oracle – Engineering team
• Based in Redwood Shores, California
• Team in charge of Oracle SOA Suite: Adapters, Service Bus, BPEL, Event Processing, SOA Suite for Healthcare (Java CAPS and WebLogic Integration)
• Responsible for product roadmap, execution
• With Oracle since 2006
• Co-author: http://snipurl.com/soa11gbook
• Twitter: @demed | email: demed.lher@oracle.com



Program Agenda


        1. Big Data Trends
        2. Big Data and SOA
        3. Integration Patterns for Big Data
        4. Fast Data




Introduction to Big Data:
        Problems, Trends
        & Technology




Data Explosion

Web & social networks experienced it first…

Infographic by Go-gulf.com



… but enterprises are now facing it too
• Retail and web transaction data (to refine recommendations, detect trends, etc.)
• “Sensor” data:
    • GPS in mobile phones
    • RFIDs
    • NFC
    • Smart meters
    • Etc.
• Log file monitoring and analysis
• Security monitoring

Utilities deploying smart meters?
200x more information flowing to the data center!


4 V’s of Big Data

Defining Big Data

Volume: large
Velocity: high
Variety: complex (txn, files, media, machine data)
Value: variable signal-to-noise ratio



Storage was the obvious problem,
but analysis is the important one

Storage is the first, obvious problem.
Analysis is next.

“Big Data Is Not the Created Content, nor Is It Even Its Consumption — It Is the Analysis of All the Data Surrounding or Swirling Around It”

Source: IDC's Digital Universe Study, sponsored by EMC, June 2011
       http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
Companies have realized that there is competitive advantage in this information and that now is the time to put this data to work.

An Architect’s Guide to Big Data
An Oracle White Paper in Enterprise Architecture
http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf




Emergence of Hadoop
To address Big Data challenges – storage and processing

• Licensed under the Apache License 2.0
• Created by Doug Cutting and Michael J. Cafarella
• Based on Google papers on GFS (2003) and MapReduce (2004)
• Key advances around distributed processing and distributed storage
• First Apache release: 2007
• Yahoo! contributed all its code in 2009
• Current release (May 2012): 1.0.3



Hadoop: commercial offerings rapidly ramping up to respond to demand

Market Growth
“New research from International Data Corporation (IDC) shows that revenues for the worldwide Hadoop-MapReduce ecosystem software market are considered to be $77 million in 2011 and are expected to grow to $812.8 million in 2016 for a compound annual growth rate (CAGR) of 60.2%.”

Hortonworks, Cloudera, Oracle, IBM, MapR, Datameer, Platfora, etc.

IDC Releases First Worldwide Hadoop-MapReduce Ecosystem Software Forecast, Strong Growth Will Continue to Accelerate as Talent and Tools Develop
      07 May 2012, http://www.idc.com/getdoc.jsp?containerId=prUS23471212
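The quoted growth rate can be checked with the standard CAGR formula. The short Python sketch below (the `cagr` helper is ours, not IDC's) reproduces the 60.2% from the two revenue figures over the five years from 2011 to 2016.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# IDC figures quoted above, in millions of US dollars.
rate = cagr(77.0, 812.8, 2016 - 2011)
print(f"CAGR 2011-2016: {rate:.1%}")  # CAGR 2011-2016: 60.2%
```

A roughly tenfold increase over five years corresponds to about 1.6x growth per year, which is what the 60.2% figure expresses.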
Kernel of Hadoop

Storage: HDFS
Hadoop Distributed File System
• Runs on clusters of commodity hardware (cheap, readily available, direct-attached storage)
• Fault tolerant, easy to expand
• Designed for very large files (default block size = 64 MB)
• Write-once/read-many-times, simple semantics
• Flat file model accommodates both structured and unstructured data

[Diagram: CLIENT, NAME NODE, and DATANODEs grouped into RACKs]
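To make the 64 MB block size concrete, the sketch below computes how many blocks a file occupies and how much raw disk it consumes once replicated. It assumes a replication factor of 3, HDFS's usual default, which the slide does not mention; the `hdfs_footprint` helper is hypothetical.

```python
from typing import Tuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size quoted on the slide
REPLICATION = 3                 # assumption: HDFS's usual default replication factor


def hdfs_footprint(file_size: int, block_size: int = BLOCK_SIZE,
                   replication: int = REPLICATION) -> Tuple[int, int, int]:
    """Return (blocks, block replicas, raw bytes stored) for one file.

    The last block only occupies as much disk as the data it holds,
    so raw storage is file_size * replication, not blocks * block_size.
    """
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks, blocks * replication, file_size * replication


one_gib = 1024 ** 3
print(hdfs_footprint(one_gib))          # (16, 48, 3221225472)
print(hdfs_footprint(100 * 1024 ** 2))  # 100 MB file: 2 blocks, 6 replicas
```

Large blocks keep the number of blocks, and therefore the NameNode's bookkeeping, small for very large files.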




Kernel of Hadoop

Analysis: MapReduce
• Defined by Google in 2004
• Breaks a problem up into smaller sub-problems
• Able to distribute data workloads across thousands of nodes
• Programmed via Java/scripting/C++ or higher-level languages such as Pig or Hive

[Diagram: INPUT DATA → parallel MAP tasks → SHUFFLE/SORT → parallel REDUCE tasks → OUTPUT DATA]




Map/Reduce Example
Compute re-tweet counts on Twitter data – a simple measure of social influence

Input Data
RT @oracle: #CIO's: How are you going to act on all that data you have? Turn it into insight w/our #BigData Guide
RT @oracle_biee: Register to access the OBIEE Live Mobile Demo server
RT @oracle - 10 Amazing Scenes From Oracle's @AmericasCup World Series courtesy of Sarah Kimmel
RT @oracleretail: Oracle Upgrades Analytics in Oracle Retail Data Model (News Release)
RT @oracle_biee: The Oracle Exalytics v1 Patch Set 1 is now generally available (GA)
RT @oracle: Transform your data, Transform your business! Live Q&A to learn Oracle GoldenGate 11g's new features!

Map: execute parallel copies of the user-provided “Map” function, transforming segments of input into key/value pairs
@oracle, 1 | @oracle_biee, 1 | @oracle, 1 | @oracleretail, 1 | @oracle_biee, 1 | @oracle, 1

Shuffle/Sort: the system groups all mapped key/value pairs with the same key together
@oracle, 1 | @oracle, 1 | @oracle, 1
@oracleretail, 1
@oracle_biee, 1 | @oracle_biee, 1

Reduce: execute parallel copies of the user-provided “Reduce” function to distill groups of data into output
@oracle, 3
@oracleretail, 1
@oracle_biee, 2

Output
@oracle, 3
@oracleretail, 1
@oracle_biee, 2
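The same three phases can be simulated in a single Python process. This is a teaching sketch, not Hadoop code: a real job would implement Mapper and Reducer classes against Hadoop's Java API and run across a cluster, and the regular expression that extracts the retweeted handle is our assumption about what counts as a retweet.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumption: a retweet is a tweet starting with "RT @handle".
RT_PATTERN = re.compile(r"^RT (@\w+)")

TWEETS = [
    "RT @oracle: #CIO's: How are you going to act on all that data you have? "
    "Turn it into insight w/our #BigData Guide",
    "RT @oracle_biee: Register to access the OBIEE Live Mobile Demo server",
    "RT @oracle - 10 Amazing Scenes From Oracle's @AmericasCup World Series "
    "courtesy of Sarah Kimmel",
    "RT @oracleretail: Oracle Upgrades Analytics in Oracle Retail Data Model "
    "(News Release)",
    "RT @oracle_biee: The Oracle Exalytics v1 Patch Set 1 is now generally "
    "available (GA)",
    "RT @oracle: Transform your data, Transform your business! Live Q&A to "
    "learn Oracle GoldenGate 11g's new features!",
]


def map_tweet(tweet):
    """Map: emit (retweeted_handle, 1) for each retweet."""
    match = RT_PATTERN.match(tweet)
    if match:
        yield match.group(1), 1


def shuffle(pairs):
    """Shuffle/sort: group all values that share a key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key, values):
    """Reduce: distill each group of values into a single count."""
    return key, sum(values)


def retweet_counts(tweets):
    mapped = chain.from_iterable(map_tweet(t) for t in tweets)
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


print(retweet_counts(TWEETS))  # {'@oracle': 3, '@oracle_biee': 2, '@oracleretail': 1}
```

Note that the `@AmericasCup` mention is not counted: only the handle immediately after "RT" is treated as the retweeted source.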




Hadoop Ecosystem
Rich and evolving

At the core: HDFS / MapReduce (storage & analysis), surrounded by:
• Pig: scripting for exploring large datasets
• Hive: SQL-like query language (HiveQL)
• ZooKeeper: configuration management & coordination
• Oozie: workflow & coordination
• Cassandra
• Column-oriented database
• Bulk data transfers between Hadoop and structured datastores
• Data serialization
• Machine learning, data mining
• Collect, aggregate, stream log data into HDFS
• Text search engine

What does SOA have to do
         with Big Data?




SOA Deployments Generate Big Data

• Big Data is not just in social networks or science projects
• SOA infrastructures are (quietly) handling increasingly massive amounts of transactions
• Transactions contain key business information: purchases, inventory levels, package tracking information, profile updates, etc.
• Multi-tenancy, private and public clouds are accelerating data growth




SOA Big Data Example

Logistics Company
• Oracle SOA Suite customer
• Millions of BPEL processes/day
• Transaction systems involved
• 5 terabytes of database
• Purge job every 4 hours

Specific process data captured in a star schema for analytics:
• Analytics limited by a-priori decisions
• Duplication of data




Typical Usage of Datastores by SOA Platforms
Today

• Process state (XA)
• Metadata (XML: headers, timestamps, etc.)
• Full payloads (MTOM, CSV, JSON, XML, BLOB)
• User data

From process state to user data, data goes from structured to unstructured, from small to large, and from many read/write to write once, read many.



Typical Usage of Datastores by SOA Platforms
Tomorrow

• Process state (XA)
• Metadata (XML: headers, timestamps, etc.)
• Full payloads (MTOM, CSV, JSON, XML, BLOB)
• User data

Process state stays in an RDBMS or NoSQL store; the larger, write-once data is offloaded to Hadoop.
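A minimal sketch of this split, assuming an SQLite table stands in for the RDBMS and a local directory stands in for HDFS: the relational row keeps only process state plus a pointer, while the full payload is written once to the bulk store. Table, column and instance names are hypothetical.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def create_schema(db: sqlite3.Connection) -> None:
    # Small, structured, frequently updated: stays in the relational store.
    db.execute("CREATE TABLE IF NOT EXISTS process_state ("
               "instance_id TEXT PRIMARY KEY, state TEXT, payload_path TEXT)")


def store_process_event(db: sqlite3.Connection, bulk_dir: Path,
                        instance_id: str, state: str, payload: dict) -> Path:
    """Write the full payload once to the bulk store (a local directory
    standing in for HDFS) and keep only state plus a pointer in the RDBMS."""
    payload_path = bulk_dir / f"{instance_id}.json"
    payload_path.write_text(json.dumps(payload))  # write once, read many
    db.execute("INSERT OR REPLACE INTO process_state "
               "(instance_id, state, payload_path) VALUES (?, ?, ?)",
               (instance_id, state, str(payload_path)))
    db.commit()
    return payload_path


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    with tempfile.TemporaryDirectory() as tmp:
        store_process_event(db, Path(tmp), "order-42", "SHIPPED",
                            {"sku": "A-1", "qty": 3})
        state, ref = db.execute(
            "SELECT state, payload_path FROM process_state").fetchone()
        print(state, json.loads(Path(ref).read_text()))
```

Keeping only a reference in the relational table keeps the frequently read/written store small, while the write-once payloads remain available for later bulk analysis.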




“Finding answers where there are yet to be questions” *

SOA infra runtime → SOA infra database → copy → OLAP (pre-determined) → Analytics: intelligence constrained by available dataset

SOA infra runtime → SOA audit big data store → Analytics: universe is the limit!




Impact of Big Data:
      New Integration Patterns




Pattern 1: Usage of MapReduce data

Data Query
• Synchronous interaction is not an option due to typical Hadoop latencies (minutes to hours)
• Getting data is not as simple as a sync “select” SQL statement
• Split query: start job, wait for notification, get data
• Complex to implement for the process developer

[Diagram: async BPEL process: 1. Start MapReduce job → 2. Wait for Job_done notification → 3. Get data]
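The three steps can be sketched with Python's `concurrent.futures`, assuming a thread pool stands in for the Hadoop cluster and a `threading.Event` stands in for the Job_done notification. `run_mapreduce_job` is a hypothetical placeholder that returns the retweet counts from the earlier example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce_job(job_id: str, output: dict) -> None:
    # Hypothetical stand-in: real jobs take minutes to hours and write
    # their output to HDFS rather than to an in-memory dict.
    time.sleep(0.1)  # simulated batch latency
    output[job_id] = {"@oracle": 3, "@oracle_biee": 2, "@oracleretail": 1}


def query_via_mapreduce(executor, job_id: str, timeout_s: float) -> dict:
    output: dict = {}
    job_done = threading.Event()

    # 1. Start the MapReduce job without blocking the caller.
    future = executor.submit(run_mapreduce_job, job_id, output)
    # The completion callback plays the role of the Job_done notification.
    future.add_done_callback(lambda _f: job_done.set())

    # 2. Wait for the notification (an async BPEL process would dehydrate here).
    if not job_done.wait(timeout_s):
        raise TimeoutError(f"job {job_id!r} did not finish within {timeout_s}s")
    future.result()  # re-raises any exception the job raised

    # 3. Get the data.
    return output[job_id]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=1) as pool:
        print(query_via_mapreduce(pool, "retweet-counts", timeout_s=5.0))
```

Even in this toy form, the caller has to manage three separate steps and a timeout, which is the complexity the slide attributes to this pattern.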




Pattern 2: Query data (NoSQL or HBase)

Data Query
• Synchronous query against NoSQL or HBase
• Getting data from batch-processed Hadoop output
• Not operating on the absolute latest dataset
• Familiar pattern, easy to implement for the process designer

[Diagram: 1. Scheduled job initiates → 2. Result set loaded into NoSQL → 3. Sync query of NoSQL]
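A minimal sketch of the pattern, assuming an in-memory dictionary stands in for the NoSQL or HBase table; `ResultStore` and `scheduled_job` are hypothetical names. It shows why lookups are cheap and synchronous but only as current as the last batch load.

```python
import time


class ResultStore:
    """Minimal in-memory stand-in for a NoSQL/HBase table (hypothetical)."""

    def __init__(self):
        self._rows = {}
        self.loaded_at = None  # when the current snapshot was loaded

    def bulk_load(self, rows: dict) -> None:
        # 2. The batch result set replaces the previous snapshot in one step.
        self._rows = dict(rows)
        self.loaded_at = time.time()

    def get(self, key, default=None):
        # 3. Synchronous point lookup: fast, but only as fresh as the last load.
        return self._rows.get(key, default)


def scheduled_job(store: ResultStore, batch_output: dict) -> None:
    # 1. A scheduler would call this after each MapReduce run finishes.
    store.bulk_load(batch_output)


if __name__ == "__main__":
    store = ResultStore()
    scheduled_job(store, {"@oracle": 3, "@oracle_biee": 2})
    print(store.get("@oracle"))
    # Tweets arriving after the load stay invisible until the next run.
    print(store.get("@oracleretail", 0))
```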




Pattern 3: Initiate process on data availability

Initiate process
• MapReduce job creates a dataset and drops it on the filesystem (e.g., in JSON format)
• BPEL process + file adapter watches the directory for new data
• BPEL process kicks in, parses the JSON and executes

[Diagram: 1. Scheduled job initiates → 2. Result set appears as a file in a given location → 3. File adapter detects the result set and initiates a new process]
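The file adapter's role can be approximated by a polling loop. This sketch assumes each MapReduce run drops one complete `*.json` file into a watched directory and performs a single polling pass; `poll_once` and `start_process` are hypothetical stand-ins for the adapter and the BPEL process.

```python
import json
import tempfile
from pathlib import Path


def poll_once(inbox: Path, seen: set, start_process) -> int:
    """One polling pass over `inbox`, loosely mimicking a file adapter:
    each new *.json result set is parsed and handed to start_process."""
    started = 0
    for path in sorted(inbox.glob("*.json")):
        if path.name in seen:
            continue  # already picked up on an earlier pass
        seen.add(path.name)
        start_process(json.loads(path.read_text()))  # hypothetical process entry point
        started += 1
    return started


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inbox, seen = Path(tmp), set()
        (inbox / "retweets-0001.json").write_text(json.dumps({"@oracle": 3}))
        print(poll_once(inbox, seen, lambda data: print("new process:", data)))
        print(poll_once(inbox, seen, print))  # nothing new on the second pass
```

In practice the producer should write under a temporary name (here, anything not ending in `.json`) and rename the file once it is complete; a rename within one POSIX filesystem is atomic, so the poller never parses a half-written result set.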




Fast Data
      Get Ahead of the Curve




Working with Big Data: some challenges

1. Big data ≠ infinite storage
   Yes, storage is cheap, but it helps to have clean data, with context and less redundancy
2. Hadoop is batch-oriented and there is inherent latency
   "With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes […] it will never be true real-time." *
   Raymie Stata, Yahoo! CTO (June 2011)

*: http://www.theregister.co.uk/2011/06/30/yahoo_hadoop_and_realtime/


Get ahead of the curve

Use Event Processing techniques
1. Filter out noise (e.g., data ticks with no change), add context (by correlating multiple sources), increase relevance
2. Identify critical conditions as you insert data into the warehouse (not after)
   Move time-critical analysis to the front of the process

[Diagram: filter out, correlate]
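Both steps can happen in one pass over the incoming stream. The sketch below assumes ticks arrive as `(source, value)` pairs and that context comes from a second, keyed source; it drops unchanged ticks, enriches the rest and flags critical conditions before anything is stored. The function names, meter IDs and threshold are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

_MISSING = object()  # sentinel so that a first tick is never treated as "unchanged"


def drop_unchanged(ticks: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, float]]:
    """Filter out noise: skip a tick whose value equals the previous one
    from the same source."""
    last: Dict[str, object] = {}
    for source, value in ticks:
        if last.get(source, _MISSING) != value:
            last[source] = value
            yield source, value


def process_on_ingest(ticks: Iterable[Tuple[str, float]],
                      context: Dict[str, dict],
                      is_critical: Callable[[dict], bool]) -> Iterator[dict]:
    """Add context by correlating with a second source, and flag critical
    conditions while data flows in, before it reaches the warehouse."""
    for source, value in drop_unchanged(ticks):
        event = {"source": source, "value": value, **context.get(source, {})}
        event["critical"] = is_critical(event)
        yield event


if __name__ == "__main__":
    readings = [("meter-1", 1.2), ("meter-1", 1.2), ("meter-1", 5.0), ("meter-2", 0.4)]
    regions = {"meter-1": {"region": "north"}}
    for event in process_on_ingest(readings, regions, lambda e: e["value"] > 4.0):
        print(event)
```

Only three of the four readings survive the filter, and the spike on `meter-1` is flagged on arrival rather than discovered by a later batch query.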




Fast Data
         Get Ahead of the Curve

         Fast Data: latency in milliseconds; shallow historical depth
             Example: monitoring of traffic cameras to ensure a given
             license plate is not in use on multiple vehicles
         Big Data: latency in minutes; deep historical depth
             Example: analysis of traffic patterns and congestion times
             for urban planning

         Add “depth” to your fast data by merging the output of
         MapReduce into stream processing.
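One way to read "adding depth" is sketched below, under stated assumptions: a batch (MapReduce) job is assumed to have produced, for each pair of cameras, the shortest travel time ever observed between them, and the stream processor uses that table to flag a plate seen at two cameras sooner than any vehicle has managed to travel between them. `DuplicatePlateDetector`, its `observe` method and the table layout are hypothetical Python illustrations, not an Oracle API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical batch (MapReduce) output: shortest travel time in seconds ever
# observed between two cameras, keyed by the camera pair in sorted order.
MinTravelTimes = Dict[Tuple[str, str], float]


class DuplicatePlateDetector:
    """Flags a plate seen at two cameras sooner than any vehicle has
    historically travelled between them, suggesting two vehicles share it."""

    def __init__(self, min_travel: MinTravelTimes) -> None:
        self.min_travel = min_travel                       # deep, precomputed history
        self.last_seen: Dict[str, Tuple[str, float]] = {}  # shallow, live state

    def observe(self, plate: str, camera: str, timestamp: float) -> bool:
        """Record a sighting; return True if it is physically implausible."""
        previous: Optional[Tuple[str, float]] = self.last_seen.get(plate)
        self.last_seen[plate] = (camera, timestamp)
        if previous is None or previous[0] == camera:
            return False
        prev_camera, prev_time = previous
        pair = (min(prev_camera, camera), max(prev_camera, camera))
        limit = self.min_travel.get(pair)
        if limit is None:
            return False  # no history for this camera pair: cannot judge
        return abs(timestamp - prev_time) < limit


detector = DuplicatePlateDetector({("camA", "camB"): 600.0})
```

The stream side keeps only the last sighting per plate (shallow depth), while the historical travel times (deep depth) arrive precomputed from the batch side, so the per-event check remains a dictionary lookup.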


How Fast is Fast?
         Fast enough to support the explosion of smartphones in the
         largest markets

            Mobile provider
            • Billing smartphone data based on usage
            • Using Oracle Event Processing (OEP) to correlate users to
              packets through dynamically allocated IP addresses
            • Coherence as a fast in-memory grid of user <-> IP address
              mappings
            • Processes over 800,000 records/s

            [Diagram: deep packet inspection (DPI) equipment supplies
            usage <-> IP address; IP allocation servers supply IP address
            <-> user; the two are correlated into usage <-> user, which
            feeds Billing.]
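The correlation described in this example can be sketched as a streaming join. In this minimal Python sketch, a plain dictionary stands in for the Coherence grid, events are assumed to arrive in timestamp order, and the `IpAllocation`/`UsageRecord` types and the `bill_usage` function are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, Iterable, Union


@dataclass(frozen=True)
class IpAllocation:   # from IP allocation servers: IP address <-> user
    timestamp: float
    ip: str
    user: str


@dataclass(frozen=True)
class UsageRecord:    # from DPI equipment: usage <-> IP address
    timestamp: float
    ip: str
    bytes_used: int


Event = Union[IpAllocation, UsageRecord]


def bill_usage(events: Iterable[Event]) -> Dict[str, int]:
    """Correlate usage to users through dynamically allocated IP addresses.

    Assumes events arrive in timestamp order; the dict below stands in for
    an in-memory data grid such as Coherence.
    """
    ip_to_user: Dict[str, str] = {}
    bytes_per_user: DefaultDict[str, int] = defaultdict(int)
    for event in events:
        if isinstance(event, IpAllocation):
            # A reallocation overwrites the previous owner of the address.
            ip_to_user[event.ip] = event.user
        else:
            user = ip_to_user.get(event.ip)
            if user is not None:
                bytes_per_user[user] += event.bytes_used
            # Usage on an address with no known owner is ignored in this sketch.
    return dict(bytes_per_user)


events = [
    IpAllocation(0.0, "10.0.0.1", "alice"),
    UsageRecord(1.0, "10.0.0.1", 500),
    IpAllocation(2.0, "10.0.0.1", "bob"),   # address reassigned
    UsageRecord(3.0, "10.0.0.1", 300),
    UsageRecord(4.0, "10.0.0.9", 100),      # no known owner
]
totals = bill_usage(events)
```

Because an allocation event overwrites the previous owner of an address, usage recorded on `10.0.0.1` before and after the reassignment is billed to different users. This is why the IP-to-user mapping has to be kept current in a fast in-memory store rather than looked up later in a batch job.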



Putting it all together
      Big Data, Fast Data & SOA




Oracle’s solution: Big Data, Fast Data & SOA
[Diagram: data flows through four stages, Acquire → Organize → Analyze → Decide. Components shown: Oracle Event Processing, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle Exadata and Oracle Exalytics (connected by InfiniBand), Endeca Information Discovery and Oracle Real-Time Decisions. Below the stages, Oracle SOA Suite acts on the results and orchestrates the response.]


Oracle’s solution: Big Data, Fast Data & SOA
[Diagram: the same Acquire → Organize → Analyze → Decide architecture, annotated with examples:]
   • Oracle Event Processing: monitoring of traffic cameras to ensure a given
     license plate is not in use on multiple vehicles
   • Endeca Information Discovery: search for the last sighting of specific
     vehicles
   • Analysis of traffic patterns and congestion times for urban planning
   • Oracle Real-Time Decisions: traffic rerouting suggestions
   • Oracle SOA Suite (act, orchestrate response): display the real-time
     situation using BAM; coordinate police and emergency response using
     BPEL & Human Workflow


Conclusion

            • Big Data has reached the enterprise
            • SOA platforms are evolving to leverage Big Data technology
            • Service developers need to understand how to insert data into
              Hadoop and how to access it
            • Time-critical conditions can be detected as data is inserted
              into Hadoop, using event processing techniques (Fast Data)
            • Expect Big Data and Fast Data to become as ubiquitous in SOA
              environments as RDBMSs already are



34   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.   Insert Information Protection Policy Classification from Slide 13
35   Copyright © 2012, Oracle and/or its affiliates. All rights reserved.   Insert Information Protection Policy Classification from Slide 13

Contenu connexe

Tendances

Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!
Jeffrey T. Pollock
 
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariAmbari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Hortonworks
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Hortonworks
 

Tendances (20)

2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data Services
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!Klarna Tech Talk - Mind the Data!
Klarna Tech Talk - Mind the Data!
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
Creando un Portal Oracle para una Empresa
Creando un Portal Oracle para una EmpresaCreando un Portal Oracle para una Empresa
Creando un Portal Oracle para una Empresa
 
Oracle Data Integration - Overview
Oracle Data Integration - OverviewOracle Data Integration - Overview
Oracle Data Integration - Overview
 
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with AmbariAmbari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
Ambari Meetup: 2nd April 2013: Teradata Viewpoint Hadoop Integration with Ambari
 
Hadoop Trends
Hadoop TrendsHadoop Trends
Hadoop Trends
 
Contexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to ProductionContexti / Oracle - Big Data : From Pilot to Production
Contexti / Oracle - Big Data : From Pilot to Production
 
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
Beyond a Big Data Pilot: Building a Production Data Infrastructure - Stampede...
 
Teradata Aster Discovery Platform
Teradata Aster Discovery PlatformTeradata Aster Discovery Platform
Teradata Aster Discovery Platform
 
Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25Dataguise hortonworks insurance_feb25
Dataguise hortonworks insurance_feb25
 
GoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest RakutenGoldenGate and Stream Processing with Special Guest Rakuten
GoldenGate and Stream Processing with Special Guest Rakuten
 
The Next Generation of Big Data Analytics
The Next Generation of Big Data AnalyticsThe Next Generation of Big Data Analytics
The Next Generation of Big Data Analytics
 
Flash session -goldengate--lht1053-lon
Flash session -goldengate--lht1053-lonFlash session -goldengate--lht1053-lon
Flash session -goldengate--lht1053-lon
 
Oracle Solaris Secure Cloud Infrastructure
Oracle Solaris Secure Cloud InfrastructureOracle Solaris Secure Cloud Infrastructure
Oracle Solaris Secure Cloud Infrastructure
 
Talend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data PlatformTalend Open Studio and Hortonworks Data Platform
Talend Open Studio and Hortonworks Data Platform
 
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
Accelerating the Value of Big Data Analytics for P&C Insurers with Hortonwork...
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 

Similaire à Big data and its impact on SOA

Data Mining
Data MiningData Mining
Data Mining
swami920
 
Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2
David Linthicum
 
Metadata Use Cases
Metadata Use CasesMetadata Use Cases
Metadata Use Cases
dmurph4
 
Metadata Use Cases You Can Use
Metadata Use Cases You Can UseMetadata Use Cases You Can Use
Metadata Use Cases You Can Use
dmurph4
 
Mapping Manager Product Overview
Mapping Manager Product OverviewMapping Manager Product Overview
Mapping Manager Product Overview
Rakesh Kumar
 
Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...
Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...
Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...
InSync2011
 

Similaire à Big data and its impact on SOA (20)

Data Mining
Data MiningData Mining
Data Mining
 
Oracle Fusion applications 101 [2010 OAUG Collaborate]
Oracle Fusion applications 101 [2010 OAUG Collaborate]Oracle Fusion applications 101 [2010 OAUG Collaborate]
Oracle Fusion applications 101 [2010 OAUG Collaborate]
 
Odi ireland rittman
Odi ireland rittmanOdi ireland rittman
Odi ireland rittman
 
Fusesource camel-persistence-part1-webinar-charles-moulliard
Fusesource camel-persistence-part1-webinar-charles-moulliardFusesource camel-persistence-part1-webinar-charles-moulliard
Fusesource camel-persistence-part1-webinar-charles-moulliard
 
Governance as Sustainability in the Enterprise Architecture Discipline
Governance as Sustainability in the Enterprise Architecture Discipline Governance as Sustainability in the Enterprise Architecture Discipline
Governance as Sustainability in the Enterprise Architecture Discipline
 
WebLogic 12c Developer Deep Dive at Oracle Develop India 2012
WebLogic 12c Developer Deep Dive at Oracle Develop India 2012WebLogic 12c Developer Deep Dive at Oracle Develop India 2012
WebLogic 12c Developer Deep Dive at Oracle Develop India 2012
 
All Grown Up: Maturation of Analytics in the Cloud
All Grown Up: Maturation of Analytics in the CloudAll Grown Up: Maturation of Analytics in the Cloud
All Grown Up: Maturation of Analytics in the Cloud
 
Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2Getting Cloud Architecture Right the First Time Ver 2
Getting Cloud Architecture Right the First Time Ver 2
 
Soeren okfn greece meetup
Soeren okfn greece meetupSoeren okfn greece meetup
Soeren okfn greece meetup
 
Metadata Use Cases
Metadata Use CasesMetadata Use Cases
Metadata Use Cases
 
Metadata Use Cases You Can Use
Metadata Use Cases You Can UseMetadata Use Cases You Can Use
Metadata Use Cases You Can Use
 
Extending The Value Of Oracle Crm On Demand Through Cloud Based Extensibility
Extending The Value Of Oracle Crm On Demand Through Cloud Based ExtensibilityExtending The Value Of Oracle Crm On Demand Through Cloud Based Extensibility
Extending The Value Of Oracle Crm On Demand Through Cloud Based Extensibility
 
How to develop a data scientist – What business has requested v02
How to develop a data scientist – What business has requested v02How to develop a data scientist – What business has requested v02
How to develop a data scientist – What business has requested v02
 
Conférence Open Data par où commencer ? "How to achieve interoperability?" E....
Conférence Open Data par où commencer ? "How to achieve interoperability?" E....Conférence Open Data par où commencer ? "How to achieve interoperability?" E....
Conférence Open Data par où commencer ? "How to achieve interoperability?" E....
 
Future of Data Strategy
Future of Data StrategyFuture of Data Strategy
Future of Data Strategy
 
Mapping Manager Product Overview
Mapping Manager Product OverviewMapping Manager Product Overview
Mapping Manager Product Overview
 
Self-Service Access and Exploration of Big Data
Self-Service Access and Exploration of Big DataSelf-Service Access and Exploration of Big Data
Self-Service Access and Exploration of Big Data
 
Oracle Optimized Datacenter - Storage
Oracle Optimized Datacenter - StorageOracle Optimized Datacenter - Storage
Oracle Optimized Datacenter - Storage
 
Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...
Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...
Database & Technology 2 _ Damien Bootsma _ best Practices for capturing meta ...
 
Milton smith 2013
Milton smith 2013Milton smith 2013
Milton smith 2013
 

Big data and its impact on SOA

  • 1. Big Data & its impact on SOA Demed L’Her Sr Director, Product Management, Oracle demed.lher@oracle.com (twitter: @demed) 1 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 2. Demed L’Her • Senior Director, Product Management at Oracle – Engineering team • Based in Redwood Shores, California • Team in charge of Oracle SOA Suite: Adapters, Service Bus, BPEL, Event Processing, SOA Suite for Healthcare (Java CAPS and WebLogic Integration) • Responsible for product roadmap, execution • With Oracle since 2006 • Co-author http://snipurl.com/soa11gbook • Twitter: @demed | email: demed.lher@oracle.com 2 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 3. Program Agenda 1. Big Data Trends 2. Big Data and SOA 3. Integration Patterns for Big Data 4. Fast Data 3 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 4. Introduction to Big Data: Problems, Trends & Technology 4 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 5. Data Explosion Web & social networks experienced it first… Infographic by Go-gulf.com 5 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 6. … but enterprises are now facing it too … but • Retail and web transaction data (to refine enterprises are recommendations, detect trends etc.) also facing it • “Sensor” data: now • GPS in mobile phones • RFIDs • NFC • SmartMeters • Etc. • Log file monitoring and analysis • Security monitoring Utilities deploying smart meters?  200x information flowing to data center! 6 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 7. 4 V’s of Big Data Defining Big Data Volume: large Velocity: high Variety: complex (txn, files, media, machine data) Value: variable signal-noise ratio 7 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 8. Storage was the obvious problem but Analysis is the important one Storage is the first obvious “Big Data Is Not the Created Content, nor Is problem. It Even Its Consumption Analysis is next. — It Is the Analysis of All the Data Surrounding or Swirling Around It “ Source: IDC's Digital Universe Study, sponsored by EMC, June 2011 8 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13 http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
  • 9. Companies have realized that there is competitive advantage in this information and that now is the time to put this data to work. An Architect’s Guide to Big Data An Oracle White Paper in Enterprise Architecture http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf 9 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 10. Emergence of Hadoop To address Big Data challenges – storage and processing  licensed under the Apache v2 license  created by Doug Cutting and Michael J. Cafarella  Based on papers by Google from 2004 (MapReduce and GFS)  Key advances around distributed processing and distributed storage  First Apache release: 2007  Yahoo! Contributed all its code in 2009  Current release (May. 2012): 1.0.3 10 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 11. Hadoop: commercial offering rapidly ramping up to respond to demand Market Growth “New research from International  Hortonworks  Datameer Data Corporation (IDC) shows that revenues for the worldwide  Cloudera  Platfora Hadoop-MapReduce ecosystem software market are considered to  Oracle  Etc. be $77 million in 2011 and are expected to grow to $812.8 million  IBM in 2016 for a compound annual growth rate (CAGR) of 60.2%.”  MapR IDC Releases First Worldwide Hadoop-MapReduce Ecosystem Software Forecast, Strong Growth Will Continue to Accelerate as Talent and Tools Develop 11 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13 07 May 2012, http://www.idc.com/getdoc.jsp?containerId=prUS23471212
  • 12. Kernel of Hadoop CLIENT Storage: HDFS NAME NODE  Hadoop Distributed File System  Runs on clusters of commodity hardware (cheap, readily available, direct attached DATANODE DATANODE DATANODE storage)  Fault tolerant, Easy to expand DATANODE DATANODE DATANODE  Designed for very large files DATANODE DATANODE DATANODE (default block size = 64MB)  Write-once/Read-many-times, simple semantics  Flat file model accommodate both structured and unstructed data RACK RACK RACK 12 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 13. Kernel of Hadoop MAP Analysis: MapReduce  Defined by Google in 2004 MAP REDUCE  Break problem up into smaller sub-problems MAP REDUCE  Able to distribute data workloads across thousands of nodes MAP REDUCE  Programmed via Java/scripting/C++ or higher-level OUTPUT languages such as Pig or Hive INPUT DATA SHUFFLE DATA MAP /SORT 13 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 14. Map/Reduce Example Compute re-tweet counts on Twitter data – a simple measure of social influence Input Data Map Shuffle/Sort Reduce Output Execute parallel copies of System groups all mapped Execute parallel copies of RT @oracle: #CIO's: user-provided “Map” key/value pairs with the user-provided “Reduce” How are you going to act function, transform same key together function to distill groups of on all that data you have? Turn it into insight w/our #BigData segments of input into data to output Guide key/value pairs @oracle, 1 RT @oracle_biee: Register @oracle, 3 @oracle, 1 to access the OBIEE Live @oracle, 1 Mobile Demo server @oracle, 1 RT @oracle - 10 Amazing @oracle_biee, 1 Scenes From Oracle's @oracle, 1 @AmericasCup World Series @oracle, 3 courtesy of Sarah Kimmel  RT @oracleretail: Oracle @oracleretail, 1 @oracleretail, 1 @oracleretail, 1 @oracle_biee, 2 Upgrades Analytics in Oracle Retail Data Model @oracleretail, 1 (News Release) @oracle_biee, 1 RT @oracle_biee: The Oracle @oracle_biee, 1 Exalytics v1 Patch Set 1 is now @oracle, 1 @oracle_biee, 2 generally available (GA) @oracle_biee, 1 RT @oracle: Transform your data, Transform your business! Live Q&A to learn Oracle GoldenGate 11g's new features! 14 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 15. Hadoop Ecosystem Rich and evolving PIG SQL-like (HiveQL) Scripting for query language exploring Bulk data transfers large datasets between Hadoop and ZOOKEEPER structured datastores Configuration Management & Coordination Data serialization HDFS / MapReduce Storage & Analysis Column-oriented Machine-learning, database data mining OOZIE CASSANDRA Collect, aggregate, stream log data into Workflow & text search engine HDFS coordination 15 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 16. What does SOA have to do with Big Data? 16 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 17. SOA Deployments Generate Big Data  Big Data is not just in Social Networks or Science Projects  SOA infrastructures are (quietly) handling increasingly massive amount of transactions  Transactions contain key business information: purchases, inventory levels, package tracking information, profile updates, etc.  Multi-tenancy, private and public clouds are accelerating data growth 17 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 18. SOA Big Data Example Logistics Company  Oracle SOA Suite customer Specific process data captured in star schema  Millions of BPEL processes/day for analytics  Transaction systems involved  analytics limited by a-priori decisions  duplication of data  5 terabytes of database  Purge job every 4 hours 18 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 19. Typical Usage of Datastores by SOA Platforms Today XML MTOM XA • headers • timestamps CSV JSON XML • Etc. BLOB Process state Metadata Full Payloads User Data structured unstructured Size - + Many read/write Write once, read-many 19 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 20. Typical Usage of Datastores by SOA Platforms Tomorrow XML MTOM XA • headers • timestamps CSV JSON XML • Etc. BLOB Process state Metadata Full Payloads User Data RDBMS Offload to Hadoop or NoSQL 20 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 21. “Finding answers where there are yet to be questions” * SOA infra runtime Analytics Analytics SOA infra runtime (Pre-determined) Universe is copy Intelligence the limit! constrained SOA audit SOA infra OLAP by available big data store database dataset 21 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 22. Impact of Big Data: New Integration Patterns 22 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 23. Pattern 1: Usage of MapReduce data Async BPEL process Data Query  synchronous interaction not an 2. Wait for option due to Hadoop typical 1. Start Job_done latencies (minutes to hours) MapReduce job notification  Getting data is not as simple as a sync “select” SQL statement  Split query: start job, wait for 3. Get Data notification, get data  Complex to implement for process developer 23 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 24. Pattern 2: Query data (noSQL or HBase) Data Query  Synchronous query against 1. Scheduled job initiates NoSQL or HBase  Getting data from batch- processed Hadoop output 3. Sync query of NoSQL  Not operating on absolute latest dataset NoSQL  Familiar pattern, easy to 2. Result set implement for process designer loaded into NoSQL 24 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13
  • 25. Pattern 3: Initiate Process on Data Availability
    [Diagram (initiate process): 1. Scheduled job initiates → 2. Result set appears as a file in a given location → 3. File adapter detects the result set and initiates a new process]
    - MapReduce job creates a dataset and drops it on the filesystem (e.g., in JSON format)
    - BPEL process + file adapter watches the directory for new data
    - BPEL process kicks in, parses the JSON and executes
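The file-adapter side of this pattern amounts to directory polling. A hedged Python sketch of the idea, not the Oracle file adapter itself: it assumes the batch job writes each result set under a temporary name and renames it to `*.json` only once complete, so the watcher never reads a half-written file. The in-memory `seen` set is a simplification; a production adapter would more likely move or delete processed files. `poll_once` and `watch` are hypothetical names.

```python
import json
import time
from pathlib import Path
from typing import Any, Callable, Optional, Set


def poll_once(inbox: Path, seen: Set[str],
              start_process: Callable[[Any], None]) -> int:
    """One polling pass: start one process per JSON result file not seen before."""
    started = 0
    for path in sorted(inbox.glob("*.json")):
        if path.name in seen:
            continue
        seen.add(path.name)
        with path.open(encoding="utf-8") as f:
            result_set = json.load(f)
        start_process(result_set)  # e.g. create a new BPEL process instance
        started += 1
    return started


def watch(inbox: Path, start_process: Callable[[Any], None],
          interval_s: float = 5.0, max_polls: Optional[int] = None) -> None:
    """Poll the directory forever (or max_polls times), like a file adapter."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        poll_once(inbox, seen, start_process)
        polls += 1
        time.sleep(interval_s)
```

Because the trigger is the file's arrival, the process runs as soon as the batch output exists, with no synchronous query at all.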
  • 26. Fast Data: Get Ahead of the Curve
  • 27. Working with Big Data: Some Challenges
    1. Big data ≠ infinite storage: yes, storage is cheap, but it helps to have clean data, with context and less redundancy.
    2. Hadoop is batch-oriented and there is inherent latency (minutes): "With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes […] it will never be true real-time." * Raymie Stata, Yahoo! CTO (June 2011)
    *: http://www.theregister.co.uk/2011/06/30/yahoo_hadoop_and_realtime/
  • 28. Get Ahead of the Curve: Use Event Processing Techniques
    [Diagram: filter out and correlate events, moving time-critical analysis to the front of the process]
    1. Filter out noise (e.g., data ticks with no change), add context (by correlating multiple sources), increase relevance
    2. Identify critical conditions as you insert data into the warehouse (not after)
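The two steps can be sketched as a small pipeline of Python generators. This is a hedged illustration of the idea, not how an event processing engine would express it: the `(source, value)` tick format, the static `context` lookup standing in for a second correlated source, and the threshold rule in the test are all assumptions.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Tick = Tuple[str, float]
Event = Dict[str, Any]


def drop_unchanged(ticks: Iterable[Tick]) -> Iterator[Tick]:
    """Filter out noise: forward a tick only when its source's value changed."""
    last: Dict[str, float] = {}
    for source, value in ticks:
        if source not in last or last[source] != value:
            last[source] = value
            yield source, value


def add_context(ticks: Iterable[Tick],
                context: Dict[str, Dict[str, Any]]) -> Iterator[Event]:
    """Correlate with a second source (here a static lookup) to add context."""
    for source, value in ticks:
        yield {"source": source, "value": value, **context.get(source, {})}


def ingest(ticks: Iterable[Tick], context: Dict[str, Dict[str, Any]],
           is_critical: Callable[[Event], bool], alert: Callable[[Event], None],
           warehouse: List[Event]) -> None:
    """Check for critical conditions on the way in, before the warehouse insert."""
    for event in add_context(drop_unchanged(ticks), context):
        if is_critical(event):
            alert(event)
        warehouse.append(event)
```

The alert fires while the event is being inserted, so nobody has to wait for a later batch query over the warehouse to notice it.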
  • 29. Fast Data: Get Ahead of the Curve
    - Fast Data: latency in milliseconds, shallow historical depth. Example: monitoring of traffic cameras to ensure a given license plate is not in use on multiple vehicles
    - Big Data: latency in minutes, deep historical depth. Example: analysis of traffic patterns and congestion times for urban planning
    Add "depth" to your fast data by merging the output of MapReduce into stream processing
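One way to combine the two examples, sketched under stated assumptions: the stream processor keeps only the last sighting per plate (shallow depth), and gets its "depth" from a table of minimum travel times between camera pairs, hypothetically precomputed by a MapReduce job over historical traffic. A plate seen at two cameras faster than that minimum is flagged. `Sighting` and `detect_cloned_plates` are hypothetical names, and sightings are assumed to arrive in time order.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Sighting:
    plate: str
    camera: str
    t: float  # seconds since some epoch


def detect_cloned_plates(
    sightings: Iterable[Sighting],
    min_travel_s: Dict[FrozenSet[str], float],
) -> Iterator[Tuple[Sighting, Sighting]]:
    """Flag a plate seen at two cameras faster than any real vehicle could travel.

    min_travel_s maps an unordered camera pair to the minimum plausible travel
    time; it is assumed to come from batch analysis of historical traffic.
    """
    last_seen: Dict[str, Sighting] = {}  # shallow state: one sighting per plate
    for s in sightings:
        prev = last_seen.get(s.plate)
        if prev is not None and prev.camera != s.camera:
            needed = min_travel_s.get(frozenset((prev.camera, s.camera)))
            if needed is not None and s.t - prev.t < needed:
                yield prev, s
        last_seen[s.plate] = s
```

Refreshing `min_travel_s` from each new batch run is what "merging the output of MapReduce into stream processing" could look like in practice, without the stream itself keeping deep history.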
  • 30. How Fast Is Fast?
    [Diagram: DPI equipment (usage <-> IP address) and IP allocation servers (IP address <-> user) are correlated into usage <-> user, which feeds billing]
    Fast enough to support the explosion of smartphones in the largest markets:
    - Mobile provider billing smartphone data based on usage
    - Using OEP (Oracle Event Processing) to correlate users to packets through dynamically allocated IP addresses
    - Coherence as a fast in-memory grid of user <-> IP addresses
    - Processes over 800,000 records/s
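The correlation on this slide can be sketched as a join between two time-ordered streams: IP allocation events (IP <-> user) and DPI usage records (usage <-> IP). Because addresses are dynamically allocated, each usage record must be matched against the mapping valid at that moment, not a final snapshot. A single-process Python sketch; the event tuple format is assumed, and a plain dict stands in for the Coherence grid (this is not the Coherence API).

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple, Union

Event = Tuple[str, str, Union[str, int, None]]


def bill_usage(events: Iterable[Event]) -> Tuple[Dict[str, int], int]:
    """Attribute DPI usage to users through dynamically allocated IP addresses.

    events are time-ordered tuples (assumed format):
      ("alloc", ip, user)    from IP allocation servers  (IP <-> user)
      ("release", ip, None)  when an address is returned
      ("usage", ip, nbytes)  from DPI equipment          (usage <-> IP)
    Returns (bytes per user, bytes that matched no current allocation).
    """
    ip_to_user: Dict[str, str] = {}  # stand-in for the in-memory grid
    per_user: Dict[str, int] = defaultdict(int)
    unmatched = 0
    for kind, ip, value in events:
        if kind == "alloc":
            ip_to_user[ip] = value
        elif kind == "release":
            ip_to_user.pop(ip, None)
        elif kind == "usage":
            user = ip_to_user.get(ip)
            if user is None:
                unmatched += value
            else:
                per_user[user] += value
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return dict(per_user), unmatched
```

In the test, the same address is billed to two different users at different times, which is why the lookup has to sit on the stream rather than run afterwards against a final table.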
  • 31. Putting It All Together: Big Data, Fast Data & SOA
  • 32. Oracle’s Solution: Big Data, Fast Data & SOA
    [Diagram: Acquire → Organize → Analyze → Decide → Act (orchestrate response), built from Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle Exadata and Oracle Exalytics (linked by InfiniBand), Oracle Event Processing, Endeca Information Discovery, Oracle Real-Time Decisions, and Oracle SOA Suite]
  • 33. Oracle’s Solution: Big Data, Fast Data & SOA
    [Diagram: the architecture of slide 32, annotated with traffic-management examples]
    - Endeca Information Discovery: search for the last sighting of specific vehicles
    - Oracle Event Processing: monitoring of traffic cameras to ensure a given license plate is not in use on multiple vehicles
    - Big Data (Big Data Appliance, Exadata, Exalytics): analysis of traffic patterns and congestion times for urban planning
    - Oracle Real-Time Decisions: traffic rerouting suggestions
    - Oracle SOA Suite: display the real-time situation using BAM; coordinate police and emergency response using BPEL & Human Workflow
  • 34. Conclusion  Big Data has reached the enterprise  SOA platforms are evolving to leverage Big Data technology  Service developers need to understand how to insert and access data in Hadoop  Time-critical conditions can be detected as data is inserted in Hadoop using event processing techniques – Fast Data  Expect Big Data, Fast Data to become ubiquitous in SOA environments – much like RDBMS are already 34 Copyright © 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 13

Speaker Notes

  1. All kinds of data: large volumes; valuable insight, but difficult to extract (structured and unstructured data); often extremely time-sensitive. Most of the data types portrayed here are consumer data. While businesses will want to leverage Oracle Event Processing for business and application data, they are also affected by this consumer data and by information from the vast array of sensors: stream events showing temperatures in a container in the mid-Pacific can indicate that high-cost food goods will be destroyed unless immediate action is taken. Or consider Starbucks immediately analyzing tweets after launching a new coffee, seeing spikes of negative comments, and very quickly figuring out that the negative reactions came from stores serving a particular warmed cheese sandwich whose aroma did not go with the new coffee smell. Huge ROI due to quick analysis and a specific, targeted response. And as the Spanish bank (La Caixa) solution shows, a customer's tweets are also analyzed by Oracle Event Processing and stored in Big Data to augment his or her preferences and influence real-time targeted campaigns.
  2. Scripting languages are supported via Hadoop Streaming, which works like Unix pipes (stdin/stdout)
  3. Facebook, Google, Netflix, etc.; Large Hadron Collider, NSF, etc.
  4. Being able to preserve information over the long term (without copying/filtering) could be very valuable for historical analysis and for shipping and process optimization
  5. SmartMeter example: we want all the data for in-depth energy usage analysis, but we also want real-time analysis for things like leak detection.
  6. Technologist & citizen
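Note 2 refers to Hadoop Streaming, which runs any executable as mapper or reducer. The two exchange tab-separated key/value lines over stdin/stdout, and the framework sorts mapper output by key before the reducer reads it. A minimal Python word-count sketch of that contract; the script name `wordcount.py` and its `map`/`reduce` arguments are our own conventions, not part of Hadoop.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one "word<TAB>1" line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per word; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Run as: wordcount.py map   or   wordcount.py reduce
    step = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for out in step(sys.stdin):
        print(out)
```

Locally, the same contract can be checked with a Unix pipeline: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. On a cluster, the two commands would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options (the jar's location varies by distribution).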