SlideShare une entreprise Scribd logo
1  sur  29
Raptor: Real-time Analytics on Hadoop


                     Soundar Velu
                     Product Architect, Advanced Technology, SunGard

                     Anil Kumar Batchu
                     Research Engineer, Applied Research, SunGard




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices
Who We Are, Who Am I, What We Do?


                       Who We Are & What We Do?
                                           Fortune 500 Company
                                           Financial Services Firm
                                           Provide software & consulting services across the industry
                                           Exploring impacts of Big Data approach for last 2+ years


                       Who Am I?
                                    Part of Applied Research & Consulting group based out of
                                     Bangalore, India
                                    We focus on latest technology trends




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   1
Agenda


                      Financial Services and Data Problems

                      Raptor Architecture Overview

                      Raptor Components

                      Benchmarks

                      Future Enhancements




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   2
Financial Services & Typical “data intensive” Problems




                                                                                                                                         Wholesale Markets
                   Portfolio Construction                                                                           Price Forecasting



                                                                                       Risk Calculation



              Compliance Surveillance                                                                               Batch Processing




                                                                                                                                        Retail Markets
                                                                                       Fraud Detection



                        Targeted Marketing                                                                       Demand Forecasting


Proprietary and Confidential. Not to be distributed or reproduced without permission          www.sungard.com/globalservices                                 3
Implementation Constraints in Financial Services



                                                                                                                        RDBMS
                               Legacy Burden
                                                                                                                        Centricity

                     Architectures, languages,                                                              Constrained semantics,
                       tools, systems, skills                                                                rigidity and specificity




                                        Data Silos                                                                Cost of Change


                 Governance “You don’t own                                                               Evolution is the only answer
                      that data, I do”


Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices                   4
What Did We Need?


                      A generic, reliable, and cost effective analytics solution
                       with a wide range of application areas
                      Query execution and analytics at soft real-time
                       windows (acceptable and consistent latencies)
                      Minimum customization, seamless integration and
                       ease of use
                      Policy around data storage and processing
                      Adaptive segmentation algorithms for optimized data
                       search (Indexing, partitioning and filtering)



Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   5
Some Limitations with Traditional Hadoop

                    Jobs are executed in a brute force fashion, causing
                     complete scans of files for every single query
                    Long warm up time for jobs, performs poorly for
                     relatively smaller data sets.
                    Scheduling imbalance in HDFS operations and job
                     execution
                    Limitations around the kind of jobs that can be executed
                     with MapReduce, (non equality joins not supported)
                    Open bugs and lacking features, memory management
                     bottlenecks
                    Only for batch mode based applications, does not fit
                     real-time analytics scenarios
Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   6
Raptor Application Stack




                Z                                       Flume                                   Scribe                       Sqoop
                o
                o                      Hue                                             Raptor                                   Oozie
                K
                e                                                                           Hive/Pig
                e
                p                                 HBase                                Map Reduce                       Quantum Processor
                e
                r                                                                             HDFS


Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices                       7
Raptor Architecture




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   8
Raptor Digester - Level 1 Segmentation

                       Database                                               Data Stream                    Logs/Files
                                                                                                                                       Data load adaptors


               Data Cleanser
                 Call back                                                        Digester                                    Raptor Client


                                                                                                                                       Table specific data
                                                                                                                                       transformer and
                                                                                                                                       partitioner, partition
                                                                                                                                       based on
          Data Transformer
              Call back
                                                                                                                                       demographics,
                                                                          Storage buckets                                              customers, IP address
                                                                                                                                       or composite columns.



                                                                                                                                       Store partitioned
                                                                                  HDFS                                                 data into
                                                                                                                                       respective file
                                                                                                                                       buckets in HDFS.
Proprietary and Confidential. Not to be distributed or reproduced without permission         www.sungard.com/globalservices                                     9
Raptor Block Analyzer – Level 2 Segmentation

                                                                                       Write Block

       Raptor client
                                                                      DataXceiver
                                                                                                                Offer BlockID for
                                                                                                                async-Indexing
     IndexFactory                                                                                                                     BlockReceiver
            Get Indexer for




                                                                     Asynchronously
                 Block




                                                                   analyze/Index Block
                                                                                                                                                      DataBlock
                                                                                                Index
                                                                                                Queue

                              Block Analyzer                                                                                                          DataBlock



        Indexer                                    BloomFilter

                                            Store column meta information,                                       Every block is asynchronously analyzed
                                            Block index map and column
                                            level bloom filters in ThorCache.
                                                                                                                 and the resulting column level blooms,
                                                                                                                 index tree and column level meta
                                                                                                                 information is stored in ThorCache. All this
                          Thor Block                                                                             happens within the DataNode Context.
                         Index Cache


Proprietary and Confidential. Not to be distributed or reproduced without permission                 www.sungard.com/globalservices                               10
Raptor Client Framework

                                                                                       Raptor Client is similar to dfsclient or jobclient, it stands as a
                                                                                       primary interface into Raptor.
                   Block Analyzer
                   • Get Indexer for block
                   • Mirror Indexer to replica nodes
                                                                                          Code Generator                             NameNode
                                                                                                         Deploy Generated        • Get block info for file
                 Raptor Executor                                                                         Code                    • Get file stats

                     • Execute internal task
                     • Clean up task
                     • Get result segments


                          Digester                                                        Raptor Client                               DataNode
                     • Get Digester for table
                                                                                                                                 • Invoke Quantum Processor Services
                     • Get Analyzer for table
                                                                                                                                 • Manage Indexes
                                                                                                                                 • Execute Jobs
                                                                                                                                 • Get Block analyzers
                Raptor Optimizer                                                                                                 • Deploy generated infrastructure patch

                                                                                                       Re-Index Block
                     • Get analyzer for table
                                                                                                       Re-Balance blocks
                     • Get Indexer for table
                     • Get adaptive metrics
                                                                                          Adaptive Engine                           Raptor-Hive
                                                                                                                                 • Get indexer for block/table
                  Policy Manager                                                                                                 • Get Hive Meta-data for table
                                                                                                                                 • Get adaptive metrics for table
                     • Auto shard buckets
                                                                                                                                 • Get Raptor Meta Data for table
                     • Drop indexes for buckets
                                                                                                                                 • Get digester for Table
                     • Archive Buckets
                                                                                                                                 • Deploy generated infrastructure patch

Proprietary and Confidential. Not to be distributed or reproduced without permission          www.sungard.com/globalservices                                               11
Getting Results Out – End to End Flow

          Client                              Execute HQL                                                                   Thrift
        Application                              Statement                                                                  Client
                                                                                       Hive JDBC
                                                                                       Connection                            Thrift
                                      ResultSet                                                                              Client




      Client applications create a hive JDBC
                                                                                                                                           Thrift Hive Server
      connection to Raptor-HiveServer. Client fires HQL
      Queries over the connection. The query is
      executed by Raptor and a Hive ResultSet is
      returned as Result. This is the interface used by                                                                                      Raptor-Hive
      Clients to access data in HDFS.


  e.g. “Select Name, CNO, TxID, floor(TxAmt)                                                                                         Fetch Task          Execution
  from Credit_Tx order by CNO”

                                                                                                                                Fetch Results            Write Results
                                                                           Execute Non-HQL queries
                                                                           directly with raptor Client,
                                                                                                                                                  HDFS
                                                                                       Raptor Client                                       Quantum Processor



Proprietary and Confidential. Not to be distributed or reproduced without permission                www.sungard.com/globalservices                                       12
Raptor-Hive Processing Engine

                                                                                                             HiveServer
                                                                                                                                             Get Table Block
    JDBC Client                                                                                                                                 Analyzers
                                                                                                                                                                Datanodes
                                                                                          Compile


                                                                                          Optimize                             Raptor Optimizer

          Hive
        MetaStore                                                                      Execution Plan                                      Metrics published
                                                                                                                                           to Adaptive engine
                                                                                                                                                                 Raptor
                                                                                                                                                                MetaStore


                                                                                            Raptor                                 Raptor Adaptive
                                                                                           Executor                                    Engine




                                                                                          Raptor Client


         MapReduce                                                                          Raptor                                           HBase
                                                                                       Quantum Processor

Proprietary and Confidential. Not to be distributed or reproduced without permission              www.sungard.com/globalservices                                            13
Quantum Processor Framework
                        Hive Executor
                                                                                                                            QuantumNode is a service similar to
                                                                                                                            DataXceiver on DataNode.        All
                       RaptorClient                                                    RaptorProtocolHelper                 Raptor operations are executed by
                                                                                                                            the Quantum Processor framework.

                 Quantum Node
                                                                                        QuantumProcessor



                                                      GetNodeStats                                                            SearchBlock



                                              ReadResultSegment                                                              Manage Indexes



                                             ReadBoundaryRecord                                                            ExecuteInternalTask



                                                 Get File analyzers                                                        ExecuteExternalTask



Proprietary and Confidential. Not to be distributed or reproduced without permission      www.sungard.com/globalservices                                      14
Raptor Block Searcher/Processor

                                                                                                                                        Block Searcher analyzes the
                                                                                                                                        block indexes and bloom filters
Hive Executor                                                                     Process/Search
                                        RaptorClient                                                  QuantumProcessor                  and extracts the resultant
                                                                                       Block                                            indexed records via index
                                                                                                                                        based offset reader or if no
                                                                                                                                        indexes are available then it
                                                                                                                                        does a complete block scan.
                                             Thor Block Index                                             BlockSearcher
                                                  Cache




             Block Index                                                                                       Block Index         No
                                                                                                                 /Bloom                       Seq Block Reader
                                                                                                                Available
                                                                                                                                                         Process All
           Column Level                                                                                                                                   Records
              Bloom
                                                                                                                        Yes                   Quantum Operator
           Column Meta                                                                                 Search Index Tree
               Data
                                                                                                                                                 Result Writer
                                                                                                      Index Block Reader

                                                                                                                                                    HDFS
                                                                                                                       Process Result
                                                                                                                          Records
Proprietary and Confidential. Not to be distributed or reproduced without permission           www.sungard.com/globalservices                                          15
Quantum Operator

    Quantum Operator takes block records as input and performs the required field inspections, and applies
    UDFs and aggregation. The resultant records are either written to local segment files or to HDFS directly
    based on the operator type.


                                                                                          Evaluate rows

                    Block Searcher                                                                                 Quantum Operator                        Read Remote Seg

                                                                                         Offer result rows                                                Read k-remote segment
                                                                                                                                                          files, if reduce operator.

  Write Result


                                                                                          Inspect Rows                      Apply Join/Merge                    Apply UDF

                         Has                                  Yes
                       reduce
                        child?

                                                                                               Write results to k-segmented
                                                                                               local files, if child is a reduce
                                  No                                                           operator.


                                                                                          Local                               Quantum Operators: SelectOperator, JoinOperator,
                       HDFS                                                            Segment File                           SortByOperator, GropuByOperator, GPUOperator, etc…




Proprietary and Confidential. Not to be distributed or reproduced without permission                 www.sungard.com/globalservices                                                    16
Adaptive Indexing

                                            Execute Hive Queries                                                     Collect Execution Metrics
                                                                                       Raptor Adaptive
                                                                                           Engine

                                                                                                      Asynchronous
                                                                                                      Adaptive                   Raptor Metadata
  Raptor adaptive engine collects metrics                                                             Indexer
  from the executed queries. Metrics
  include the columns and tables involved                                                  Adaptive
  in the query, the types of aggregation                                                     Index
  executed on columns etc… these metrics                                                   Scheduler

  are used by the adaptive index scheduler
  to schedule re-indexing of the blocks
  based on these usage metrics.                                                                   Re Analyze/Index Blocks
                                                                                                  based on usage/query Metrics


                                                                                           HDFS



                                                                DataNode                   DataNode                           DataNode             DataNode



                                                                                                                                                   ThorIndex
                                                                                                                                                     Cache

Proprietary and Confidential. Not to be distributed or reproduced without permission    www.sungard.com/globalservices                                         17
Code Generation and Data Definition Framework

                                                                                                                 User supplies table schema in an XML format.
                                                                                                                 Based on the Xml table schema and hints,
                                                                                                                 Raptor code generator generates metadata in
                                                                             Raptor Code Generator               hive metadata store, HBase and raptor metadata
                                                                                                                 store. It also generates table specific indexer
                                                                                                                 column analyzers and bloom filter classes.

                                                                                                                                Create Hive Table Definitions
                                                                                                                HiveServer


                                                                                                                                              Hive Metadata
                                                                                                                   HBase


                                                                                                                                 Create Raptor Table Metadata
                                                                                                             Raptor Adaptive
                                                                                                                 Engine

                                                                                                                                             Raptor Metadata

                                                                                                              Raptor Client



                                                              Generate Table Specific
                                                              Indexers, Bloom Filter , Column                                    Deploy Generated code to
                                                              Analyzers and digester                                             Raptor- HiveServer and
                                                                                                                                 DataNodes

Proprietary and Confidential. Not to be distributed or reproduced without permission         www.sungard.com/globalservices                                     18
Raptor Policy Manager – Time Based Shard

Time based auto sharding, as per the                                                       Raptor Policy
                                                                                            Manager
policy defined by the user. Table buckets
are moved from highly segmented time
shards to sparsely segmented archive                                                                                      Raptor Metadata
store over time.

                                                                                                           HDFS


                Completely                                              Bucket-1               Bucket-2                  Bucket-3             Primary
                Segmented                                                                                                                   Time-Shard



                 Partially
                                                                        Bucket-1               Bucket-2                  Bucket-3        Secondary
                Segmented
                                                                                                                                        Time-Shard-1

                                                                                                               …

     All Segments Merged
     and Indexes dropped                                                Bucket-1               Bucket-2                  Bucket-3       Archive Store


 Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices                                   19
Infrastructure Enhancements


             Adaptive compression based on network congestion statistics

             Scheduling of jobs via raptor client (via Hive) in conjunction
                       with enhanced NameNode block policy

             Computation intensive jobs scheduled on GPU enabled nodes

             Object Pools across the Hadoop-Raptor ecosystem

             Hand-shake mechanism between clients and DataNodes to
                       avoid imminent operation failures

             Interactive user console for managing the cluster, tasks, data
                       policy, filters etc…


Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   20
Benchmark Raptor

  Benchmark carried out on a 5 node cluster with commodity hardware { Core2-Duo, 4GB
  RAM, 1TB storage, 100Mbps NIC, 64MB block size, Replication factor 3, Network
  Compression enabled}

  Table with 13 columns and mixed types. Sample data generated with node.js with moderate
  entropy. (table with 1 group index{3 columns} and 0 column bloom filters, 1 bucket}


     5 Node Cluster                                                                                 Raptor                              Map/Reduce
   OperationTable Size                                                                256MB            1GB           10GB      256MB      1GB       10GB
   Load Time                                                                            4.8s            22s            4.2m     54.6s      3.5m    36.0m
   Simple select wihout predicate                                                       6.9s             22s           3.2m     21.6s      50.5s     9.5m
   Select with complex Predicate                                                        0.8s            1.7s           0.9m      9.2s      32.2s     4.1m

   Select with Order By                                                                1.6s            6.5s            1.1m     41.0s      3.2m      5.6m
   Select with Group By                                                                 0.6s            1.2s          0.8m      61.3s      1.5m      4.0m
   Simple Join                                                                          2.9s           5.7s           1.7m      1.2m       3.4m      9.2m
   User Defined row level Function                                                      0.6s            1.1s           0.9m     13.2s       49s      7.3m
Proprietary and Confidential. Not to be distributed or reproduced without permission           www.sungard.com/globalservices                               21
Case Study: Credit Card Fraud Detection

                    Scribe                 Scribe                   Scribe
                    Clients                Clients                  Clients




                                  Scribe Server                                          Raptor Digester

      Cache                                                                                            Segment Data into
                                                                                                       Subsets/Buckets/
                                                                                                       Partitions



                                                                                       Raptor-Hadoop                                                               Raptor Policy Manager
                                                                                                                                          Pruning - Removes
                                                                                                                                          redundant Classifiers from
                                                                Mapreduce                        Quantum                                  ensemble
                                                                                                 Processor



                                  Apply Data Mining
                                  Techniques to generate                                                                                Periodic Fraud Report
                                  Classifiers in parallel                                                                               Generation
                                                                               Raptor Classifier                                                                     Raptor-Hive
                                                                                                          Combine resultant base
                                                                                                          models to generate a
                                                         uses Meta-Classifiers                            Meta-Classifier
                                                         to Flag Transactions
                                                                                                                                                                Analytics Results & Reports
                                                                                  Flagging Cluster


                                                                                                            Sends out Notifications
                                                                               Notifications Cluster                                                                    PG SQL

                   Credit Card Fraud Application
Proprietary and Confidential. Not to be distributed or reproduced without permission                   www.sungard.com/globalservices                                                         22
Other Raptor Use Cases


               Smart customer care solution

               Financial fraud analytics

               Media usage log analytics

               Computation intensive jobs using GPUs

               Predictive trading




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   23
Roadmap


           Merge Raptor into Next Generation MapReduce

           Dimensions and Cubes (MRCubes)

           Cloud ready Raptor

           Roles and security

           Job failover management

           Rules and triggers

           Planning to open source

           Starter Kits/Examples of various Use Cases

Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   24
Benefits


             Query responses at soft real-time windows.

             Code generation framework, for table specific raptor
                       infrastructure code.

             Zero down time, with hot deployment of generated
                       infrastructure code.

             Seamless integration into existing infrastructure with
                       multiple ingress options.

             Automatic time based sharding and data archival.

             Distribute & execute non-MR jobs on the Hadoop Cluster
Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices   25
Contacts

                 Soundar Velu                                                                                  Anil Batchu
                 Product Architect                                                                             Research Engineer
                 Soundararajan.velu@sungard.com                                                                AnilKumar.B@sungard.com
                 Twitter: @greyquest




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices
References


             Optimizing Distributed Joins with Bloom Filters
                 www.l3s.de/web/upload/documents/1/analysis.pdf
             Apache Hadoop Goes Realtime at Facebook
                 borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf
             Data Mining with MAPREDUCE: Graph and Tensor Algorithms with MR
                 www.ml.cmu.edu/research/dap-papers/tsourakakisdap.pdf
             Distributed Cube Materialization on Holistic Measures∗
                 www.eecs.umich.edu/~congy/work/icde11a.pdf
             HadoopDB in Action: Building Real World Applications
                     www.cs.yale.edu/homes/dna/papers/hadoopdb-demo.pdf
             TeraByte Sort on Apache Hadoop
                     sortbenchmark.org/YahooHadoop.pdf
             Mahouth in Action
                     www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684
             Web-Scale K-Means Clustering
                     www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf
             Hive – A Petabyte Scale Data Warehouse Using Hadoop
                     infolab.stanford.edu/~ragho/hive-icde2010.pdf                                                      27
Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices
CONFIDENTIALITY STATEMENT

                     Copyright © 2011 by SunGard Data Systems (or its subsidiaries, “SunGard”). All rights reserved. No parts of this
                     document may be reproduced, transmitted or stored electronically without SunGard’s prior written permission.

                     This document contains SunGard's confidential or proprietary information. By accepting this document, you
                     agree that: (A)(1) if a pre-existing contract containing disclosure and use restrictions exists between your company
                     and SunGard, you and your company will use this information subject to the terms of the pre-existing contract; or
                     (2) if no such pre-existing contract exists, you and your Company agree to protect this information and not
                     reproduce or disclose the information in any way; and (B) SunGard makes no warranties, express or implied, in this
                     document, and SunGard shall not be liable for damages of any kind arising out of use of this document.




Proprietary and Confidential. Not to be distributed or reproduced without permission   www.sungard.com/globalservices                       28

Contenu connexe

Similaire à Hadoop World 2011: Raptor: Real-time Analytics on Hadoop - Soundararajan Velu - Sungard

SunGard Enterprise Cloud Services @ Cloud Connect 2011
SunGard Enterprise Cloud Services @ Cloud Connect 2011SunGard Enterprise Cloud Services @ Cloud Connect 2011
SunGard Enterprise Cloud Services @ Cloud Connect 2011Satish Hemachandran
 
Making the Move to SaaS: 10 Key Technical Considerations
Making the Move to SaaS: 10 Key Technical Considerations Making the Move to SaaS: 10 Key Technical Considerations
Making the Move to SaaS: 10 Key Technical Considerations OpSource
 
Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureDataWorks Summit
 
Securing and Governing Cloud APIs
Securing and Governing Cloud APIsSecuring and Governing Cloud APIs
Securing and Governing Cloud APIsCA API Management
 
SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned
SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned
SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned FIT Solutions
 
Secure Enterprise Cloud
Secure Enterprise CloudSecure Enterprise Cloud
Secure Enterprise CloudIndu Kodukula
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Sybase Türkiye
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBigDataCloud
 
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightHortonworks
 
OSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOpenStorageSummit
 
Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesDataWorks Summit
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London SeminarHortonworks
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisOW2
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks
 
CommVault Intro SureSkills Simpana 10 Event 2013
CommVault Intro SureSkills Simpana 10 Event 2013CommVault Intro SureSkills Simpana 10 Event 2013
CommVault Intro SureSkills Simpana 10 Event 2013Google
 
Cs itools06 csia_reporting_features_v10
Cs itools06 csia_reporting_features_v10Cs itools06 csia_reporting_features_v10
Cs itools06 csia_reporting_features_v10HermansJohan
 
Overview of XRF2 - A Unilog Product
Overview of XRF2 - A Unilog ProductOverview of XRF2 - A Unilog Product
Overview of XRF2 - A Unilog Productbasuchit
 

Similaire à Hadoop World 2011: Raptor: Real-time Analytics on Hadoop - Soundararajan Velu - Sungard (20)

London hug
London hugLondon hug
London hug
 
SunGard Enterprise Cloud Services @ Cloud Connect 2011
SunGard Enterprise Cloud Services @ Cloud Connect 2011SunGard Enterprise Cloud Services @ Cloud Connect 2011
SunGard Enterprise Cloud Services @ Cloud Connect 2011
 
Making the Move to SaaS: 10 Key Technical Considerations
Making the Move to SaaS: 10 Key Technical Considerations Making the Move to SaaS: 10 Key Technical Considerations
Making the Move to SaaS: 10 Key Technical Considerations
 
Hadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the FutureHadoop Turns a Corner and Sees the Future
Hadoop Turns a Corner and Sees the Future
 
Securing and Governing Cloud APIs
Securing and Governing Cloud APIsSecuring and Governing Cloud APIs
Securing and Governing Cloud APIs
 
Soa Primer
Soa PrimerSoa Primer
Soa Primer
 
SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned
SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned
SAP Inside Track Istanbul 2011 - Sybase Mobile Sales Project - Lessons Learned
 
Secure Enterprise Cloud
Secure Enterprise CloudSecure Enterprise Cloud
Secure Enterprise Cloud
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsightBig Data, Hadoop, Hortonworks and Microsoft HDInsight
Big Data, Hadoop, Hortonworks and Microsoft HDInsight
 
OSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal SternOSS Presentation Keynote by Hal Stern
OSS Presentation Keynote by Hal Stern
 
Hadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation ArchitecturesHadoop's Opportunity to Power Next-Generation Architectures
Hadoop's Opportunity to Power Next-Generation Architectures
 
Teradata Big Data London Seminar
Teradata Big Data London SeminarTeradata Big Data London Seminar
Teradata Big Data London Seminar
 
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, ParisHadoop's Role in the Big Data Architecture, OW2con'12, Paris
Hadoop's Role in the Big Data Architecture, OW2con'12, Paris
 
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptxHortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx
 
CommVault Intro SureSkills Simpana 10 Event 2013
CommVault Intro SureSkills Simpana 10 Event 2013CommVault Intro SureSkills Simpana 10 Event 2013
CommVault Intro SureSkills Simpana 10 Event 2013
 
Forecasteronline.Com
Forecasteronline.ComForecasteronline.Com
Forecasteronline.Com
 
Cs itools06 csia_reporting_features_v10
Cs itools06 csia_reporting_features_v10Cs itools06 csia_reporting_features_v10
Cs itools06 csia_reporting_features_v10
 
Overview of XRF2 - A Unilog Product
Overview of XRF2 - A Unilog ProductOverview of XRF2 - A Unilog Product
Overview of XRF2 - A Unilog Product
 

Plus de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Plus de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Dernier

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Hadoop World 2011: Raptor: Real-time Analytics on Hadoop - Soundararajan Velu - Sungard

  • 1. Raptor: Real-time Analytics on Hadoop Soundar Velu Product Architect, Advanced Technology, SunGard Anil Kumar Batchu Research Engineer, Applied Research, SunGard Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices
  • 2. Who We Are, Who Am I, What We Do?  Who We Are & What We Do?  Fortune 500 Company  Financial Services Firm  Provide software & consulting services across the industry  Exploring impacts of Big Data approach for last 2+ years  Who Am I?  Part of Applied Research & Consulting group based out of Bangalore, India  We focus on latest technology trends Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 1
  • 3. Agenda  Financial Services and Data Problems  Raptor Architecture Overview  Raptor Components  Benchmarks  Future Enhancements Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 2
  • 4. Financial Services & Typical “data intensive” Problems Wholesale Markets Portfolio Construction Price Forecasting Risk Calculation Compliance Surveillance Batch Processing Retail Markets Fraud Detection Targeted Marketing Demand Forecasting Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 3
  • 5. Implementation Constraints in Financial Services RDBMS Legacy Burden Centricity Architectures, languages, Constrained semantics, tools, systems, skills rigidity and specificity Data Silos Cost of Change Governance “You don’t own Evolution is the only answer that data, I do” Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 4
  • 6. What Did We Need?  A generic, reliable, and cost effective analytics solution with a wide range of application areas  Query execution and analytics at soft real-time windows (acceptable and consistent latencies)  Minimum customization, seamless integration and ease of use  Policy around data storage and processing  Adaptive segmentation algorithms for optimized data search (Indexing, partitioning and filtering) Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 5
  • 7. Some Limitations with Traditional Hadoop  Jobs are executed in a brute force fashion, causing complete scans of files for every single query  Long warm up time for jobs, performs poorly for relatively smaller data sets.  Scheduling imbalance in HDFS operations and job execution  Limitations around the kind of jobs that can be executed with MapReduce, (non equality joins not supported)  Open bugs and lacking features, memory management bottlenecks  Only for batch mode based applications, does not fit real-time analytics scenarios Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 6
  • 8. Raptor Application Stack Z Flume Scribe Sqoop o o Hue Raptor Oozie K e Hive/Pig e p HBase Map Reduce Quantum Processor e r HDFS Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 7
  • 9. Raptor Architecture Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 8
  • 10. Raptor Digester - Level 1 Segmentation Database Data Stream Logs/Files Data load adaptors Data Cleanser Call back Digester Raptor Client Table specific data transformer and partitioner, partition based on Data Transformer Call back demographics, Storage buckets customers, IP address or composite columns. Store partitioned HDFS data into respective file buckets in HDFS. Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 9
  • 11. Raptor Block Analyzer – Level 2 Segmentation Write Block Raptor client DataXceiver Offer BlockID for async-Indexing IndexFactory BlockReceiver Get Indexer for Asynchronously Block analyze/Index Block DataBlock Index Queue Block Analyzer DataBlock Indexer BloomFilter Store column meta information, Every block is asynchronously analyzed Block index map and column level bloom filters in ThorCache. and the resulting column level blooms, index tree and column level meta information is stored in ThorCache. All this Thor Block happens within the DataNode Context. Index Cache Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 10
  • 12. Raptor Client Framework Raptor Client is similar to dfsclient or jobclient, it stands as a primary interface into Raptor. Block Analyzer • Get Indexer for block • Mirror Indexer to replica nodes Code Generator NameNode Deploy Generated • Get block info for file Raptor Executor Code • Get file stats • Execute internal task • Clean up task • Get result segments Digester Raptor Client DataNode • Get Digester for table • Invoke Quantum Processor Services • Get Analyzer for table • Manage Indexes • Execute Jobs • Get Block analyzers Raptor Optimizer • Deploy generated infrastructure patch Re-Index Block • Get analyzer for table Re-Balance blocks • Get Indexer for table • Get adaptive metrics Adaptive Engine Raptor-Hive • Get indexer for block/table Policy Manager • Get Hive Meta-data for table • Get adaptive metrics for table • Auto shard buckets • Get Raptor Meta Data for table • Drop indexes for buckets • Get digester for Table • Archive Buckets • Deploy generated infrastructure patch Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 11
  • 13. Getting Results Out – End to End Flow Client Execute HQL Thrift Application Statement Client Hive JDBC Connection Thrift ResultSet Client Client applications create a hive JDBC Thrift Hive Server connection to Raptor-HiveServer. Client fires HQL Queries over the connection. The query is executed by Raptor and a Hive ResultSet is returned as Result. This is the interface used by Raptor-Hive Clients to access data in HDFS. e.g. “Select Name, CNO, TxID, floor(TxAmt) Fetch Task Execution from Credit_Tx order by CNO” Fetch Results Write Results Execute Non-HQL queries directly with raptor Client, HDFS Raptor Client Quantum Processor Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 12
  • 14. Raptor-Hive Processing Engine HiveServer Get Table Block JDBC Client Analyzers Datanodes Compile Optimize Raptor Optimizer Hive MetaStore Execution Plan Metrics published to Adaptive engine Raptor MetaStore Raptor Raptor Adaptive Executor Engine Raptor Client MapReduce Raptor HBase Quantum Processor Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 13
  • 15. Quantum Processor Framework Hive Executor QuantumNode is a service similar to DataXceiver on DataNode. All RaptorClient RaptorProtocolHelper Raptor operations are executed by the Quantum Processor framework. Quantum Node QuantumProcessor GetNodeStats SearchBlock ReadResultSegment Manage Indexes ReadBoundaryRecord ExecuteInternalTask Get File analyzers ExecuteExternalTask Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 14
  • 16. Raptor Block Searcher/Processor Block Searcher analyzes the block indexes and bloom filters Hive Executor Process/Search RaptorClient QuantumProcessor and extracts the resultant Block indexed records via index based offset reader or if no indexes are available then it does a complete block scan. Thor Block Index BlockSearcher Cache Block Index Block Index No /Bloom Seq Block Reader Available Process All Column Level Records Bloom Yes Quantum Operator Column Meta Search Index Tree Data Result Writer Index Block Reader HDFS Process Result Records Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 15
  • 17. Quantum Operator Quantum Operator takes block records as input and performs the required field inspections, and applies UDFs and aggregation. The resultant records are either written to local segment files or to HDFS directly based on the operator type. Evaluate rows Block Searcher Quantum Operator Read Remote Seg Offer result rows Read k-remote segment files, if reduce operator. Write Result Inspect Rows Apply Join/Merge Apply UDF Has Yes reduce child? Write results to k-segmented local files, if child is a reduce No operator. Local Quantum Operators: SelectOperator, JoinOperator, HDFS Segment File SortByOperator, GropuByOperator, GPUOperator, etc… Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 16
  • 18. Adaptive Indexing Execute Hive Queries Collect Execution Metrics Raptor Adaptive Engine Asynchronous Adaptive Raptor Metadata Raptor adaptive engine collects metrics Indexer from the executed queries. Metrics include the columns and tables involved Adaptive in the query, the types of aggregation Index executed on columns etc… these metrics Scheduler are used by the adaptive index scheduler to schedule re-indexing of the blocks based on these usage metrics. Re Analyze/Index Blocks based on usage/query Metrics HDFS DataNode DataNode DataNode DataNode ThorIndex Cache Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 17
  • 19. Code Generation and Data Definition Framework User supplies table schema in an XML format. Based on the Xml table schema and hints, Raptor code generator generates metadata in Raptor Code Generator hive metadata store, HBase and raptor metadata store. It also generates table specific indexer column analyzers and bloom filter classes. Create Hive Table Definitions HiveServer Hive Metadata HBase Create Raptor Table Metadata Raptor Adaptive Engine Raptor Metadata Raptor Client Generate Table Specific Indexers, Bloom Filter , Column Deploy Generated code to Analyzers and digester Raptor- HiveServer and DataNodes Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 18
  • 20. Raptor Policy Manager – Time Based Shard Time based auto sharding, as per the Raptor Policy Manager policy defined by the user. Table buckets are moved from highly segmented time shards to sparsely segmented archive Raptor Metadata store over time. HDFS Completely Bucket-1 Bucket-2 Bucket-3 Primary Segmented Time-Shard Partially Bucket-1 Bucket-2 Bucket-3 Secondary Segmented Time-Shard-1 … All Segments Merged and Indexes dropped Bucket-1 Bucket-2 Bucket-3 Archive Store Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 19
  • 21. Infrastructure Enhancements  Adaptive compression based on network congestion statistics  Scheduling of jobs via raptor client (via Hive) in conjunction with enhanced NameNode block policy  Computation intensive jobs scheduled on GPU enabled nodes  Object Pools across the Hadoop-Raptor ecosystem  Hand-shake mechanism between clients and DataNodes to avoid imminent operation failures  Interactive user console for managing the cluster, tasks, data policy, filters etc… Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 20
  • 22. Benchmark Raptor Benchmark carried out on a 5 node cluster with commodity hardware { Core2-Duo, 4GB RAM, 1TB storage, 100Mbps NIC, 64MB block size, Replication factor 3, Network Compression enabled} Table with 13 columns and mixed types. Sample data generated with node.js with moderate entropy. (table with 1 group index{3 columns} and 0 column bloom filters, 1 bucket} 5 Node Cluster Raptor Map/Reduce OperationTable Size 256MB 1GB 10GB 256MB 1GB 10GB Load Time 4.8s 22s 4.2m 54.6s 3.5m 36.0m Simple select wihout predicate 6.9s 22s 3.2m 21.6s 50.5s 9.5m Select with complex Predicate 0.8s 1.7s 0.9m 9.2s 32.2s 4.1m Select with Order By 1.6s 6.5s 1.1m 41.0s 3.2m 5.6m Select with Group By 0.6s 1.2s 0.8m 61.3s 1.5m 4.0m Simple Join 2.9s 5.7s 1.7m 1.2m 3.4m 9.2m User Defined row level Function 0.6s 1.1s 0.9m 13.2s 49s 7.3m Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 21
  • 23. Case Study: Credit Card Fraud Detection Scribe Scribe Scribe Clients Clients Clients Scribe Server Raptor Digester Cache Segment Data into Subsets/Buckets/ Partitions Raptor-Hadoop Raptor Policy Manager Pruning - Removes redundant Classifiers from Mapreduce Quantum ensemble Processor Apply Data Mining Techniques to generate Periodic Fraud Report Classifiers in parallel Generation Raptor Classifier Raptor-Hive Combine resultant base models to generate a uses Meta-Classifiers Meta-Classifier to Flag Transactions Analytics Results & Reports Flagging Cluster Sends out Notifications Notifications Cluster PG SQL Credit Card Fraud Application Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 22
  • 24. Other Raptor Use Cases  Smart customer care solution  Financial fraud analytics  Media usage log analytics  Computation intensive jobs using GPUs  Predictive trading Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 23
  • 25. Roadmap  Merge Raptor into Next Generation MapReduce  Dimensions and Cubes (MRCubes)  Cloud ready Raptor  Roles and security  Job failover management  Rules and triggers  Planning to open source  Starter Kits/Examples of various Use Cases Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 24
  • 26. Benefits  Query responses at soft real-time windows.  Code generation framework, for table specific raptor infrastructure code.  Zero down time, with hot deployment of generated infrastructure code.  Seamless integration into existing infrastructure with multiple ingress options.  Automatic time based sharding and data archival.  Distribute & execute non-MR jobs on the Hadoop Cluster Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 25
  • 27. Contacts Soundar Velu Anil Batchu Product Architect Research Engineer Soundararajan.velu@sungard.com AnilKumar.B@sungard.com Twitter: @greyquest Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices
  • 28. References  Optimizing Distributed Joins with Bloom Filters www.l3s.de/web/upload/documents/1/analysis.pdf  Apache Hadoop Goes Realtime at Facebook borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf  Data Mining with MAPREDUCE: Graph and Tensor Algorithms with MR www.ml.cmu.edu/research/dap-papers/tsourakakisdap.pdf  Distributed Cube Materialization on Holistic Measures∗ www.eecs.umich.edu/~congy/work/icde11a.pdf  HadoopDB in Action: Building Real World Applications www.cs.yale.edu/homes/dna/papers/hadoopdb-demo.pdf  TeraByte Sort on Apache Hadoop sortbenchmark.org/YahooHadoop.pdf  Mahouth in Action www.amazon.com/Mahout-Action-Sean-Owen/dp/1935182684  Web-Scale K-Means Clustering www.eecs.tufts.edu/~dsculley/papers/fastkmeans.pdf  Hive – A Petabyte Scale Data Warehouse Using Hadoop infolab.stanford.edu/~ragho/hive-icde2010.pdf 27 Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices
  • 29. CONFIDENTIALITY STATEMENT Copyright © 2011 by SunGard Data Systems (or its subsidiaries, “SunGard”). All rights reserved. No parts of this document may be reproduced, transmitted or stored electronically without SunGard’s prior written permission. This document contains SunGard's confidential or proprietary information. By accepting this document, you agree that: (A)(1) if a pre-existing contract containing disclosure and use restrictions exists between your company and SunGard, you and your company will use this information subject to the terms of the pre-existing contract; or (2) if no such pre-existing contract exists, you and your Company agree to protect this information and not reproduce or disclose the information in any way; and (B) SunGard makes no warranties, express or implied, in this document, and SunGard shall not be liable for damages of any kind arising out of use of this document. Proprietary and Confidential. Not to be distributed or reproduced without permission www.sungard.com/globalservices 28

Notes de l'éditeur

  1. Cloud Computing, Distributed Computing, Big Data, Low Latency, Mobile Computing, GPU/FPGA, NG UI Frameworks, Functional Languages, DSLs, etc…
  2. Financial Services firms are investigating next generation technologies for data analytics. Data growth – particularly of unstructured data – poses a special challenge as the volume and diversity of data types outstrip the capabilities of older technologies such as relational databases.Predictive analysis of both internal and external data result in better, proactive management of a wide range of issues from: credit and operational risk e.g. fraud and reputational risk; customer loyalty and profitability; etcBanks, mutual funds, and insurance companies need to transform their data into actionable information to comply with government and industry regulations, manage risk, increase revenue in a competitive economy, identify fraud and improve efficiency.
  3. Financial services firms worldwide are struggling with the challenges of Big Data analytics management - analyzing massive volumes of complex data from many sources in order to better serve and retain customers, find and effectively recruit new prospects, detect and prevent fraud, manage assets and streamline operations.Financial services companies are taking advantage of Big Data analytics to increase profit margins, reduce risk and solidify relationships with profitable customers. Now more than ever, a powerful, scalable, affordable and simple Business Intelligence (BI) solution for business users is key to helping make the right decisions that drive overall success. Data in Financial Services - Use It or Lose It! Financial Services is a domain where what you do with data and what you don't do with data can have a dramatic impact on your success. The BigData movement is attracting both investment and press - horizontally as an analytics infrastructure and vertically with domain specific tools. In financial services, distributed computing and parallel processing has been employed for many years, so what is the utility of BigData for these well-known problems? Is the size of data being analyzed the real challenge or is it the ability to evolve insight and analytic capabilities on what, in relative terms, may really be SmallData? This presentation will discuss how time, cost and information asymmetry are critical competitive advantages in financial services and how the current state of the art is in need of some help from the BigData community. We will also discuss the hardware and software used for typical computation and data processing workloads in financial services and why distributed computing is still reserved for only relatively few niche applications. We will conclude with a discussion on the barriers to adoption of new technology in financial services, why they exist, and how they might be overcome through greater academic and industry collaboration.BigDataWhy the allure?Why the marketing?How is innovation marketed?What is reality, what is hypeThesis – can BigData solve our challenges in FS?Financial Services, as a sectorWholesaleProblemsRiskForecastingProcessingcomplianceRetailProblemsForecastingProcessingcomplianceFS Data intensive computing for….ProcessingAlphaFS Data intensive computing in two flavorsTimeSizeFS Data IntensiveInter-machineGrids, BigData?Intra-machineGPUs, FPGAsFS Data Intensive AlgosGraphsLinear Time Series & eventsCalculusA 2nd look at data intensive computingCloud economicsConsumer successesSocial mediaDigital exhaustcompetitionBarriers to be overcome, barriers created by the burden of data constrained computingLegacy burden (see till slide)Data Silos – governanceJailed semantics with relational DBs
  4. One of the most promising technologies is the Apache Hadoop and MapReduce framework for dealing with this “big data” problem. However current implementations of Hadoop lacks the necessary enterprise robustness demanded by Financial Services firms for wide-scale deployment of MapReduce applications.
  5. Hadoop is a perfect fit for processing petabytes of data distributed over a large cluster of nodes. Developed with a fault tolerant design ground up, which makes it very resilient to hardware and software failures on nodes. The allocation of work to TaskTrackers is very simple. Every TaskTracker has a number of available slots (such as "4 slots"). Every active map or reduce task takes up one slot. The Job Tracker allocates work to the tracker nearest to the data with an available slot. There is no consideration of the current system load of the allocated machine, and hence its actual availability.If one TaskTracker is very slow, it can delay the entire MapReduce job - especially towards the end of a job, where everything can end up waiting for the slowest task. With speculative-execution enabled, however, a single task can be executed on multiple slave nodes.
  6. Raptor within the Hadoop ecosystem, Raptor is a specialized analytics platform, as hive is a generalized analytics platform, it is a transparent addition to the hive system.Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.Oozie is an open-source workflow/coordination service to manage data processing jobs for Apache Hadoop™. It is an extensible, scalable and data-aware service to orchestrate dependencies between jobs running on Hadoop (including HDFS, Pig and MapReduce). Oozie is a lot of things, but being:A workflow solution for off Hadoop processingAnother query processing API, a la Cascading is not one of them.Sqoop(“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:Imports individual tables or entire databases to files in HDFSGenerates Java classes to allow you to interact with your imported dataProvides the ability to import from SQL databases straight into your Hive data warehouseHue is a browser-based desktop interface for interacting with Hadoop. It supports a file browser, job tracker interface, cluster health monitor, and more.
  7. Raptor is an ensemble of various frameworks and instruments that help segment data to a fine level allowing for spontaneous search on massive data sets. This is achieved with segmentation at two levels and a intelligent data policy manager.With adaptive data segmentation, partitioning ,column level hierarchical indexes and bloom filters, Allows ad hoc spontaneous querying and searching with real-time and near-real-time analytics capabilities.Intelligent optimization algorithms for switching execution of queries & search within Raptor, HBase or MapReducePolicy manager allows optimized indexing and automatic sharding, based on time and bucket sizes.Level 1 segmentationRaptor digester framework, segmentation based on high level parametersLevel 2 segmentationRaptor block analyzer framework, fine level segmentation ThorCacheAn LRU/MRU based spill over cache system, facilitates the segmentation frameworks
  8. Level 1 segmentation:Based on high-level parametersPartition columns, row hashesBusiness specific data segmentation, by enhancing the generated digester code to meet business needs.Dimensioning/cubesAdaptive data transformation.Data is loaded into raptor using the following ways,Direct load from hive, using implicit load function in HQL.Via the Digester, This is an auto-generated code framework specific to the table.Data load Adaptors, these are custom data plug-in that facilitate in capturing data from various sources like databases, internet data streams, log files etc...Table Partitions – This approach helps store data based on columns, segmenting data based on demographics, groups, domains, industries etc… Digester – This component helps crunching down incoming data/streams, feeds the processed records into consistently hashed storage buckets.Storage Buckets – These are append 'able storage files in HDFS, each of the bucket corresponds to a specific Hash Key.Adaptive aggregation & transformation - Calculate data aggregates and vital stats per column as data is loaded into raptor and transform data to a more easy to use format. This is used for query optimization during query execution.Data Cubes & Dimensioning – Based on user hints, specific dimensions and cubes are generated for every table.
  9. Level 2 segmentation:Applied fine tuned segmentation techniquesSingle/hierarchical indexingColumn bloom filtersAdaptive column indexingMeta Column InformationIndexing Engine– This component indexes the records of every data block into a B+ Tree/Lucene index map which is in turn cached into LRU based ViteCache (spill over cache). Does adaptive indexing where  unused column indexes are evicted and most frequently accessed columns get  indexed.Bloom Filter - a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. In our case we use to test if the block/index map consists of the partial search keys.ThorCache– A LRU/MRU based low latency index store to hold the per-block b+ tree/Lucene indices. (allows for to-disk spill over for effective memory management)
  10. Raptor Meta Store – This is a RDBMS based database that store all meta data required by the various components in the system.
  11. Quantum Processor -  is a very light weight processing framework written within the Hadoop layer, provides results in fraction of times when compared with traditional mapreduce. Used for real-time queries,  used when level 1 segmentation  search returns positive/negative and level 2 segmentation search returns Positive.
  12. Business specific Code Generation Framework – This framework generates all the infrastructure components like indexer, digester, table schema, dimensions etc.. Based on user defined xml schema.
  13. Policy Manager –  This component  helps organize  data in a time sorted way, allowing for  moving older bucketized data to sparsely indexed/managed buckets while retaining newer data in highly organized/indexed buckets automatically based on defined policy.  Allowing for efficient resource utilization.
  14. Intelligent packet compression, based on the congestion on network, raptor switches to a resilient compression mode where packets are compressed using cpu efficient algorithm.Scheduling of jobs is now done using namenode operation maps, where namenode maintains information on all the operations {reads, writes, jobs} currently being performed on the cluster. This ensures a even distribution of all operations on the cluster.All computation intensive operations are scheduled on nodes that have capabilities to execute such jobs on GPU’s.Adaptive rebalancing, this function increases the replication factor of frequently used data and ensures a even availability of data, moves data closer to frequently accessed client sub-cluster nodes. Interactive User Console, this provides an interface to the user to manage the cluster, mange tasks, manage indexes, shards, filters, etc..for each table, set up policies around data, archive obsolete data.
  15. 65969
  16. Architecture: Raptor over Hadoop/HBase Raptor partitions incoming transactions Raptor executes Classifiers over GPU for each partition Flagging Cluster uses MetaClassifiers to Flag Transactions Notification Cluster sends out Notifications Raptor also implements Phasing out data and Classifier Pruning. Hive Interface not used except for Periodic Fraud Report GenerationHow does it work?Divide Data into Subsets/Buckets/Partitions (Where Raptor Excels) Apply Data Mining Techniques to generate Classifiers in parallel Combine resultant base models to generate a MetaClassifier The base classifiers execute in parallel while the MetaClassifier combines their results Pruning - Removes redundant Classifiers from ensemble (Very Old Data)Large Scale Data Mining TechniqueUsing a Scalable BlckBox ApproachCost Reduction Based Model usedLearning Algorithms used C4.5, CART, Ripper, Bayes, ID3
  17. Smart customer care solution - allows CSR representatives to query terabytes of real-time business data and handle customer queries on complaints and events. Financial fraud analytics - Retrospective Analysis of incoming financial logs, Ranks entities based on suspicious behavior, generate events on suspected entry triggers.Media usage Log Analytics – “Targeted Marketing”, The analytics system, helps add short term and long term score to marketing content based on media content usage statistics this information helps target end users with the right marketing content.Predictive Trading – Distributed GPU-based event processing platform to significantly speed up algorithm trading computations, namely, market event parsing and market event matching against strategies.Strategies are distributed in disjoint clusters to enables highly parallelizable event matching, In each cluster, strategies are stored as contiguous blocks of memory to enable fast sequential access to improve memory localityComputation intensive jobs using GPUs - Just as in traditional parallel computing, distributed GPU processing scales better than processing on shared-systems because it will not overwhelm shared system resources.