How to Make Hadoop
Easy, Dependable, and Fast

©MapR Technologies - Confidential   1
Agenda

     Quick MapR overview
     Typical and atypical use cases
       –   restaurant recommendation
       –   network security
       –   mega-scale fraud modeling
       –   log analysis through creative abuse of text retrieval


     Lessons Learned
       –   data import and export techniques
       –   zen integration
       –   accessing data from a variety of applications
       –   how to protect data from the most common cause of data loss

MapR’s Complete Distribution for Apache Hadoop

     Apache applications are integrated, tested, hardened, and supported
     100% Hadoop, HBase, HDFS API compatible
     Easy portability/migration between distributions
     No changes required to Hadoop applications

     MapR Control System
       –   MapR Heatmap™
       –   LDAP, NIS integration
       –   Quotas, alerts, alarms
       –   CLI, REST API
     Components
       –   Hive, Pig, Oozie, Sqoop, HBase, Whirr
       –   Mahout, Cascading, Nagios integration, Ganglia integration, Flume, ZooKeeper
     MapR’s Storage Services™
       –   Direct Access NFS
       –   Real-Time Streaming
       –   Volumes
       –   Mirrors
       –   Snapshots
       –   Data Placement
       –   No NameNode architecture
       –   High-performance Direct Shuffle
       –   Stateful failover and self-healing

MapR: Lights Out Data Center Ready


     Reliable Compute
       –   Automated stateful failover
       –   Automated re-replication
       –   Self-healing from HW and SW failures
       –   Load balancing
       –   Rolling upgrades
       –   No lost jobs or data
       –   99999’s of uptime
     Dependable Storage
       –   Business continuity with snapshots and mirrors
       –   Recover to a point in time
       –   End-to-end checksumming
       –   Strong consistency
       –   Data safe
       –   Mirror across sites to meet Recovery Time Objectives

Restaurant Recommendation

     Use transaction data to characterize users


     Determine restaurant affinities for transactors


     On demand, produce geo-local restaurant recommendation


     Web or mobile interface




Restaurant Recommendation

     Training goes-ins
       –   transaction data from purchases
       –   user feedback on recommendations
     Training goes-outs
       –   large recommendation data files


     Online goes-ins
       –   user id, current location, recent transaction history
       –   filters
     Online goes-outs
       –   restaurant recommendations


What is the Delivery Mechanism

     Database?
       –   export takes forever
       –   limited scalability


     Key value store?
       –   export takes forever
       –   YAWTM (yet another widget to maintain)


     Do we really need a mechanism at all?




Deploying Recommendations




                                     Final recommendations
                                    computed in browser/app




Summary

     With mirrors and NFS, no special “deployment” mechanism is
      necessary


     The user’s browser can do final assembly of the recommendations


     Recommendation components served as static files by web-server
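
A minimal sketch of what "final assembly" might look like, assuming the static files hold a per-user affinity table and a restaurant location table. The deck does not specify the file contents or the scoring rule, so `recommend`, `haversine_km`, and both table layouts are hypothetical. In a browser this logic would be JavaScript; Python is used here only for brevity.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def recommend(user_affinity, restaurants, lat, lon, radius_km, top_n):
    """Combine precomputed affinities with the user's current location.

    user_affinity: {restaurant_id: score}      (hypothetical static file)
    restaurants:   {restaurant_id: (lat, lon)} (hypothetical static file)
    """
    nearby = [
        (score, rid)
        for rid, score in user_affinity.items()
        if rid in restaurants
        and haversine_km(lat, lon, *restaurants[rid]) <= radius_km
    ]
    nearby.sort(reverse=True)  # highest affinity first
    return [rid for _, rid in nearby[:top_n]]
```

The heavy lifting (computing affinities) stays in the training job; the client only filters and sorts a small table, which is why static files are enough.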




Mega-scale Fraud Modeling

     Why not use the simplest modeling technology around?
       –   similar folk do similar things!
       –   just find tens of thousands of similar folk and see what they did


     Can we make it a million times faster than the prototype?
       –   well, yes … we can


     And can you deploy that into a live system?


     And can sequential and parallel versions co-exist?
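
A minimal sketch of the "similar folk do similar things" idea, assuming each past case is a numeric feature vector with a known fraud outcome. This is a brute-force k-nearest-neighbor vote, not the fast clustering-based version the deck refers to; `knn_fraud_score` and the data layout are hypothetical.

```python
import heapq
import math


def knn_fraud_score(query, history, k):
    """Return the fraction of fraudulent outcomes among the k nearest past cases.

    history is a list of (feature_vector, was_fraud) pairs.
    """
    nearest = heapq.nsmallest(
        k, history, key=lambda item: math.dist(query, item[0])
    )
    return sum(1 for _, was_fraud in nearest if was_fraud) / len(nearest)
```

A brute-force scan like this costs one distance computation per historical case per query, which is what makes the speedup question on this slide matter.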



Modeling with k-nearest Neighbors




     [Figure: k-nearest-neighbor illustration with points labeled a, b, and c]

Speeds and Feeds

     Single machine version can cluster at 20μs per point
       –   1 million points in ~20s
       –   100 million points in ~2000s ≈ 33 minutes


     Parallel version can cluster at (20μs / number of nodes) per point + 30 seconds
       –   1 million points in 31 s on 20 nodes (ish)
       –   100 million points in 150 s = 2.5 minutes (on 20 nodes)


     Really would like interchangeable versions
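
The timing figures on this slide follow a simple cost model, sketched here under the assumption that the 30 seconds is a fixed per-job overhead. The function names are hypothetical.

```python
def sequential_seconds(points, per_point_us=20.0):
    """Single-machine estimate: a fixed cost per point."""
    return points * per_point_us * 1e-6


def parallel_seconds(points, nodes, per_point_us=20.0, overhead_s=30.0):
    """Parallel estimate: per-point cost split across nodes, plus fixed overhead."""
    return points * per_point_us * 1e-6 / nodes + overhead_s


if __name__ == "__main__":
    for n in (1_000_000, 100_000_000):
        print(n, round(sequential_seconds(n)), round(parallel_seconds(n, nodes=20)))
```

For 100 million points on 20 nodes the model gives about 130 s, somewhat under the slide's approximate 150 s. For small inputs the fixed overhead dominates, so the sequential version wins, which is one reason to want both versions to be interchangeable.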




What About Deployment?

     Final matrix size is several GB


     Can’t have copy per thread
       –   can’t even wait to load many copies


     What about mmap?
       –   needs real files, can’t use HDFS
       –   NFS works great


     Need to deploy in map-reduce and real-time environments
       –   can’t depend on Hadoop features like distributed cache
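
A minimal sketch of the mmap approach, assuming the model is stored as a flat row-major file of float64 values. The file layout, `write_matrix`, and `MappedMatrix` are hypothetical; in the deployment described here the path would sit on an NFS mount, while the demo below uses a temporary directory.

```python
import mmap
import os
import struct
import tempfile

DOUBLE = struct.Struct("<d")  # one little-endian float64 per matrix entry


def write_matrix(path, rows):
    """Write a dense row-major matrix of float64 values to a flat file."""
    with open(path, "wb") as f:
        for row in rows:
            for value in row:
                f.write(DOUBLE.pack(value))


class MappedMatrix:
    """Read-only view of a flat float64 matrix file via mmap."""

    def __init__(self, path, n_cols):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; pages are loaded lazily by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.n_cols = n_cols

    def get(self, row, col):
        offset = (row * self.n_cols + col) * DOUBLE.size
        return DOUBLE.unpack_from(self._map, offset)[0]

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "model.bin")
    write_matrix(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    matrix = MappedMatrix(path, n_cols=3)
    print(matrix.get(1, 2))
    matrix.close()
```

Because the mapping is read-only and backed by the OS page cache, every process on a machine that maps the same file shares one copy in memory rather than loading its own.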


Summary

     With mirrors and NFS, no special “deployment” mechanism is
      necessary


     The modeling client can use NFS + mmap to share memory between
      threads or processes

     Mirrors can stage as many replicas as desired on whichever
      machines are specified




Network Security

     Take an existing network security appliance


     Add magical parallel machine learning to find new attacks


     But don’t spend time copying data back and forth


     And don’t change the legacy code




Summary

     Legacy code “just works” with MapR’s NFS


     Map-reduce programs don’t care where the input comes from


     Exposing new control data requires no special mechanism




Log Analysis

     Receive 200K log lines per second or more


     Want to do multi-field search


     Want log lines to be searchable within 30 seconds of arrival




Solr Based Flexible Analytics

     Solr/Lucene can index at 500K small documents per second


     Faceting provides simple aggregation


     Multiple index search is a given, not a special future enhancement


     Solr/Lucene has an awesome record of stability
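
To make "faceting provides simple aggregation" concrete: a facet is a count of matching documents per distinct value of a field. The sketch below shows that aggregation in plain Python over a list of dicts. It is not the Solr API, and `facet_counts` and the document fields are hypothetical.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count matching documents per distinct value of one field."""
    return Counter(doc[field] for doc in docs if field in doc)
```

Applied to the hits of a query, this kind of count answers questions such as "how many matching log lines per host" without a separate analytics job.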




Data Ingestion and Indexing



     [Diagram: Incoming data → Kafka → real-time text analysis → Solr indexers → time-sharded Solr indexes (older index shards plus a live index shard); raw documents are stored separately]

Some Special Points

     Textual analysis is done in parallel outside of the indexer


     Raw documents are stored outside of Solr to minimize index size


     Index hot-spotting is a feature here because it gives time-based
      sharding


     Indexing into NFS allows legacy code reuse
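
A minimal sketch of time-based sharding, assuming one shard per hour and a naming scheme of `logs-YYYYMMDD-HH`. Both are hypothetical, since the deck does not give the shard width or names. Timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

SHARD_WIDTH = timedelta(hours=1)  # hypothetical shard width


def shard_for(ts):
    """Name of the time shard that holds a log line with timestamp ts."""
    return ts.astimezone(timezone.utc).strftime("logs-%Y%m%d-%H")


def shards_for_range(start, end):
    """All shard names a search over [start, end] has to consult."""
    current = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    names = []
    while current <= end:
        names.append(shard_for(current))
        current += SHARD_WIDTH
    return names
```

All new writes land in the shard for the current hour (the "hot spot"), so older shards become read-only and a time-bounded query only touches the shards its range overlaps.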




Basic Search

     [Diagram: Query → Web tier → Solr search; storage holds raw documents, older index shards, and the live index shard]

Additional Points

     The number of shards per core can be adjusted easily to match
      load


     Near real-time indexing not really required


     No transaction logs need be kept by Solr for failure tolerance
       –   core failure requires other cores take on lost shards
       –   indexer failure requires indexer restart … Kafka retains unprocessed input
       –   indexing is idempotent
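
The deck states that indexing is idempotent but not how. One plausible way, sketched here as an assumption, is to derive each document id from the Kafka partition and offset, so that replaying input after an indexer restart overwrites documents instead of duplicating them. The dict stands in for the index, and `index_batch` is hypothetical.

```python
def index_batch(index, records):
    """Index (partition, offset, line) records; replaying a record overwrites itself."""
    for partition, offset, line in records:
        # Deterministic id: the same input record always maps to the same document.
        doc_id = f"{partition}-{offset}"
        index[doc_id] = {"id": doc_id, "text": line}
    return index
```

With ids chosen this way, a restarted indexer can resume from any offset at or before the last one it is sure was indexed, which is why no separate transaction log is needed.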




Secure Search
     [Diagram: Query → Web tier → Security filter (using auth data) → Solr search; storage holds raw documents, older index shards, and the live index shard]

Conclusions




Lessons Learned

     Import/export is often a non-issue
       –   NFS allows processing in place


     Legacy access via NFS provides high performance, minimal effort

     Interchangeable map-reduce and conventional programs are key


     Do simple tasks in simple ways. Save the effort for the big tasks




Zen Integration

     The student went to the master and asked how to integrate
      multiple programs using different models
       –   The master said, “to do more, do less”
     The student went away and came back pointing out that HDFS
      allows copying data in and out. He quoted Turing.
       –   The master said, “to do more, do less”
     The student thought about this for many days. In the
      meantime, the master installed MapR and deleted all the
      integration code.
     When the student returned and saw this, he asked where the
      integration was.
     The master answered “ ” and the student was enlightened.

The Cause of Almost All Data Loss




                And snapshots are the cure (partially)
Time for Questions

     Download MapR to learn more
       –   http://mapr.com/download


     Send email with questions later
       –   tdunning@maprtech.com


     Tweet as the spirit moves
       –   @ted_dunning


     These slides and other resources
       –   http://www.mapr.com/company/events/speaking/tableau-11-8-12


Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Dernier

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Dernier (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

How to Make Hadoop Easy, Dependable and Fast

  • 1. How to Make Hadoop Easy, Dependable, and Fast ©MapR Technologies - Confidential 1
  • 2. Agenda
      ◦ Quick MapR overview
      ◦ Typical and atypical use cases
          – restaurant recommendation
          – network security
          – mega-scale fraud modeling
          – log analysis through creative abuse of text retrieval
      ◦ Lessons learned
          – data import and export techniques
          – zen integration
          – accessing data from a variety of applications
          – how to protect data from the most common cause of data loss
  • 3. MapR’s Complete Distribution for Apache Hadoop
      ◦ Apache applications are integrated, tested, hardened, and supported
      ◦ 100% Hadoop, HBase, HDFS API compatible
      ◦ Easy portability/migration between distributions
      ◦ No changes required to Hadoop applications
      ◦ [Diagram, top layer] MapR Control System: MapR Heatmap™; LDAP, NIS integration; quotas, alerts, alarms; CLI, REST API
      ◦ [Diagram, middle layer] Hive, Pig, Oozie, Sqoop, HBase, Whirr, Mahout, Cascading, Nagios integration, Ganglia integration, Flume, ZooKeeper
      ◦ [Diagram, bottom layer] MapR’s Storage Services™: Direct Access NFS; Real-Time Streaming; Volumes; Mirrors; Snapshots; Data Placement; No NameNode Architecture; High Performance Direct Shuffle; Stateful Failover and Self Healing (the diagram also carries the label “2.7”)
  • 4. MapR: Lights Out Data Center Ready
      ◦ Reliable Compute
          – Automated stateful failover
          – Automated re-replication
          – Self-healing from HW and SW failures
          – Load balancing
          – Rolling upgrades
          – No lost jobs or data
          – 99.999% (“five nines”) uptime
      ◦ Dependable Storage
          – Business continuity with snapshots and mirrors
          – Recover to a point in time
          – End-to-end checksumming
          – Strong consistency
          – Data safe
          – Mirror across sites to meet Recovery Time Objectives
  • 5. Restaurant Recommendation
      ◦ Use transaction data to characterize users
      ◦ Determine restaurant affinities for transactors
      ◦ On demand, produce geo-local restaurant recommendation
      ◦ Web or mobile interface
  • 6. Restaurant Recommendation
      ◦ Training goes-ins
          – transaction data from purchases
          – user feedback on recommendations
      ◦ Training goes-outs
          – large recommendation data files
      ◦ Online goes-ins
          – user id, current location, recent transaction history
          – filters
      ◦ Online goes-outs
          – restaurant recommendations
  • 7. What is the Delivery Mechanism
      ◦ Database?
          – export takes forever
          – limited scalability
      ◦ Key value store?
          – export takes forever
          – YAWTM (yet another widget to maintain)
      ◦ Do we really need a mechanism at all?
  • 8. Deploying Recommendations
      ◦ [Figure: final recommendations computed in browser/app]
  • 9. Summary
      ◦ With mirrors and NFS, no special “deployment” mechanism is necessary
      ◦ User’s browser can do final assembly on the recommendations
      ◦ Recommendation components served as static files by web-server
  • 10. Mega-scale Fraud Modeling
      ◦ Why not use the simplest modeling technology around?
          – similar folk do similar things!
          – just find tens of thousands of similar folk and see what they did
      ◦ Can we make it a million times faster than the prototype?
          – well, yes … we can
      ◦ And can you deploy that into a live system?
      ◦ And can sequential and parallel versions co-exist?
  • 11. Modeling with k-nearest Neighbors
      ◦ [Figure: points labeled a, b, c]
  • 12. Speeds and Feeds
      ◦ Single-machine version can cluster at 20 μs per point
          – 1 million points in ~20 s
          – 100 million points in ~2000 s ≈ 33 minutes
      ◦ Parallel version can cluster at (20 μs / nodes) per point + 30 seconds
          – 1 million points in 31 s on 20 nodes (ish)
          – 100 million points in 150 s = 2.5 minutes (on 20 nodes)
      ◦ Really would like interchangeable versions
  • 13. What About Deployment?
      ◦ Final matrix size is several GB
      ◦ Can’t have copy per thread
          – can’t even wait to load many copies
      ◦ What about mmap?
          – needs real files, can’t use HDFS
          – NFS works great
      ◦ Need to deploy in map-reduce and real-time environments
          – can’t depend on Hadoop features like distributed cache
  • 14. [Figure only; no slide text]
  • 15. Summary
      ◦ With mirrors and NFS, no special “deployment” mechanism is necessary
      ◦ The modeling client can use NFS + mmap to share memory between threads or processes
      ◦ Mirrors can stage as many replicas as desired on whichever machines are specified
  • 16. Network Security
      ◦ Take an existing network security appliance
      ◦ Add magical parallel machine learning to find new attacks
      ◦ But don’t spend time copying data back and forth
      ◦ And don’t change the legacy code
  • 17. [Figure only; no slide text]
  • 18. Summary
      ◦ Legacy code “just works” with MapR’s NFS
      ◦ Map-reduce programs don’t care where the input comes from
      ◦ Exposing new control data requires no special mechanism
  • 19. Log Analysis
      ◦ Receive 200K log lines per second or more
      ◦ Want to do multi-field search
      ◦ Want to search on log lines with < 30 second delay before search
  • 20. Solr Based Flexible Analytics
      ◦ Solr/Lucene can index at 500K small documents per second
      ◦ Faceting provides simple aggregation
      ◦ Multiple index search is a given, not a special future enhancement
      ◦ Solr/Lucene has awesome record of stability
  • 21. Data Ingestion and Indexing
      ◦ [Diagram components: incoming data, text analysis, Kafka, Solr indexers, raw documents, real-time live index shard, older index shards; caption: time-sharded Solr indexes]
  • 22. Some Special Points
      ◦ Textual analysis is done in parallel outside of the indexer
      ◦ Raw documents are stored outside of Solr to minimize index size
      ◦ Index hot-spotting is a feature here because it gives time-based sharding
      ◦ Indexing into NFS allows legacy code reuse
  • 23. Basic Search
      ◦ [Diagram components: query, web tier, Solr search nodes, Solr indexers, raw documents, live index shard, older index shards]
  • 24. Additional Points
      ◦ The number of shards per core can be adjusted easily to match load
      ◦ Near real-time indexing not really required
      ◦ No transaction logs need be kept by Solr for failure tolerance
          – core failure requires other cores take on lost shards
          – indexer failure requires indexer restart … Kafka retains unprocessed input
          – indexing is idempotent
  • 25. Secure Search
      ◦ [Diagram components: query, web tier, security filter, auth data, Solr search nodes, Solr indexers, raw documents, live index shard, older index shards]
  • 27. Lessons Learned
      ◦ Import/export is often a non-issue
          – NFS allows processing in place
      ◦ Legacy access via NFS provides high performance, minimal effort
      ◦ Interchangeable map-reduce and conventional programs are key
      ◦ Do simple tasks in simple ways. Save the effort for the big tasks
  • 28. Zen Integration
      ◦ The student went to the master and asked how to integrate multiple programs using different models
          – The master said, “to do more, do less”
      ◦ The student went away and came back pointing out that HDFS allows copying data in and out. He quoted Turing.
          – The master said, “to do more, do less”
      ◦ The student thought about this for many days. In the meantime, the master installed MapR and deleted all the integration code.
      ◦ When the student returned and saw this, he asked where the integration was.
      ◦ The master answered “ ” and the student was enlightened.
  • 29. The Cause of Almost All Data Loss
  • 30. The Cause of Almost All Data Loss
  • 31. The Cause of Almost All Data Loss
      ◦ And snapshots are the cure (partially)
  • 32. Time for Questions
      ◦ Download MapR to learn more
          – http://mapr.com/download
      ◦ Send email with questions later
          – tdunning@maprtech.com
      ◦ Tweet as the spirit moves
          – @ted_dunning
      ◦ These slides and other resources
          – http://www.mapr.com/company/events/speaking/tableau-11-8-12
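
Slides 13 and 15 describe sharing one large model file between threads by memory-mapping it from an NFS mount. The sketch below illustrates that idea only; it is not code from the talk. The function names, the row-major float64 file layout, and the temporary local file standing in for an NFS path are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

DOUBLE = struct.calcsize("<d")  # 8 bytes per little-endian float64


def write_matrix(path, rows, cols):
    """Write a row-major float64 matrix where entry (r, c) equals r * cols + c."""
    with open(path, "wb") as f:
        for r in range(rows):
            values = [float(r * cols + c) for c in range(cols)]
            f.write(struct.pack(f"<{cols}d", *values))


def row_sum(mapping, row, cols):
    """Read one row directly from the shared mapping and sum it."""
    values = struct.unpack_from(f"<{cols}d", mapping, row * cols * DOUBLE)
    return sum(values)


def parallel_row_sums(path, rows, cols, workers=4):
    """Map the file once; every worker thread reads from that single mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            with ThreadPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(lambda r: row_sum(mapping, r, cols), range(rows)))


def demo():
    # Assumption: a temporary local file stands in for a file on an NFS mount.
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_matrix(path, rows=3, cols=4)
        return parallel_row_sums(path, rows=3, cols=4)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Because the mapping is read-only and backed by an ordinary file, no thread holds a private copy of the data; separate processes that map the same path would likewise share pages through the operating system's page cache.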

Editor's notes

  1. MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested, and hardened a broad array of packages as part of this distribution: Hive, Pig, Oozie, Sqoop, plus additional packages such as Cascading. We have spent more than two years in a well-funded effort to make deep architectural improvements and create the next-generation distribution for Hadoop. MapR has made significant updates while remaining 100% compatible with Apache Hadoop. This is in stark contrast with the alternative distributions from Cloudera, Hortonworks, and Apache, which are all equivalent.
  2. With MapR, Hadoop is lights-out data center ready. MapR provides five nines (99.999%) of availability, including support for rolling upgrades, self-healing, and automated stateful failover. MapR is the only distribution that provides these capabilities. MapR also provides dependable data storage with full data protection and business continuity features. MapR provides point-in-time recovery to protect against application and user errors. There is end-to-end checksumming, so data corruption is automatically detected and corrected with MapR’s self-healing capabilities. Mirroring across sites is fully supported. All these features support lights-out data center operations. Every two weeks an administrator can take a MapR report and a shopping cart full of drives and replace failed drives.
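
To put the 99.999% ("five nines") availability figure in concrete terms, the following sketch converts an availability level into a yearly downtime budget. It is plain arithmetic rather than a measurement of any system; the function name and the 365-day year are assumptions chosen for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def allowed_downtime_seconds(nines, period_seconds=SECONDS_PER_YEAR):
    """Downtime budget for an availability of 1 - 10**-nines over the period."""
    unavailability = 10 ** -nines
    return period_seconds * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        minutes = allowed_downtime_seconds(n) / 60
        print(f"{n} nines: about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, five nines allows a little over five minutes of downtime per year.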