Mastering MapReduce Series, Session I: MapReduce for Big Data Management and Analysis
Curt Monash, Monash Research
Steve Wooledge, Aster Data
Peter Pawlowski, Aster Data
Eric Friedman, Aster Data
October 15th, 2009

Topics
• Aster Data overview
• SQL-MapReduce
• Example SQL-MapReduce applications
• SQL-MapReduce syntax/example
• Q&A

Aster Data: Creating the Next-Generation Data Management System
• Founded in 2005 to revolutionize the processing and management of very large data volumes
• The founding team worked on the 'big data' problem at Stanford University and was joined by big data experts from Google, Oracle, and Microsoft
• Aster's first commercial product, nCluster, has been on the market since 2007; customers include MySpace, LinkedIn, Coremetrics, Akamai, and others
• Since 2008, Aster has built on Google's well-known MapReduce framework to transform data processing, creating the patent-pending SQL-MapReduce (in-database MapReduce)

Example Data-Driven Applications: Large Data Volumes and Analytics-Intensive
• Service personalization (e.g., telco)
• Graph analysis
• Consumer segmentation
• Consumer buying patterns and consumer behavior
• Click-stream analysis
• Compliance and regulatory reporting
• Predictive and granular forecasting
• Trend analysis and modeling
• Credit and risk management
• Fraud detection
• Cross-platform ad and event attribution
• Cross-platform media affinity analysis

Improving Computation Push-Down
Cycle time = seconds to minutes
[Diagram: BI reports; server; data-mining workload; common SQL queries (aggregation, subsets and samples); MPP database]
Confidential and proprietary. Copyright © 2009 Aster Data Systems

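The push-down idea can be illustrated with a deliberately small Python sketch. Rows are not copied out of the database to a separate data-mining server. Instead, each node of a hypothetical MPP cluster reduces its own rows to a partial (sum, count), and only those small partials are combined. The node lists and function names are assumptions made for illustration, not Aster's API.

```python
from typing import Iterable


def local_partial(rows: Iterable[float]) -> tuple[float, int]:
    """Runs 'inside' one hypothetical node: reduce its rows to (sum, count)."""
    total, count = 0.0, 0
    for value in rows:
        total += value
        count += 1
    return total, count


def pushed_down_average(nodes: list[list[float]]) -> float:
    """Coordinator combines per-node partials; raw rows never leave their node."""
    partials = [local_partial(rows) for rows in nodes]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


def pulled_average(nodes: list[list[float]]) -> float:
    """Baseline for comparison: copy every row to one place, then aggregate."""
    all_rows = [value for rows in nodes for value in rows]
    return sum(all_rows) / len(all_rows)


# Three hypothetical nodes, each holding a slice of one numeric column.
nodes = [[10.0, 20.0], [30.0], [40.0, 50.0, 60.0]]
print(pushed_down_average(nodes))  # 35.0
print(pulled_average(nodes))       # 35.0
```

Both functions return the same answer. The pushed-down version moves one tuple per node instead of every row. The trick works for decomposable aggregates such as sum, count, and average. A statistic such as an exact median cannot be assembled from per-node medians.
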
Aster's Solution: A Massively Parallel Data Warehouse with the Unique Ability to Embed Applications
Deeper, Faster Analytics on Big Data
[Architecture diagram of the Aster nCluster system, numbered layers from the bottom up: commodity hardware; massively parallel data store; MPP data warehouse with incremental scaling (scale by function), with embedded parallelized apps (.NET, Java, packaged, and other apps) that execute within the database; dynamic workload manager (WLM), billed as industry-leading with 300+ concurrent workloads; high-volume, fast querying; unified interface (SQL and SQL-MapReduce). Key classes of applications connect through Aster's SQL-MapReduce or standard interfaces: leading BI tools, custom Java applications, custom .NET applications, packaged analytic apps, and other applications (C, C++, Perl, Python…).]

Aster SQL-MapReduce (SQL-MR): Bring Your Applications to the Data
A "data-applications" development platform:
• Rich portfolio of supported languages: Java, .NET, Python, Ruby, Perl, C++, R, and more
• Use SQL to develop rich data apps
• Expressive flexibility
• Reusability across applications and reports

Full Tilt Poker: Fraud Detection
The second-largest online poker site in the world
Objective: Improve fraud analytics and stop revenue leakage
Before: Separate Java-based fraud detection applications ran once a week
• Java-based program ran the data mining on extracted data
• Algorithm had to be oversimplified due to performance limitations
• Fraud was detected too late or not at all
After: Store and analyze all data in one location, the Aster database with SQL-MapReduce
• Enriched fraud algorithm is now catching previously undetected fraud
• Query performance improved by 60x (90 mins down to 90 secs)

Aster's Patent-Pending SQL-MapReduce: Faster, Easier, and More Powerful Analytics
SQL-MapReduce framework (for developers to create and extend):
• Flexible: MapReduce expressiveness, languages, polymorphism
• Performance: massive parallelization, computational push-down
• Availability: fault isolation, resource management
Powerful SQL-MR functions (for analysts to consume):
• Deep insights: unlimited analytical power at your disposal
• Ease of use: simply plug in to the SQL you know and love
The Power of Aster's SQL-MapReduce Framework:
1. Write: write a SQL-MR function in Java, C, etc.
2. Install: install it inside Aster nCluster
3. Use and reuse: invoke the SQL-MR function from SQL

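The write/install/invoke cycle rests on one execution model. An SQL-MR function sees the rows of one partition at a time, in a specified order, and may emit any number of output rows. The Python sketch below imitates that model on a single machine. `run_partitioned`, the dict-based rows, and the example function are hypothetical illustrations, not Aster's Java or C SQL-MR API.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, Iterator

# A row is modeled as a plain dict from column name to value (an assumption).
Row = dict[str, Any]


def run_partitioned(
    rows: Iterable[Row],
    partition_key: str,
    order_key: str,
    fn: Callable[[list[Row]], Iterator[Row]],
) -> list[Row]:
    """Imitate ON ... PARTITION BY ... ORDER BY ...: group rows, order each group, call fn per group."""
    # Sorting by (partition, order) makes each partition contiguous and internally ordered.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    output: list[Row] = []
    for _, group in groupby(ordered, key=lambda r: r[partition_key]):
        # An MPP system could process different partitions on different nodes in parallel;
        # this single-process sketch simply handles them one after another.
        output.extend(fn(list(group)))
    return output


def first_page_and_count(partition: list[Row]) -> Iterator[Row]:
    """Hypothetical row function: emits one summary row per user partition."""
    yield {
        "user_id": partition[0]["user_id"],
        "first_page": partition[0]["page_type"],
        "clicks": len(partition),
    }


clicks = [
    {"user_id": 2, "timestamp": 5, "page_type": "bid"},
    {"user_id": 1, "timestamp": 2, "page_type": "auction"},
    {"user_id": 1, "timestamp": 1, "page_type": "home"},
]
for out in run_partitioned(clicks, "user_id", "timestamp", first_page_and_count):
    print(out)
```

The function can emit zero, one, or many rows per partition. That flexibility is what lets one framework express map-like functions, reduce-like functions, and pattern matchers such as nPath.
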

[Comparison slide, only fragments recovered: Traditional database; expensive HW and maintenance; best of both worlds!]

MapReduce Applications
• Behavioral analytics (CRM): sequential pattern analysis (e.g., up-sell/cross-sell), spam/bot analysis, sessionization analysis
• Risk and fraud analysis: consumer credit scoring/default risk, market risk/VaR, operational risk, etc.; fraud detection
• Graph analysis: social network “connectedness” (e.g., SSSP and APSP, single-source and all-pairs shortest paths)
• Text analysis: tokenization (e.g., word count classification), natural language processing
• Statistical analysis (machine learning): linear regression, k-means clustering, R Project algorithms

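Tokenization with word count is the textbook MapReduce example. The sketch below shows the three phases in plain Python on one machine:
• Map: turn each document into (word, 1) pairs.
• Shuffle: group the pairs by word.
• Reduce: sum the counts for each word.

The lowercase, whitespace-splitting tokenizer is a deliberately simple assumption. It does not reproduce the behavior of Aster's Tokenize function.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every lowercased, whitespace-separated token."""
    for token in document.lower().split():
        yield token, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> dict[str, int]:
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data big", "Data management"]))
# {'big': 2, 'data': 2, 'management': 1}
```

In a distributed run, map calls would execute where the documents live, and each reduce call would receive only one key's values. The single-machine structure above keeps the same phase boundaries.
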
Aster's SQL-MapReduce Library
Pre-packaged SDK: SQL-MR APIs and documentation
Pre-packaged SQL-MR sample functions:
• nPath: complex sequential analysis for time-series and behavioral pattern analysis
• SSSP: single-source shortest path, a graph algorithm useful for fraud and segmentation analysis
• Sessionize: session categorization based on a sequence of clicks within a specified timeout
• Approximate percentiles: ultra-fast percentile (or N-tile) statistical distribution analysis
• Linear regression: statistical technique used to predict values based on a set of related variables
• Tokenize: text analysis that splits strings into words, categorizes them, and does a word count

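Sessionize is described as grouping a user's sequence of clicks into sessions bounded by a timeout. Below is a minimal sketch of that logic. It assumes timestamps in seconds and an illustrative 30-minute (1800-second) timeout. A new session starts whenever the gap since the user's previous click exceeds the timeout. The function name and output shape are hypothetical, not the signature of Aster's Sessionize.

```python
from typing import Iterable


def sessionize(
    clicks: Iterable[tuple[str, int]], timeout: int = 1800
) -> list[tuple[str, int, int]]:
    """Return (user_id, timestamp, session_id) for each (user_id, timestamp) click.

    Sorting by (user_id, timestamp) mirrors PARTITION BY user_id ORDER BY timestamp.
    Session ids restart at 0 for each user.
    """
    result: list[tuple[str, int, int]] = []
    last_seen: dict[str, int] = {}
    session: dict[str, int] = {}
    for user_id, ts in sorted(clicks):
        if user_id not in last_seen:
            session[user_id] = 0  # first click of this user opens session 0
        elif ts - last_seen[user_id] > timeout:
            session[user_id] += 1  # gap longer than the timeout: start a new session
        last_seen[user_id] = ts
        result.append((user_id, ts, session[user_id]))
    return result


clicks = [("u1", 0), ("u1", 600), ("u1", 5000), ("u2", 100), ("u1", 5100)]
for row in sessionize(clicks):
    print(row)
```

Here u1's gap from 600 to 5000 seconds exceeds the timeout, so that user's last two clicks form session 1. A gap exactly equal to the timeout stays in the same session, which is an arbitrary choice in this sketch.
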
• Requires dozens of SQL queries every N minutes (dozens of times per day)
• Significantly simpler code: <100 lines vs. 1000 lines
• Single pass over data for optimal performance
Source: Avinash Kaushik, Occam's Razor, Nov '08

• Running data mining and statistical analysis on a multi-TB system
• Single pass over large-scale data
• 100 lines of code down to 12
• Significant SQL optimization: minimal SQL code, greater performance via parallel execution
• Cycle time reduction: significant resource savings in both time and utilization

What Is Aster nPath?
nPath is a SQL-MR function included with nCluster. It enables analysis of ordered data:
• Clickstream data
• Financial transaction data
• User interaction data
• Anything of a time-series nature
nPath leverages the SQL-MR framework to transcend SQL's limitations with respect to ordered data.

Example: Analyzing a Clickstream
Business question: how many distinct users
1. start at the home page,
2. click on an auction,
3. view the seller's profile, and
4. bid on the item?
Available data: a database table, clicks, populated with web log data, with columns user_id, timestamp, and page_type.

The nPath query:

    SELECT count(DISTINCT user_id)
    FROM nPath(
        ON clicks
        PARTITION BY user_id
        ORDER BY timestamp
        MODE(OVERLAPPING)
        PATTERN('H.A.P.B')
        SYMBOLS(
            page_type = 'home'    AS H,
            page_type = 'auction' AS A,
            page_type = 'profile' AS P,
            page_type = 'bid'     AS B)
        RESULT(first(user_id OF H) AS user_id)
    );

(1) Partition: form groups by user_id.
(2) Order: sort each group by timestamp.
(3a) Match: define a set of symbols.
(3b) Match: define the subsequences of interest via a regex.
(4) Compute aggregates over the matched subsequences.

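For readers without nCluster, the logic of this query can be imitated in plain Python. The steps are: partition by user_id, sort by timestamp, turn each row into its symbol letter, and search the resulting string with a regular expression. The sketch makes two assumptions, based on our reading of nPath's pattern syntax. First, '.' means "followed immediately by", so H.A.P.B becomes the regex HAPB over consecutive rows. Second, a row that matches no symbol breaks a match. It illustrates the idea only and is not Aster's implementation.

```python
import re
from collections import defaultdict

# Symbol definitions from the SYMBOLS(...) clause: page_type -> letter.
SYMBOLS = {"home": "H", "auction": "A", "profile": "P", "bid": "B"}
# Assumption: nPath 'H.A.P.B' means four consecutive rows H, A, P, B.
PATTERN = re.compile("HAPB")


def count_matching_users(clicks: list[tuple[str, int, str]]) -> int:
    """clicks holds (user_id, timestamp, page_type) rows, like the clicks table."""
    # (1) Partition: group rows by user_id.
    partitions: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for user_id, ts, page_type in clicks:
        partitions[user_id].append((ts, page_type))

    matched_users = set()
    for user_id, rows in partitions.items():
        # (2) Order: sort each partition by timestamp.
        rows.sort()
        # (3a) Symbols: map each row to a letter; '_' marks rows matching no symbol.
        symbols = "".join(SYMBOLS.get(page_type, "_") for _, page_type in rows)
        # (3b) Match: search the symbol string for the pattern.
        if PATTERN.search(symbols):
            matched_users.add(user_id)
    # (4) Aggregate: count(DISTINCT user_id) over the matches.
    return len(matched_users)


clicks = [
    ("alice", 1, "home"), ("alice", 2, "auction"), ("alice", 3, "profile"), ("alice", 4, "bid"),
    ("bob", 1, "home"), ("bob", 2, "auction"), ("bob", 3, "search"), ("bob", 4, "profile"), ("bob", 5, "bid"),
    ("carol", 4, "bid"), ("carol", 1, "home"), ("carol", 3, "profile"), ("carol", 2, "auction"),
]
print(count_matching_users(clicks))  # 2
```

Bob is not counted because an unrelated "search" row interrupts his sequence. Carol is counted even though her rows arrive out of order, because step (2) sorts them. Each user is handled independently, which is what allows the real function to spread partitions across nodes.
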
Market Basket Analysis Example
Question: detect customers who
• purchase the same category of items
• in three market baskets in a row
• with a total value > $150

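One way to read this question in code assumes a hypothetical baskets table with one row per basket, holding (customer_id, basket_time, category, value). It further assumes that each basket carries a single category and that $150 is the combined value of the three baskets. The sketch sorts each customer's baskets by time and slides a window of three. Both the schema and the "combined value" reading are assumptions for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Basket(NamedTuple):
    # Hypothetical schema: one row per market basket.
    customer_id: str
    basket_time: int
    category: str
    value: float


def flag_customers(baskets: list[Basket], threshold: float = 150.0) -> set[str]:
    """Customers with three consecutive same-category baskets worth more than threshold in total."""
    by_customer: dict[str, list[Basket]] = defaultdict(list)
    for b in baskets:
        by_customer[b.customer_id].append(b)  # partition by customer

    flagged: set[str] = set()
    for customer, rows in by_customer.items():
        rows.sort(key=lambda b: b.basket_time)  # order each partition by time
        # Slide a window of three consecutive baskets over the ordered rows.
        for i in range(len(rows) - 2):
            window = rows[i : i + 3]
            same_category = len({b.category for b in window}) == 1
            total = sum(b.value for b in window)
            if same_category and total > threshold:
                flagged.add(customer)
                break  # one qualifying run is enough to flag the customer
    return flagged


baskets = [
    Basket("c1", 1, "toys", 60.0), Basket("c1", 2, "toys", 50.0), Basket("c1", 3, "toys", 45.0),
    Basket("c2", 1, "books", 80.0), Basket("c2", 2, "toys", 80.0), Basket("c2", 3, "books", 80.0),
    Basket("c3", 1, "garden", 50.0), Basket("c3", 2, "garden", 50.0), Basket("c3", 3, "garden", 50.0),
]
print(flag_customers(baskets))  # {'c1'}
```

Customer c2 mixes categories, and c3 totals exactly $150, which is not strictly greater. Each customer's ordered baskets are scanned once. This mirrors the single-pass approach the presentation contrasts with multi-pass nested sub-selects.
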
Two Methods, Same Answer
Multi-pass nested sub-selects vs. a single-pass SQL-MR nPath query
[Comparison chart; its numeric values cannot be matched to labels]

Demo: Market Basket Analysis (1M Rows)

Summary: Bringing MapReduce to Big Data Management
Aster's MPP data warehouse + SQL-MapReduce

Upcoming Webcast: Mastering MapReduce Part II
Save the date: December 3rd
MapReduce resources: http://www.asterdata.com/mapreduce/index.php
• Recorded application use cases
• Code samples and tutorials
DBMS2 on MapReduce: http://www.dbms2.com/category/parallelization/mapreduce/
Aster's SQL-MapReduce:
• http://www.asterdata.com/product/mapreduce.php
• http://www.asterdata.com/blog/index.php/category/mapreduce/
• TDWI technical whitepaper
Contact us: hello@asterdata.com, Steve.wooledge@asterdata.com
Thank You!