SlideShare une entreprise Scribd logo
1  sur  16
© 2014 MapR Technologies 1© 2014 MapR Technologies
© 2014 MapR Technologies 2
Agenda
• Introductions
• Log File enrichment
• ETL with ML
• Recommendation Engine
• Adhoc SQL Queries
• The Future case
© 2014 MapR Technologies 3
Who is Mike Emerick ?
My bio the highlights.
Architect for MapR for 2.5 years.
“creative hours at Workshop 88.”
© 2014 MapR Technologies 4
Approach to this presentation
1.No API discussion
1.Architecture features and utilization
2. Use Cases .. and Why Spark?
© 2014 MapR Technologies 5
Spark 10,000 feet
• Fundamentally Spark is an MPP.
• Can use many Storage Subsystems.
(Great for development)
• RDD, Accumulators, Broadcast.
• Map Reduce +.
• Apache Spark site has
great resources
on architecture and API.
© 2014 MapR Technologies 6
Usecase : SQL Queries
• “Interactive SQL on Hadoop...”
• How does Spark make this easier?
– Native Hive QL (SQL 93 ish)
– In memory and from disk
– Usually the first thought...
• Spark SQL
© 2014 MapR Technologies 7
Firewall logs
WAP Logs
Application Logs
Spark
Streaming
Compromised
accouts
M7 Persistant
NoSQL Persistent Datastore
Spark SQL
Presentation
Monitoring
Spark NoSQL
Tools
Tableau Qlikview
Spark Mlib
Reporting Data Products WebPortal
Content
Management
MapR Dataplatform
Microstrategy
Known Exploits
Blacklists
GeoIP
© 2014 MapR Technologies 8
Usecase : Log file enrichment
• Why enrich my log data..?
• This is not Storm it is Batch
– Similar to Hbase Async API..
• How does Spark make this easier?
– Streaming API
– Sliding Windows
– SQL Hive/Shark
• Connect to Hbase
– NoSQL Connectors
• Hbase
© 2014 MapR Technologies 9
Firewall logs
WAP Logs
Application Logs
Spark
Streaming
Compromised
accouts
M7 Persistant
NoSQL Persistent Datastore
Spark SQL
Presentation
Monitoring
Spark NoSQL
Tools
Tableau Qlikview
Spark Mlib
Reporting Data Products WebPortal
Content
Management
MapR Dataplatform
Microstrategy
Known Exploits
Blacklists
GeoIP
© 2014 MapR Technologies 10
Usecase : SQL mixing with ML
• Why are folks doing this..?
• How does Spark make this easier?
– Native Machine learning Mlib
– Access to neartime Adhoc SQL queries
– R and SQL in the same place
– Bigger than in memory faster than MR
© 2014 MapR Technologies 11
Firewall logs
WAP Logs
Application Logs
Spark
Streaming
Compromised
accouts
M7 Persistant
NoSQL Persistent Datastore
Spark SQL
Presentation
Monitoring
Spark NoSQL
Tools
Tableau Qlikview
Spark Mlib
Reporting Data Products WebPortal
Content
Management
MapR Dataplatform
Microstrategy
Known Exploits
Blacklists
GeoIP
© 2014 MapR Technologies 12
Usecase : Recommendation Engine
• It is a recommendation engine...
• How does Spark make this easier?
– ETL and Enrichment
– Mlib makes it easy to import data.
– Mlib Training in same cluster
– NoSQL Adhoc serves recommendations
– Dynamic
© 2014 MapR Technologies 13
Firewall logs
WAP Logs
Application Logs
Spark
Streaming
Compromised
accouts
M7 Persistant
NoSQL Persistent Datastore
Spark SQL
Presentation
Monitoring
Spark NoSQL
Tools
Tableau Qlikview
Spark Mlib
Reporting Data Products WebPortal
Content
Management
MapR Dataplatform
Microstrategy
Known Exploits
Blacklists
GeoIP
© 2014 MapR Technologies 14
Use cases build in complexity
• Adoption follows a curve of complexity
– Ingestion and query
– Ingestion Enrichment Query
– Ingestion Enrichment Machine learning Query
– Ingestion Enrichment Machine learning Serving recommendations
– .....
• Spark is flattening the curve
• Why?
– One framework
– Less data movement
– Access to preferred language
© 2014 MapR Technologies 15
Future state: ~ in the year 2000
• ADAM - Genomics
• GraphX – Graph is near...
• Mlib – Look for lots of work here
• PySpark – Fastest evolving
• SparkR – Just getting started
• BlinkDB – ~ Queries
• OEM...
© 2014 MapR Technologies 16
Business Services
MapR is hiring in Chicago
Apache Drill Beta this Summer
Happy National Making day !
Check out W88 for Hadoop classes

Contenu connexe

Tendances

Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMapR Technologies
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleMapR Technologies
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill Carol McDonald
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Big Data Conference April 2015
Big Data Conference April 2015Big Data Conference April 2015
Big Data Conference April 2015Aaron Benz
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...DataWorks Summit/Hadoop Summit
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesData Con LA
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemCloudera, Inc.
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jDataWorks Summit
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillDataWorks Summit
 
Cmu-2011-09.pptx
Cmu-2011-09.pptxCmu-2011-09.pptx
Cmu-2011-09.pptxTed Dunning
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive TutorialSandeep Patil
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezMapR Technologies
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 

Tendances (20)

Maintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single ClusterMaintaining Low Latency While Maximizing Throughput on a Single Cluster
Maintaining Low Latency While Maximizing Throughput on a Single Cluster
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Drill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is PossibleDrill into Drill – How Providing Flexibility and Performance is Possible
Drill into Drill – How Providing Flexibility and Performance is Possible
 
M7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal HausenblasM7 and Apache Drill, Micheal Hausenblas
M7 and Apache Drill, Micheal Hausenblas
 
NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill NoSQL HBase schema design and SQL with Apache Drill
NoSQL HBase schema design and SQL with Apache Drill
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Big Data Conference April 2015
Big Data Conference April 2015Big Data Conference April 2015
Big Data Conference April 2015
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Apache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use CasesApache HBase - Introduction & Use Cases
Apache HBase - Introduction & Use Cases
 
The Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop EcosystemThe Evolution of the Hadoop Ecosystem
The Evolution of the Hadoop Ecosystem
 
Applied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4jApplied Deep Learning with Spark and Deeplearning4j
Applied Deep Learning with Spark and Deeplearning4j
 
Understanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache DrillUnderstanding the Value and Architecture of Apache Drill
Understanding the Value and Architecture of Apache Drill
 
Cmu-2011-09.pptx
Cmu-2011-09.pptxCmu-2011-09.pptx
Cmu-2011-09.pptx
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Intro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco VasquezIntro to Apache Spark by Marco Vasquez
Intro to Apache Spark by Marco Vasquez
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
SQOOP - RDBMS to Hadoop
SQOOP - RDBMS to HadoopSQOOP - RDBMS to Hadoop
SQOOP - RDBMS to Hadoop
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 

Similaire à Meet Spark

Spark and Hadoop Technology
Spark and Hadoop Technology Spark and Hadoop Technology
Spark and Hadoop Technology Avinash Gautam
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
Big Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudDataBig Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudDataWeCloudData
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014mahchiev
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiDatabricks
 
Introduing spark
Introduing sparkIntroduing spark
Introduing sparkTaotao Li
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkSlim Baltagi
 
Spark & Hadoop at Production at Scale
Spark & Hadoop at Production at ScaleSpark & Hadoop at Production at Scale
Spark & Hadoop at Production at ScaleMapR Technologies
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNHortonworks
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Dataconomy Media
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Mats Uddenfeldt
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkJerry Wen
 
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupHyderabad Scalability Meetup
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentationRamesh Mudunuri
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefitsJohan Picard
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkDataWorks Summit
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for BeginnersAnirudh
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemCloudera, Inc.
 

Similaire à Meet Spark (20)

Spark and Hadoop Technology
Spark and Hadoop Technology Spark and Hadoop Technology
Spark and Hadoop Technology
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Big Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudDataBig Data for Data Scientists - WeCloudData
Big Data for Data Scientists - WeCloudData
 
Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014Big Data Processing with Apache Spark 2014
Big Data Processing with Apache Spark 2014
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
Spark 101
Spark 101Spark 101
Spark 101
 
Introduing spark
Introduing sparkIntroduing spark
Introduing spark
 
Transitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to SparkTransitioning Compute Models: Hadoop MapReduce to Spark
Transitioning Compute Models: Hadoop MapReduce to Spark
 
Spark & Hadoop at Production at Scale
Spark & Hadoop at Production at ScaleSpark & Hadoop at Production at Scale
Spark & Hadoop at Production at Scale
 
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARNYARN webinar series: Using Scalding to write applications to Hadoop and YARN
YARN webinar series: Using Scalding to write applications to Hadoop and YARN
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...Self-Service BI for big data applications using Apache Drill (Big Data Amster...
Self-Service BI for big data applications using Apache Drill (Big Data Amster...
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of spark
 
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentation
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefits
 
Implementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache SparkImplementing the Lambda Architecture efficiently with Apache Spark
Implementing the Lambda Architecture efficiently with Apache Spark
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop EcosystemWhy Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
Why Apache Spark is the Heir to MapReduce in the Hadoop Ecosystem
 

Plus de Chicago Hadoop Users Group

Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Chicago Hadoop Users Group
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChicago Hadoop Users Group
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieChicago Hadoop Users Group
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopChicago Hadoop Users Group
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917Chicago Hadoop Users Group
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Chicago Hadoop Users Group
 

Plus de Chicago Hadoop Users Group (18)

Kinetica master chug_9.12
Kinetica master chug_9.12Kinetica master chug_9.12
Kinetica master chug_9.12
 
Chug dl presentation
Chug dl presentationChug dl presentation
Chug dl presentation
 
Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
Using HBase Co-Processors to Build a Distributed, Transactional RDBMS - Splic...
 
Choosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your BusinessChoosing the Right Big Data Architecture for your Business
Choosing the Right Big Data Architecture for your Business
 
An Overview of Ambari
An Overview of AmbariAn Overview of Ambari
An Overview of Ambari
 
Hadoop and Big Data Security
Hadoop and Big Data SecurityHadoop and Big Data Security
Hadoop and Big Data Security
 
Introduction to MapReduce
Introduction to MapReduceIntroduction to MapReduce
Introduction to MapReduce
 
Advanced Oozie
Advanced OozieAdvanced Oozie
Advanced Oozie
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Financial Data Analytics with Hadoop
Financial Data Analytics with HadoopFinancial Data Analytics with Hadoop
Financial Data Analytics with Hadoop
 
Everything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about OozieEverything you wanted to know, but were afraid to ask about Oozie
Everything you wanted to know, but were afraid to ask about Oozie
 
An Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache HadoopAn Introduction to Impala – Low Latency Queries for Apache Hadoop
An Introduction to Impala – Low Latency Queries for Apache Hadoop
 
HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917HCatalog: Table Management for Hadoop - CHUG - 20120917
HCatalog: Table Management for Hadoop - CHUG - 20120917
 
Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604Map Reduce v2 and YARN - CHUG - 20120604
Map Reduce v2 and YARN - CHUG - 20120604
 
Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416Hadoop in a Windows Shop - CHUG - 20120416
Hadoop in a Windows Shop - CHUG - 20120416
 
Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815Running R on Hadoop - CHUG - 20120815
Running R on Hadoop - CHUG - 20120815
 
Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416Avro - More Than Just a Serialization Framework - CHUG - 20120416
Avro - More Than Just a Serialization Framework - CHUG - 20120416
 

Dernier

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Dernier (20)

[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Meet Spark

  • 1. © 2014 MapR Technologies 1© 2014 MapR Technologies
  • 2. © 2014 MapR Technologies 2 Agenda • Introductions • Log File enrichment • ETL with ML • Recommendation Engine • Adhoc SQL Queries • The Future case
  • 3. © 2014 MapR Technologies 3 Who is Mike Emerick ? My bio the highlights. Architect for MapR for 2.5 years. “creative hours at Workshop 88.”
  • 4. © 2014 MapR Technologies 4 Approach to this presentation 1.No API discussion 1.Architecture features and utilization 2. Use Cases .. and Why Spark?
  • 5. © 2014 MapR Technologies 5 Spark 10,000 feet • Fundamentally Spark is an MPP. • Can use many Storage Subsystems. (Great for development) • RDD, Accumulators, Broadcast. • Map Reduce +. • Apache Spark site has great resources on architecture and API.
  • 6. © 2014 MapR Technologies 6 Usecase : SQL Queries • “Interactive SQL on Hadoop...” • How does Spark make this easier? – Native Hive QL (SQL 93 ish) – In memory and from disk – Usually the first thought... • Spark SQL
  • 7. © 2014 MapR Technologies 7 Firewall logs WAP Logs Application Logs Spark Streaming Compromised accouts M7 Persistant NoSQL Persistent Datastore Spark SQL Presentation Monitoring Spark NoSQL Tools Tableau Qlikview Spark Mlib Reporting Data Products WebPortal Content Management MapR Dataplatform Microstrategy Known Exploits Blacklists GeoIP
  • 8. © 2014 MapR Technologies 8 Usecase : Log file enrichment • Why enrich my log data..? • This is not Storm it is Batch – Similar to Hbase Async API.. • How does Spark make this easier? – Streaming API – Sliding Windows – SQL Hive/Shark • Connect to Hbase – NoSQL Connectors • Hbase
  • 9. © 2014 MapR Technologies 9 Firewall logs WAP Logs Application Logs Spark Streaming Compromised accouts M7 Persistant NoSQL Persistent Datastore Spark SQL Presentation Monitoring Spark NoSQL Tools Tableau Qlikview Spark Mlib Reporting Data Products WebPortal Content Management MapR Dataplatform Microstrategy Known Exploits Blacklists GeoIP
  • 10. © 2014 MapR Technologies 10 Usecase : SQL mixing with ML • Why are folks doing this..? • How does Spark make this easier? – Native Machine learning Mlib – Access to neartime Adhoc SQL queries – R and SQL in the same place – Bigger than in memory faster than MR
  • 11. © 2014 MapR Technologies 11 Firewall logs WAP Logs Application Logs Spark Streaming Compromised accouts M7 Persistant NoSQL Persistent Datastore Spark SQL Presentation Monitoring Spark NoSQL Tools Tableau Qlikview Spark Mlib Reporting Data Products WebPortal Content Management MapR Dataplatform Microstrategy Known Exploits Blacklists GeoIP
  • 12. © 2014 MapR Technologies 12 Usecase : Recommendation Engine • It is a recommendation engine... • How does Spark make this easier? – ETL and Enrichment – Mlib makes it easy to import data. – Mlib Training in same cluster – NoSQL Adhoc serves recommendations – Dynamic
  • 13. © 2014 MapR Technologies 13 Firewall logs WAP Logs Application Logs Spark Streaming Compromised accouts M7 Persistant NoSQL Persistent Datastore Spark SQL Presentation Monitoring Spark NoSQL Tools Tableau Qlikview Spark Mlib Reporting Data Products WebPortal Content Management MapR Dataplatform Microstrategy Known Exploits Blacklists GeoIP
  • 14. © 2014 MapR Technologies 14 Use cases build in complexity • Adoption follows a curve of complexity – Ingestion and query – Ingestion Enrichment Query – Ingestion Enrichment Machine learning Query – Ingestion Enrichment Machine learning Serving recommendations – ..... • Spark is flattening the curve • Why? – One framework – Less data movement – Access to preferred language
  • 15. © 2014 MapR Technologies 15 Future state: ~ in the year 2000 • ADAM - Genomics • GraphX – Graph is near... • Mlib – Look for lots of work here • PySpark – Fastest evolving • SparkR – Just getting started • BlinkDB – ~ Queries • OEM...
  • 16. © 2014 MapR Technologies 16 Business Services MapR is hiring in Chicago Apache Drill Beta this Summer Happy National Making day ! Check out W88 for Hadoop classes

Notes de l'éditeur

  1. My approach to this presentation Not and API presentation Great documentation and examples Lots of presentations of this variety Architecture presentation – Use case study. What unique features facilitate workloads. What is here and coming for new workloads.  
  2. Adhoc Queries with Shark An early use case. Not usually the first production. MapR has a few. In the long run...
  3. Adhoc Queries with Shark An early use case. Not usually the first production. MapR has a few. In the long run...
  4. Logfile enrichment Streaming API Leveraging near time resolution for enrichment. Not the same as Storm - Its Batch Hooks to other messaging tools ZeroMQ, Kafka etc.. Sliding Window features NoSQL capabilities Access to Tables via Hbase API Access to in memory RDDs
  5. Adhoc Queries with Shark An early use case. Not usually the first production. MapR has a few. In the long run...
  6. SQL Mixing with Machine Learning “ETL for the math nerd” R and Python Current access is Shark Spark SQL will drive to native SQL
  7. Adhoc Queries with Shark An early use case. Not usually the first production. MapR has a few. In the long run...
  8. Recommendation Engine Provides all aspects ETL Vector/Matrix Generation Training Near time recommendation
  9. Adhoc Queries with Shark An early use case. Not usually the first production. MapR has a few. In the long run...
  10. Pharma’s like Adam Academia is behind Few Graph use cases in development none deployed none on GraphX Mlib and Mahout may join forces.. PySpark according to DataBricks is some of the most active code SparkR BlinkDB Time limited queries separate git hub will be merged in to main Spark Branch Couple OEM vendors for Spark they are covered on the @ databricks site.
  11. MapR has a very large and robust, growing ecosystem of partners. This is important for you because you have existing investments and relationships with other technologies which need to work well with MapR, integrate easily, and allow you to create a differentiated set of technologies. (highlight key partners which are important to your customer)