SlideShare une entreprise Scribd logo
1  sur  24
A D N A N M A S O O D , P H D
S Y S T E M S A R C H I T E C T / D A T A S C I E N T I S T
A D N A N . M A S O O D @ O W A S P . O R G
( H T T P : / / B L O G . A D N A N M A S O O D . C O M )
G I T H U B ( G I T H U B . C O M / A D N A N M A S O O D ) ,
T W I T T E R ( @ A D N A N M A S O O D ) .
P R E S E N T E D A T M I C R O S O F T D A T A S C I E N C E G R O U P –
T A M P A B A Y D A T A S C I E N C E P R O F E S S I O N A L S
H T T P : / / W W W . M E E T U P . C O M / D A T A - S C I E N T I S T S - T A M P A - B A Y / E V E N T S / 2 3 1 2 9 3 0 7 7 /
Spark with Azure HDInsight
About the Speaker
Adnan Masood, Ph.D. is a developer, software architect, and researcher and specializes
in FinTech, machine learning and Bayesian belief networks. Before joining PDS
Health care, and GDC (a leading prepaid financial technology institution), he enjoyed
life as a principal engineer of a start-up and worked for a leading UK based nonprofit
organization as a solutions architect.
A strong believer in the development community, Adnan is an active member of the
Open Web Application Security Project (OWASP), an organization dedicated to
software security. In the .NET community, he is a cofounder and president of the
Pasadena .NET Developers group, which he has been successfully leading for 8 years.
He led a number of successful enterprise solutions and consulted for several Fortune
500 company projects.
Adnan devotes himself to his own continual, practical education. He holds
certifications in big data, machine learning, and systems architecture from
Massachusetts Institute of Technology; an Application Security certification from
Stanford University; an SOA Smarts certification from Carnegie Mellon University;
and certifications as a ScrumMaster, Microsoft Certified Trainer, Microsoft Certified
Solutions Developer, and Sun Certified Java Developer.
For more details, visit Adnan's blog (http://blog.adnanmasood.com), GitHub
repository (http://github.com/adnanmasood), and Twitter (@adnanmasood). Adnan
can be reached at adnan.masood@owasp.org.
Announcement: Apache Spark for Azure HDInsight now
generally available
Channel 9 Walk through of Apache Spark on Azure HDInsight
Spark 101
 Spark is a unified framework for big data
analytics. Spark provides one integrated API
for use by developers, data scientists, and
analysts to perform diverse tasks that would
have previously required separate
processing engines such as batch analytics,
stream processing and statistical modeling.
Spark supports a wide range of popular
languages including Python, R, Scala, SQL,
and Java. Spark can read from diverse data
sources and scale to thousands of nodes.
Spark with Azure HDInsight
Deployment Models
Big Data Deployment – Public Cloud
• Hadoop-as-a-Service
- Amazon Web Services EC2 and EMR
- Microsoft Azure HDInsight
- Google Cloud Dataproc
- IBM Bluemix ... and others
• Spark-as-a-Service
- All of the above
- Databricks
Big Data Deployment – On-Premises
• Bare-Metal
• Virtual Machines
- VMware Big Data Extensions
- OpenStack Sahara
• Containers
- BlueData
- Mesos
HDInsight as Part of Azure Portal
Spark - Benefits
Spark – Use Cases
Spark is Fast!
Spark is Fast!
Demo - Creating a HDInsight Spark Cluster
HDInsight Spark Streaming
“Along with traditional Hadoop technologies, HDInsight also provides
Spark as a cloud service. Spark is an integrated set of open source
technologies that can run on a Hadoop cluster. The Spark family
includes options for analyzing large amounts of operational data,
doing machine learning, and more. It also includes Spark Streaming, a
technology for working with streaming data.
Spark Streaming is similar to Storm in some ways. Like Storm, it’s a
general-purpose technology for processing streaming data. Unlike
Storm, Spark Streaming is implemented as an extension to the basic
Spark engine—it’s not an add-on technology. This tight connection can
make Spark applications faster, since there’s less need to move data
between components, and easier to create, since everything uses the
same core Spark technology. Because of this, Spark Streaming (and
Spark in general) are getting more popular by the day”
David Chappell
STREAMING SCENARIOS USING THE MICROSOFT DATA PLATFORM
A GUIDE FOR IT LEADERS
HDInsight Spark Streaming
• What is it?
- Distributed compute framework, an extension of the core Apache Spark API
- Allows users to integrate real-time data from disparate event streams (e.g. Kafka,
HDFS, Twitter) in event-driven, asynchronous, scalable, type-safe, and fault tolerant
applications
• When to use it?
- When organizations need realtime decision making
- When you are working with streams of continuous data
• Why Spark Streaming?
- Enables high-throughput and reliable processing of live data streams
- Batch, Iterative, and Streaming analysis on the same platform
- Easily add Machine Learning for streaming data pathways
Getting Started.
References & Further Reading
 Use MapReduce in Hadoop on HDInsight
https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-use-mapreduce
 Get started: Create Apache Spark cluster on HDInsight
Linux and run interactive queries using Spark SQL
https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-
zeppelin-notebook-jupyter-spark-sql/
 Azure Machine Learning -
https://azure.microsoft.com/en-us/services/machine-
learning/
References & Further Reading
 Announcing Apache Spark on Azure HDInsight
https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache-
Spark-on-Azure-HDInsight
 Apache Zeppelin https://zeppelin.incubator.apache.org
 Project Jupyter http://jupyter.org/
 https://azure.microsoft.com/en-us/services/hdinsight/
 https://azure.microsoft.com/en-us/blog/apache-spark-for-azure-
hdinsight-now-generally-available/
 Microsoft expands its commitment to Apache Spark big-data framework
 https://azure.microsoft.com/en-us/documentation/articles/hdinsight-
apache-spark-use-zeppelin-notebook/
 https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache-
Spark-on-Azure-HDInsight
 http://www.c-sharpcorner.com/UploadFile/aa700f/jumpstart-into-big-
data-with-hdinsight/
References & Further Reading
 Get started: Create Apache Spark cluster on HDInsight Linux and run
interactive queries using Spark SQL https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/
 EdX Course: Processing Big Data with Azure HDInsight Processing Big
Data with Azure HDInsight Learn how to use Hadoop technologies in
Microsoft Azure HDInsight to process big data in this five week, hands-on
course. https://www.edx.org/course/processing-big-data-azure-hdinsight-
microsoft-dat202-1x-0
 Apache Spark for Azure HDInsight https://azure.microsoft.com/en-
us/services/hdinsight/apache-spark/
 Build Machine Learning applications to run on Apache Spark clusters on
HDInsight Linux https://azure.microsoft.com/en-
us/documentation/articles/hdinsight-apache-spark-ipython-notebook-
machine-learning/
References & Further Reading
Questions

Contenu connexe

Tendances

Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceDataWorks Summit
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analyticsjoshwills
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...NoSQLmatters
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsDataWorks Summit
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoopMaulik Thaker
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 DataWorks Summit
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudDataWorks Summit
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep duttaCapgemini
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningDataWorks Summit
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...DataWorks Summit/Hadoop Summit
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big DataDataWorks Summit
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformCaserta
 

Tendances (20)

Big SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open sourceBig SQL: Powerful SQL Optimization - Re-Imagined for open source
Big SQL: Powerful SQL Optimization - Re-Imagined for open source
 
Hadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced AnalyticsHadoop vs. RDBMS for Advanced Analytics
Hadoop vs. RDBMS for Advanced Analytics
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?A Continuously Deployed Hadoop Analytics Platform?
A Continuously Deployed Hadoop Analytics Platform?
 
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
Alexandre Vasseur - Evolution of Data Architectures: From Hadoop to Data Lake...
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Evolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data ApplicationsEvolving Hadoop into an Operational Platform with Data Applications
Evolving Hadoop into an Operational Platform with Data Applications
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
ETL big data with apache hadoop
ETL big data with apache hadoopETL big data with apache hadoop
ETL big data with apache hadoop
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0 A Reference Architecture for ETL 2.0
A Reference Architecture for ETL 2.0
 
Securing your Big Data Environments in the Cloud
Securing your Big Data Environments in the CloudSecuring your Big Data Environments in the Cloud
Securing your Big Data Environments in the Cloud
 
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0   virtual - subhadeep duttaCWIN17 India / Insights platform architecture v1 0   virtual - subhadeep dutta
CWIN17 India / Insights platform architecture v1 0 virtual - subhadeep dutta
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
Empowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine LearningEmpowering you with Democratized Data Access, Data Science and Machine Learning
Empowering you with Democratized Data Access, Data Science and Machine Learning
 
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
Hortonworks Data Platform and IBM Systems - A Complete Solution for Cognitive...
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platformBig Data 2.0: ETL & Analytics: Implementing a next generation platform
Big Data 2.0: ETL & Analytics: Implementing a next generation platform
 

En vedette

Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategyJames Serra
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionAdnan Masood
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLJen Stirrup
 
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...Hortonworks
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
 
[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft Azure[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft AzureMichał Smereczyński
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataData Con LA
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonGreg Kirchoff
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation BriefBoni Bruno
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData, Inc.
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the CloudDATAVERSITY
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015Krishna-Kumar
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData, Inc.
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Jen Stirrup
 

En vedette (20)

Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Realtime analytics with_hadoop
Realtime analytics with_hadoopRealtime analytics with_hadoop
Realtime analytics with_hadoop
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Microsoft cloud big data strategy
Microsoft cloud big data strategyMicrosoft cloud big data strategy
Microsoft cloud big data strategy
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Data science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief IntroductionData science with Windows Azure - A Brief Introduction
Data science with Windows Azure - A Brief Introduction
 
PASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureMLPASS Summit Data Storytelling with R Power BI and AzureML
PASS Summit Data Storytelling with R Power BI and AzureML
 
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
SAS - Hortonworks: Creating the Omnichannel Experience in Retail webinar marc...
 
Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)Introduction to Microsoft’s Hadoop solution (HDInsight)
Introduction to Microsoft’s Hadoop solution (HDInsight)
 
[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft Azure[PL] Code Europe 2016 - Python and Microsoft Azure
[PL] Code Europe 2016 - Python and Microsoft Azure
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Why Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueDataWhy Virtualization is important by Tom Phelan of BlueData
Why Virtualization is important by Tom Phelan of BlueData
 
Dell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with IsilonDell/EMC Technical Validation of BlueData EPIC with Isilon
Dell/EMC Technical Validation of BlueData EPIC with Isilon
 
BlueData Isilon Validation Brief
BlueData Isilon Validation BriefBlueData Isilon Validation Brief
BlueData Isilon Validation Brief
 
BlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for HadoopBlueData Hunk Integration: Splunk Analytics for Hadoop
BlueData Hunk Integration: Splunk Analytics for Hadoop
 
Big Data & the Cloud
Big Data & the CloudBig Data & the Cloud
Big Data & the Cloud
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015
 
BlueData EPIC 2.0 Overview
BlueData EPIC 2.0 OverviewBlueData EPIC 2.0 Overview
BlueData EPIC 2.0 Overview
 
Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?Business Intelligence Barista: What DataViz Tool to Use, and When?
Business Intelligence Barista: What DataViz Tool to Use, and When?
 

Similaire à Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD

Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark ZaranTech LLC
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Edureka!
 
Coding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - PhdassistanceCoding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - PhdassistancephdAssistance1
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...phdAssistance1
 
Career opportunities in open source framework
Career opportunities in open source frameworkCareer opportunities in open source framework
Career opportunities in open source frameworkedunextgen
 
Career opportunities in open source framework
Career opportunities in open source framework Career opportunities in open source framework
Career opportunities in open source framework edunextgen
 
Big data processing with apache spark
Big data processing with apache sparkBig data processing with apache spark
Big data processing with apache sparksarith divakar
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsFredReynolds2
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with RGreat Wide Open
 
Data scientist versus big data
Data scientist versus big dataData scientist versus big data
Data scientist versus big dataPrwaTech
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 

Similaire à Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD (20)

Madhu
MadhuMadhu
Madhu
 
Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark Introduction To Data Science with Apache Spark
Introduction To Data Science with Apache Spark
 
BigData_Krishna Kumar Sharma
BigData_Krishna Kumar SharmaBigData_Krishna Kumar Sharma
BigData_Krishna Kumar Sharma
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
sudipto_resume
sudipto_resumesudipto_resume
sudipto_resume
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
Big Data Analytics Tutorial | Big Data Analytics for Beginners | Hadoop Tutor...
 
Coding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - PhdassistanceCoding software and tools used for data science management - Phdassistance
Coding software and tools used for data science management - Phdassistance
 
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
Coding‌ ‌Software‌ ‌and‌ ‌Tools‌ ‌used‌ ‌for‌ ‌Data‌ ‌Science‌ ‌Management‌ ‌...
 
Career opportunities in open source framework
Career opportunities in open source frameworkCareer opportunities in open source framework
Career opportunities in open source framework
 
Career opportunities in open source framework
Career opportunities in open source framework Career opportunities in open source framework
Career opportunities in open source framework
 
Big data processing with apache spark
Big data processing with apache sparkBig data processing with apache spark
Big data processing with apache spark
 
Apresentação Hadoop
Apresentação HadoopApresentação Hadoop
Apresentação Hadoop
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
Data scientist versus big data
Data scientist versus big dataData scientist versus big data
Data scientist versus big data
 
Aioug big data and hadoop
Aioug  big data and hadoopAioug  big data and hadoop
Aioug big data and hadoop
 
Apache spark
Apache sparkApache spark
Apache spark
 
INFO491FinalPaper
INFO491FinalPaperINFO491FinalPaper
INFO491FinalPaper
 
Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 

Plus de Adnan Masood

Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachAdnan Masood
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureAdnan Masood
 
Agile Software Development
Agile Software DevelopmentAgile Software Development
Agile Software DevelopmentAdnan Masood
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationAdnan Masood
 
Bayesian Networks and Association Analysis
Bayesian Networks and Association AnalysisBayesian Networks and Association Analysis
Bayesian Networks and Association AnalysisAdnan Masood
 
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...Adnan Masood
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionAdnan Masood
 
Web API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonWeb API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonAdnan Masood
 
SOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupSOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupAdnan Masood
 
Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Adnan Masood
 

Plus de Adnan Masood (10)

Restructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality ApproachRestructuring Technical Debt - A Software and System Quality Approach
Restructuring Technical Debt - A Software and System Quality Approach
 
System Quality Attributes for Software Architecture
System Quality Attributes for Software ArchitectureSystem Quality Attributes for Software Architecture
System Quality Attributes for Software Architecture
 
Agile Software Development
Agile Software DevelopmentAgile Software Development
Agile Software Development
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Bayesian Networks and Association Analysis
Bayesian Networks and Association AnalysisBayesian Networks and Association Analysis
Bayesian Networks and Association Analysis
 
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
Probabilistic Interestingness Measures - An Introduction with Bayesian Belief...
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Web API or WCF - An Architectural Comparison
Web API or WCF - An Architectural ComparisonWeb API or WCF - An Architectural Comparison
Web API or WCF - An Architectural Comparison
 
SOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User GroupSOLID Principles of Refactoring Presentation - Inland Empire User Group
SOLID Principles of Refactoring Presentation - Inland Empire User Group
 
Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...Brief bibliography of interestingness measure, bayesian belief network and ca...
Brief bibliography of interestingness measure, bayesian belief network and ca...
 

Dernier

SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 

Dernier (20)

SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 

Spark with Azure HDInsight - Tampa Bay Data Science - Adnan Masood, PhD

  • 1. A D N A N M A S O O D , P H D S Y S T E M S A R C H I T E C T / D A T A S C I E N T I S T A D N A N . M A S O O D @ O W A S P . O R G ( H T T P : / / B L O G . A D N A N M A S O O D . C O M ) G I T H U B ( G I T H U B . C O M / A D N A N M A S O O D ) , T W I T T E R ( @ A D N A N M A S O O D ) . P R E S E N T E D A T M I C R O S O F T D A T A S C I E N C E G R O U P – T A M P A B A Y D A T A S C I E N C E P R O F E S S I O N A L S H T T P : / / W W W . M E E T U P . C O M / D A T A - S C I E N T I S T S - T A M P A - B A Y / E V E N T S / 2 3 1 2 9 3 0 7 7 / Spark with Azure HDInsight
  • 2. About the Speaker Adnan Masood, Ph.D. is a developer, software architect, and researcher and specializes in FinTech, machine learning and Bayesian belief networks. Before joining PDS Health care, and GDC (a leading prepaid financial technology institution), he enjoyed life as a principal engineer of a start-up and worked for a leading UK based nonprofit organization as a solutions architect. A strong believer in the development community, Adnan is an active member of the Open Web Application Security Project (OWASP), an organization dedicated to software security. In the .NET community, he is a cofounder and president of the Pasadena .NET Developers group, which he has been successfully leading for 8 years. He led a number of successful enterprise solutions and consulted for several Fortune 500 company projects. Adnan devotes himself to his own continual, practical education. He holds certifications in big data, machine learning, and systems architecture from Massachusetts Institute of Technology; an Application Security certification from Stanford University; an SOA Smarts certification from Carnegie Mellon University; and certifications as a ScrumMaster, Microsoft Certified Trainer, Microsoft Certified Solutions Developer, and Sun Certified Java Developer. For more details, visit Adnan's blog (http://blog.adnanmasood.com), GitHub repository (http://github.com/adnanmasood), and Twitter (@adnanmasood). Adnan can be reached at adnan.masood@owasp.org.
  • 3. Announcement: Apache Spark for Azure HDInsight now generally available
  • 4. Channel 9 Walk through of Apache Spark on Azure HDInsight
  • 5. Spark 101  Spark is a unified framework for big data analytics. Spark provides one integrated API for use by developers, data scientists, and analysts to perform diverse tasks that would have previously required separate processing engines such as batch analytics, stream processing and statistical modeling. Spark supports a wide range of popular languages including Python, R, Scala, SQL, and Java. Spark can read from diverse data sources and scale to thousands of nodes.
  • 6. Spark with Azure HDInsight
  • 8. Big Data Deployment – Public Cloud • Hadoop-as-a-Service - Amazon Web Services EC2 and EMR - Microsoft Azure HDInsight - Google Cloud Dataproc - IBM Bluemix ... and others • Spark-as-a-Service - All of the above - Databricks
  • 9. Big Data Deployment – On-Premises • Bare-Metal • Virtual Machines - VMware Big Data Extensions - OpenStack Sahara • Containers - BlueData - Mesos
  • 10. HDInsight as Part of Azure Portal
  • 11.
  • 13. Spark – Use Cases
  • 16. Demo - Creating a HDInsight Spark Cluster
  • 17. HDInsight Spark Streaming “Along with traditional Hadoop technologies, HDInsight also provides Spark as a cloud service. Spark is an integrated set of open source technologies that can run on a Hadoop cluster. The Spark family includes options for analyzing large amounts of operational data, doing machine learning, and more. It also includes Spark Streaming, a technology for working with streaming data. Spark Streaming is similar to Storm in some ways. Like Storm, it’s a general-purpose technology for processing streaming data. Unlike Storm, Spark Streaming is implemented as an extension to the basic Spark engine—it’s not an add-on technology. This tight connection can make Spark applications faster, since there’s less need to move data between components, and easier to create, since everything uses the same core Spark technology. Because of this, Spark Streaming (and Spark in general) are getting more popular by the day” David Chappell STREAMING SCENARIOS USING THE MICROSOFT DATA PLATFORM A GUIDE FOR IT LEADERS
  • 18. HDInsight Spark Streaming • What is it? - Distributed compute framework, an extension of the core Apache Spark API - Allows users to integrate real-time data from disparate event streams (e.g. Kafka, HDFS, Twitter) in event-driven, asynchronous, scalable, type-safe, and fault tolerant applications • When to use it? - When organizations need realtime decision making - When you are working with streams of continuous data • Why Spark Streaming? - Enables high-throughput and reliable processing of live data streams - Batch, Iterative, and Streaming analysis on the same platform - Easily add Machine Learning for streaming data pathways
  • 20. References & Further Reading  Use MapReduce in Hadoop on HDInsight https://azure.microsoft.com/en- us/documentation/articles/hdinsight-use-mapreduce  Get started: Create Apache Spark cluster on HDInsight Linux and run interactive queries using Spark SQL https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark- zeppelin-notebook-jupyter-spark-sql/  Azure Machine Learning - https://azure.microsoft.com/en-us/services/machine- learning/
  • 21. References & Further Reading  Announcing Apache Spark on Azure HDInsight https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache- Spark-on-Azure-HDInsight  Apache Zeppelin https://zeppelin.incubator.apache.org  Project Jupyter http://jupyter.org/  https://azure.microsoft.com/en-us/services/hdinsight/  https://azure.microsoft.com/en-us/blog/apache-spark-for-azure- hdinsight-now-generally-available/  Microsoft expands its commitment to Apache Spark big-data framework  https://azure.microsoft.com/en-us/documentation/articles/hdinsight- apache-spark-use-zeppelin-notebook/  https://channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache- Spark-on-Azure-HDInsight  http://www.c-sharpcorner.com/UploadFile/aa700f/jumpstart-into-big- data-with-hdinsight/
  • 22. References & Further Reading  Get started: Create Apache Spark cluster on HDInsight Linux and run interactive queries using Spark SQL https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark-jupyter-spark-sql/  EdX Course: Processing Big Data with Azure HDInsight Processing Big Data with Azure HDInsight Learn how to use Hadoop technologies in Microsoft Azure HDInsight to process big data in this five week, hands-on course. https://www.edx.org/course/processing-big-data-azure-hdinsight- microsoft-dat202-1x-0  Apache Spark for Azure HDInsight https://azure.microsoft.com/en- us/services/hdinsight/apache-spark/  Build Machine Learning applications to run on Apache Spark clusters on HDInsight Linux https://azure.microsoft.com/en- us/documentation/articles/hdinsight-apache-spark-ipython-notebook- machine-learning/

Notes de l'éditeur

  1. Slides courtesy Microsoft Corporation - Scott talks to Asad Khan about the addition of Apache Spark on Azure HDInsight. Apache Spark is a unified, open source, parallel data processing framework for Big Data Analytics. Spark brings together batch processing, real-time processing, stream analytics, machine learning, and interactive SQL and Azure makes Spark "Software as a Service." It's a great time for you to jump into the world of Big Data on Azure.
  2. Slide Courtesy Neudesic