Advanced Analytics on the Azure HDInsight Platform
Łukasz Grala | Senior Architect
Łukasz Grala
• Senior architect of Data Platform, Business Intelligence and Advanced Analytics solutions at TIDK
• Creator of “Data Scientist as a Service”
• Microsoft Certified Trainer and university lecturer
• Author of advanced training courses and workshops, as well as numerous publications and webcasts
• Recognized with the Microsoft Data Platform MVP award since 2010
• PhD student at Poznań University of Technology, Faculty of Computing (databases, data mining, machine learning)
• Speaker at numerous conferences in Poland and worldwide
• Holds numerous certifications (MCT, MCSE, MCSA, MCITP,…)
• Member of the Polish Information Processing Society (Polskie Towarzystwo Informatyczne)
• Member and leader of the Polish SQL Server User Group (PLSSUG)
• Passionate about data analysis, storage and processing; jazz enthusiast
email lukasz@tidk.pl
Big Data – 4V
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Lambda Architecture
Hadoop Ecosystem
[Diagram of components by role: ETL, RDBMS import/export, distributed storage & processing framework, secure NoSQL DB, SQL on HBase, NoSQL DB, workflow management, SQL, streaming data ingestion, cluster system operations, secure gateway, distributed registry, search & indexing, even faster data processing, data management, machine learning]
The Hadoop Ecosystem
• Management & Monitoring (Ambari)
• Coordination (ZooKeeper)
• Workflow & Scheduling (Oozie)
• Scripting (Pig)
• Machine Learning (Mahout)
• Query (Hive)
• Distributed Processing (MapReduce)
• Distributed Storage (HDFS)
• NoSQL Database (HBase)
• Data Integration (Sqoop/REST/ODBC)
Hadoop Core (HDFS and YARN)
• NameNode (NN), HDFS: holds the directory structure in memory (e.g. /directory/structure/in/memory.txt)
• ResourceManager (RM), YARN: resource management + scheduling (disk, CPU, memory)
• Worker node: DataNode (HDFS) and NodeManager (YARN)
Legend: Hadoop daemon, user application
WordCount in MapReduce
Input: constitution.txt in HDFS (“We the people, in order to form a...”)
1. The mappers read the file’s blocks from HDFS line-by-line
2. The lines of text are split into words and output to the reducers: <We,1> <the,1> <people,1> <in,1> <order,1> <to,1> <form,1> <a,1>
3. The shuffle/sort phase combines pairs with the same key: <We,(1,1,1,1)> <the,(1,1,1,1,1,1,1,...)> <people,(1,1,1,1,1)> <form,(1)>
4. The reducers add up the “1’s” and output the word and its count to HDFS: <We,4> <the,265> <people,5> <form,1>
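The WordCount phases can be sketched in a few lines of plain Python. This is only a single-process illustration of the map, shuffle/sort and reduce steps, not Hadoop code: the function names (mapper, shuffle, reducer, word_count) and the sample lines are assumptions made for the example, and the input is a short stand-in rather than the slide's constitution.txt.

```python
from collections import defaultdict

def mapper(line):
    """Split one line into words and emit a (word, 1) pair for each."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key and sort the keys, like the shuffle/sort phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))

def reducer(key, values):
    """Add up the 1's collected for a single word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample input, not the real file.
sample_lines = ["We the people", "in order to form", "the people"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the mappers and reducers run on different nodes and the shuffle moves data over the network; here everything runs in one process so the data flow is easy to follow.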
What is Apache Ambari?
A completely open source
management platform for
provisioning, managing,
monitoring and securing
Apache Hadoop clusters.
Apache Ambari takes the
guesswork out of operating
Hadoop.
Spark’s Position in a Modern Data Platform
• Data Sources: disk-based source, streaming source, reference data, ad hoc/on-demand source
• Data Processing, Storage & Analytics: stream processing (Storm/Spark-Streaming), data pipeline (Hive/Pig/Spark), long-term data warehouse (Hive + ORC), data science (Spark-ML, Spark-SQL)
• Data Access: data discovery, operational reporting, business intelligence, advanced analytics
Spark Context
What is it?
• Main entry point for Spark functionality
• Represents a connection to a Spark cluster
• Represented as sc in your code
Microsoft Data Platform (1)
• Analytics Platform System
• SQL Server Information Management (ADF, MDS, DQS, SSIS & DataSync)
• Cortana Analytics Suite
• SQL Server Reporting Services
• PowerBI
• Datazen Server
• SQL Server Analysis Services
Microsoft Data Platform (2)
• Azure Data Lake
• Azure DocumentDB
• Azure HDInsight (Hadoop, Spark, HBase & Storm)
• Azure Machine Learning
• Azure Search
• Azure SQL Database
• Azure Stream Analytics
• Azure SQL Data Warehouse
HDInsight
• HDInsight is a Hadoop-based service that brings a 100% Apache Hadoop solution to the Microsoft Azure platform
• Based on the Hortonworks Data Platform (HDP)
• Scalable, on-demand service
CRAN: 7000+ add-on packages for R
CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
A brief history of R
1993 Research project in Auckland, NZ
• Ross Ihaka and Robert Gentleman
1995 Released as open-source software
• Generally compatible with the “S” language
1997 R core group formed
2003 R Foundation formed in Austria
2007 Revolution Analytics founded
2014 Revolution R Open launched
2015 R Consortium founded
2015 Microsoft acquires Revolution Analytics
2016 Microsoft R Open 3.2.3 released
Photo credit: Robert Gentleman
R: The #1 software for Data Science
… and #6 amongst general-purpose programming languages
R Usage Growth
Rexer Data Miner Survey, 2007-2015
Language Popularity
IEEE Spectrum Top Programming Languages, 2015
76% of analytic
professionals
report using R
36% select R as
their primary tool
Use Microsoft R Open with…
• Microsoft R Server: big-data analytics and distributed computing on Linux, Hadoop and Teradata
• SQL Server 2016: big-data analytics integrated with the SQL Server database
• PowerBI: computations and charts from R scripts in dashboards
• Azure ML Studio: R scripts in cloud-based Experiment workflows
• Visual Studio: R Tools for Visual Studio, an integrated development environment for R
• HDInsight: R integrated with cloud-based Hadoop clusters
• Cortana Analytics: cloud-based R APIs and virtual machines
The Microsoft R Server Platform
Components: R Open, Microsoft R Server, DeployR, DevelopR
ConnectR
• High-speed & direct connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed-format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC
ScaleR
• Ready-to-use, high-performance big data, big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Range of predictive functions
• User tools for distributing customized R algorithms across nodes
• Wide data sets supported – thousands of variables
DistributedR
• Distributed computing framework
• Delivers cross-platform portability
R+CRAN
• Open source R interpreter
• R 3.2.5
• Freely-available huge range of R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages
RevoR
• Performance-enhanced R interpreter
• Based on open source R
• Adds high-performance math library to speed up linear algebra functions
R Packages: RHadoop and ParallelR
Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms
• ParallelR: parallel programming for multi-CPU servers and grids
• RHadoop: map-reduce programming in the R language
Mainly useful for “embarrassingly parallel” problems, where parallel components work with small amounts of data
Big-data predictive analytics is mostly not embarrassingly parallel
• 80+ pre-built “parallel external memory algorithms” included with RevoScale
• Azure ML Studio includes many ML algorithms
ScaleR – Parallel + “Big Data”
• Data is streamed into RAM in blocks, so “Big Data” can be any data size: from megabytes to gigabytes to terabytes…
• ScaleR algorithms work inside multiple cores / nodes in parallel at high speed
• Interim results are collected and combined analytically to produce the output on the entire data set
• The XDF file format is optimised to work with the ScaleR library and significantly speeds up iterative algorithm processing
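The idea of streaming blocks and combining interim results can be illustrated with a small Python sketch. It is not ScaleR code and makes no claim about how ScaleR is implemented: the helper names (read_blocks, summarize_block, combine, mean_and_variance) and the ten-value data set are assumptions chosen for the example. Each block is reduced to a count, a sum and a sum of squares, and those interim results are added together to obtain the mean and sample variance of the whole data set.

```python
def read_blocks(values, block_size):
    """Yield successive blocks, standing in for chunks streamed from disk."""
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]

def summarize_block(block):
    """Interim result for one block: count, sum and sum of squares."""
    return len(block), sum(block), sum(x * x for x in block)

def combine(a, b):
    """Combine two interim results by component-wise addition."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def mean_and_variance(values, block_size):
    """Compute the mean and sample variance while holding one block at a time."""
    total = (0, 0.0, 0.0)
    for block in read_blocks(values, block_size):
        total = combine(total, summarize_block(block))
    n, s, ss = total
    mean = s / n
    variance = (ss - n * mean * mean) / (n - 1)  # sample variance
    return mean, variance

# Hypothetical data set: the numbers 1..10, processed three values at a time.
data = [float(x) for x in range(1, 11)]
mean, variance = mean_and_variance(data, block_size=3)
print(mean, variance)
```

Because combine is associative, the blocks could equally be summarized on different cores or nodes and merged afterwards. The sum-of-squares formula is kept for brevity; it can lose precision on large or badly scaled data, where a numerically stabler update would be preferable.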
MRS and Hadoop Architecture options
ScaleR Production
RStudio Server Pro
Microsoft R Server
1. Copy
2. Stream
3. Send
DistributedR - Hadoop Processing Methods
Method 1 (“Beside” or “Edge”): Local (Linux) parallel processing using all cores on one node, copying data from HDFS to store in the local Linux file system.
• Compute context: local parallel
• Processing: 1 edge node (1:n disks), reading / writing CSV and XDF files on the Linux file system
• Data: CSV and XDF files in HDFS on 1:n data nodes (1:(n x number of nodes) disks), copied to local files
Method 2: Local (Linux) parallel processing using all cores on one node, streaming data from / to HDFS.
• Compute context: local parallel
• Processing: 1 edge node (1:n disks)
• Data: CSV and XDF files in HDFS on 1:n nodes (1:(n x number of nodes) disks)
Method 3 (“inside”): Hadoop (Map-Reduce) parallel processing using all cores on n nodes, using HDFS data on each node.
• Compute context: Hadoop
• Processing: 1 edge node sends the R script to the data nodes (1:n nodes)
• Data: CSV and XDF files read from / written to HDFS (1:(n x number of nodes) disks)
R model script sent to the master node:
1. Starts a master process
2. Distributes work
3. Master tasks for each node
4. Master initiates distributed work
   a. Hadoop schedules a mapper for each split
   b. The algorithm computes an intermediate result
   c. A reducer combines the intermediate results
5. Master process evaluates completion
6. Iterates as required by the algorithm
7. Returns the consolidated answer to the script
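The master / mapper / reducer loop can be sketched in plain Python. The sketch uses one-dimensional k-means purely as an example of an iterative algorithm; it runs in a single process and does not reproduce the Microsoft R Server or Hadoop implementation. The function names (mapper, reducer, master), the two splits and the starting centroids are all assumptions made for illustration.

```python
def mapper(split, centroids):
    """Intermediate result for one split: per-centroid sum and count."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for x in split:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sums[nearest] += x
        counts[nearest] += 1
    return sums, counts

def reducer(partials):
    """Combine the intermediate results from all splits."""
    k = len(partials[0][0])
    sums = [sum(p[0][i] for p in partials) for i in range(k)]
    counts = [sum(p[1][i] for p in partials) for i in range(k)]
    return sums, counts

def master(splits, centroids, max_iter=20, tol=1e-9):
    """Distribute work, evaluate completion, iterate, return the answer."""
    for _ in range(max_iter):
        partials = [mapper(split, centroids) for split in splits]  # one mapper per split
        sums, counts = reducer(partials)
        # Keep the old centroid if no point was assigned to it.
        updated = [s / c if c else old for s, c, old in zip(sums, counts, centroids)]
        if max(abs(a - b) for a, b in zip(updated, centroids)) < tol:
            return updated  # converged: consolidated answer
        centroids = updated
    return centroids

# Hypothetical data, already divided into two splits.
splits = [[1.0, 2.0, 9.0], [10.0, 3.0, 11.0]]
centroids = master(splits, centroids=[0.0, 5.0])
print(centroids)
```

Only the small per-split sums and counts travel between mapper and reducer, which is why this pattern suits data that stays on the nodes where it is stored.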
DistributedR - What processing mode to use, when?
Analytic data set size and processing complexity (e.g. simple summary statistics vs. an iterative algorithm) guide the choice between Methods 1 and 2 (edge node / server Linux local processing) and Method 3 (in-Hadoop processing).
[Chart: data size (small data < 10GB, medium data < 50GB, bigger data > 50GB) against processing complexity (low, medium, high). Legend: edge node Linux processing, in-Hadoop processing, local Linux file system, Hadoop file system]
Parallelized Algorithms
Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models
Cluster Analysis
• K-Means
Classification
• Decision Trees
• Decision Forests
• Stochastic Gradient Boosted Decision Trees
Variable Selection
• Stepwise Regression: Linear, Logistic and GLM
Simulation
• Monte Carlo
• Parallel Random Number Generation
Combination
• Using Revolution rxDataStep and rxExec functions to combine open source R with Revolution R
• PEMA API
Comparing rxKmeans() with kmeans()
Comparing linear regression
Performance Comparison
• US flight data for 20 years
• Linear regression on arrival delay
• Run on a 4-core laptop with 16GB RAM and a 500GB SSD
Microsoft R Server does not limit data size to the available RAM. When open source R operates on data sets that exceed RAM, it fails. In contrast, Microsoft R Server scales linearly well beyond RAM limits, and its parallel algorithms are much faster.
Questions?
tidk.pl DSaaS.co facebook.com/TIDKpl

Contenu connexe

Tendances

Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoSpark Summit
 
Streamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User GroupStreamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User GroupHari Shreedharan
 
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Databricks
 
Big Data tools in practice
Big Data tools in practiceBig Data tools in practice
Big Data tools in practiceDarko Marjanovic
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Databricks
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Wes McKinney
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataWes McKinney
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Pactera_US
 
Future of data visualization
Future of data visualizationFuture of data visualization
Future of data visualizationhadoopsphere
 
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future Wes McKinney
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesDatabricks
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin Databricks
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark Summit
 

Tendances (19)

Data Science at Scale by Sarah Guido
Data Science at Scale by Sarah GuidoData Science at Scale by Sarah Guido
Data Science at Scale by Sarah Guido
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Streamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User GroupStreamsets and spark at SF Hadoop User Group
Streamsets and spark at SF Hadoop User Group
 
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
Distributed End-to-End Drug Similarity Analytics and Visualization Workflow w...
 
Big Data tools in practice
Big Data tools in practiceBig Data tools in practice
Big Data tools in practice
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
Geospatial Analytics at Scale with Deep Learning and Apache Spark with Tim hu...
 
Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019Ursa Labs and Apache Arrow in 2019
Ursa Labs and Apache Arrow in 2019
 
Apache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory dataApache Arrow -- Cross-language development platform for in-memory data
Apache Arrow -- Cross-language development platform for in-memory data
 
Streamsets and spark
Streamsets and sparkStreamsets and spark
Streamsets and spark
 
Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data Using Visualization to Succeed with Big Data
Using Visualization to Succeed with Big Data
 
Future of data visualization
Future of data visualizationFuture of data visualization
Future of data visualization
 
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
PyCon Colombia 2020 Python for Data Analysis: Past, Present, and Future
 
Big data applications
Big data applicationsBig data applications
Big data applications
 
Building Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta LakesBuilding Data Intensive Analytic Application on Top of Delta Lakes
Building Data Intensive Analytic Application on Top of Delta Lakes
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
The Key to Machine Learning is Prepping the Right Data with Jean Georges Perrin
 
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadinSpark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
Spark and Cassandra: An Amazing Apache Love Story by Patrick McFadin
 

Similaire à AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight

PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DiveTravis Wright
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & RŁukasz Grala
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSStéphane Fréchette
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBob Ward
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017James Serra
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
Microsoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deckMicrosoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deckGeorge Walters
 
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Dataconomy Media
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Brk3288 sql server v.next with support on linux, windows and containers was...
Brk3288 sql server v.next with support on linux, windows and containers   was...Brk3288 sql server v.next with support on linux, windows and containers   was...
Brk3288 sql server v.next with support on linux, windows and containers was...Bob Ward
 
Experience SQL Server 2017: The Modern Data Platform
Experience SQL Server 2017: The Modern Data PlatformExperience SQL Server 2017: The Modern Data Platform
Experience SQL Server 2017: The Modern Data PlatformBob Ward
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and dockerBob Ward
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionTravis Wright
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Andrey Vykhodtsev
 

Similaire à AnalyticsConf2016 - Zaawansowana analityka na platformie Azure HDInsight (20)

PASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep DivePASS Summit - SQL Server 2017 Deep Dive
PASS Summit - SQL Server 2017 Deep Dive
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
Modernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APSModernizing Your Data Warehouse using APS
Modernizing Your Data Warehouse using APS
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and docker
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Microsoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deckMicrosoft SQL server 2017 Level 300 technical deck
Microsoft SQL server 2017 Level 300 technical deck
 
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,Real-Time Analytics with Apache Cassandra and Apache Spark,
Real-Time Analytics with Apache Cassandra and Apache Spark,
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Brk3288 sql server v.next with support on linux, windows and containers was...
Brk3288 sql server v.next with support on linux, windows and containers   was...Brk3288 sql server v.next with support on linux, windows and containers   was...
Brk3288 sql server v.next with support on linux, windows and containers was...
 
Experience SQL Server 2017: The Modern Data Platform
Experience SQL Server 2017: The Modern Data PlatformExperience SQL Server 2017: The Modern Data Platform
Experience SQL Server 2017: The Modern Data Platform
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Experience sql server on l inux and docker
Experience sql server on l inux and dockerExperience sql server on l inux and docker
Experience sql server on l inux and docker
 
SQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux IntroductionSQL Server 2017 on Linux Introduction
SQL Server 2017 on Linux Introduction
 
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
Big Data Essentials meetup @ IBM Ljubljana 23.06.2015
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 

Plus de Łukasz Grala

Cognitive Toolkit - Deep Learning framework from Microsoft
Cognitive Toolkit - Deep Learning framework from MicrosoftCognitive Toolkit - Deep Learning framework from Microsoft
Cognitive Toolkit - Deep Learning framework from MicrosoftŁukasz Grala
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL ServerŁukasz Grala
 
Microsoft ML - State of The Art Microsoft Machine Learning - Package R
Microsoft ML - State of The Art Microsoft Machine Learning - Package RMicrosoft ML - State of The Art Microsoft Machine Learning - Package R
Microsoft ML - State of The Art Microsoft Machine Learning - Package RŁukasz Grala
 
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...
AnalyticsConf2016 - Innowacyjność poprzez inteligentną analizę informacji - C...Łukasz Grala
 
AzureDay - What is Machine Learnin?
AzureDay - What is Machine Learnin?AzureDay - What is Machine Learnin?
AzureDay - What is Machine Learnin?Łukasz Grala
 
AzureDay - Introduction Big Data Analytics.
AzureDay  - Introduction Big Data Analytics.AzureDay  - Introduction Big Data Analytics.
AzureDay - Introduction Big Data Analytics.Łukasz Grala
 
WyspaIT 2016 - Azure Stream Analytics i Azure Machine Learning w analizie str...
AnalyticsConf2016 - Advanced Analytics on the Azure HDInsight Platform

  • 1. Advanced Analytics on the Azure HDInsight Platform. Łukasz Grala | Senior Architect
  • 2. Łukasz Grala • Senior solutions architect for Data Platform, Business Intelligence and Advanced Analytics at TIDK • Creator of "Data Scientist as a Service" • Microsoft Certified Trainer and university lecturer • Author of advanced training courses and workshops, and of numerous publications and webcasts • Recipient of the Microsoft Data Platform MVP award since 2010 • PhD student at Politechnika Poznańska, Faculty of Computing (databases, data mining, machine learning) • Speaker at numerous conferences in Poland and abroad • Holds numerous certifications (MCT, MCSE, MCSA, MCITP, …) • Member of Polskie Towarzystwo Informatyczne (Polish Information Processing Society) • Member and leader of the Polish SQL Server User Group (PLSSUG) • Passionate about data analysis, storage and processing; jazz enthusiast • Email: lukasz@tidk.pl
  • 3. Big Data – 4V http://www.ibmbigdatahub.com/infographic/four-vs-big-data
  • 5. Hadoop Ecosystem (component roles shown in the diagram): ETL; RDBMS Import/Export; Distributed Storage & Processing Framework; Secure NoSQL DB; SQL on HBase; NoSQL DB; Workflow Management; SQL; Streaming Data Ingestion; Cluster System Operations; Secure Gateway; Distributed Registry; ETL; Search & Indexing; Even Faster Data Processing; Data Management; Machine Learning
  • 6. The Hadoop Ecosystem: Management & Monitoring (Ambari); Coordination (ZooKeeper); Workflow & Scheduling (Oozie); Scripting (Pig); Machine Learning (Mahout); Query (Hive); Distributed Processing (MapReduce); Distributed Storage (HDFS); NoSQL Database (HBase); Data Integration (Sqoop/REST/ODBC)
  • 7. Hadoop core (diagram): NameNode (NN, HDFS): /directory/structure/in/memory.txt; ResourceManager (RM, YARN): resource management + scheduling; worker node: DataNode (HDFS) and NodeManager (YARN); disk, CPU, memory. Legend: Hadoop daemon, user application
  • 8. WordCount in MapReduce (input: constitution.txt in HDFS, "We the people, in order to form a..."): (1) The mappers read the file’s blocks from HDFS line-by-line. (2) The lines of text are split into words and output to the reducers: <We, 1> <the, 1> <people, 1> <in, 1> <order, 1> <to, 1> <form, 1> <a, 1>. (3) The shuffle/sort phase combines pairs with the same key: <We, (1,1,1,1)> <the, (1,1,1,1,1,1,1,...)> <people, (1,1,1,1,1)> <form, (1)>. (4) The reducers add up the “1’s” and output the word and its count to HDFS: <We, 4> <the, 265> <people, 5> <form, 1>
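The four WordCount phases can be imitated in a single process. The following is a minimal sketch, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for illustration, and a real job would run the mappers and reducers on separate nodes.

```python
from collections import defaultdict


def map_phase(lines):
    """Mapper: split each line into words and emit one (word, 1) pair per word."""
    for line in lines:
        for raw in line.split():
            word = raw.strip('.,;:!?"')  # drop surrounding punctuation
            if word:
                yield (word, 1)


def shuffle_phase(pairs):
    """Shuffle/sort: group the values of all pairs that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reducer: add up the 1's collected for each word."""
    return {word: sum(ones) for word, ones in grouped.items()}


def word_count(lines):
    """Run map, shuffle/sort and reduce in sequence on an iterable of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical two-line input standing in for the file's blocks.
    print(word_count(["We the people, in order to form a", "the people"]))
```

The grouping step is kept separate from the reducer on purpose: it is the part Hadoop performs between the two user-written functions.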
  • 9. What is Apache Ambari? A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. Apache Ambari takes the guesswork out of operating Hadoop.
  • 10. Spark’s Position in a Modern Data Platform. Data Sources: Disk Based Source; Streaming Source; Reference Data; Ad Hoc/On Demand Source. Data Processing, Storage & Analytics: Stream Processing (Storm/Spark-Streaming); Data Pipeline (Hive/Pig/Spark); Long Term Data Warehouse (Hive + ORC); Data Science (Spark-ML, Spark-SQL). Data Access: Data Discovery; Operational Reporting; Business Intelligence; Advanced Analytics
  • 11. Spark Context: what is it? • Main entry point for Spark functionality • Represents a connection to a Spark cluster • Represented as sc in your code
  • 12. Microsoft Data Platform (1): Analytics Platform System; SQL Server; Information Management (ADF, MDS, DQS, SSIS & DataSync); Cortana Analytics Suite; SQL Server Reporting Services; PowerBI; Datazen Server; SQL Server Analysis Services
  • 13. Microsoft Data Platform (2): Azure Data Lake; Azure DocumentDB; Azure HDInsight (Hadoop, Spark, HBase & Storm); Azure Machine Learning; Azure Search; Azure SQL Database; Azure Stream Analytics; Azure SQL Data Warehouse
  • 14. HDInsight • HDInsight is a Hadoop-based service that brings a 100% Apache Hadoop solution to the Microsoft Azure platform • Based on the Hortonworks Data Platform (HDP) • Scalable, on-demand service
  • 15. CRAN: 7000+ add-on packages for R. CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
  • 16. A brief history of R: 1993: research project in Auckland, NZ (Ross Ihaka and Robert Gentleman); 1995: released as open-source software, generally compatible with the “S” language; 1997: R core group formed; 2003: R Foundation formed in Austria; 2007: Revolution Analytics founded; 2014: Revolution R Open launched; 2015: R Consortium founded; 2015: Microsoft acquires Revolution Analytics; 2016: Microsoft R Open 3.2.3 released. Photo credit: Robert Gentleman
  • 17. R: The #1 software for Data Science … and #6 amongst general-purpose programming languages. R Usage Growth (Rexer Data Miner Survey, 2007-2015): 76% of analytic professionals report using R; 36% select R as their primary tool. Language Popularity (IEEE Spectrum Top Programming Languages, 2015)
  • 18. Use Microsoft R Open with… Microsoft R Server: big-data analytics and distributed computing on Linux, Hadoop and Teradata; SQL Server 2016: big-data analytics integrated with the SQL Server database; PowerBI: computations and charts from R scripts in dashboards; Azure ML Studio: R scripts in cloud-based Experiment workflows; Visual Studio: R Tools for Visual Studio, an integrated development environment for R; HDInsight: R integrated with cloud-based Hadoop clusters; Cortana Analytics: cloud-based R APIs and Virtual Machines
  • 19. The Microsoft R Server Platform (R Open, Microsoft R Server, DeployR, DevelopR)
      • ConnectR: high-speed & direct connectors, available for: high-performance XDF; SAS, SPSS, delimited & fixed format text data files; Hadoop HDFS (text & XDF); Teradata Database & Aster; EDWs and ADWs; ODBC
      • ScaleR: ready-to-use high-performance big data big analytics; fully-parallelized analytics; data prep & data distillation; descriptive statistics & statistical tests; range of predictive functions; user tools for distributing customized R algorithms across nodes; wide data sets supported (thousands of variables)
      • DistributedR: distributed computing framework; delivers cross-platform portability
      • R+CRAN: open source R interpreter (R 3.2.5); freely-available huge range of R algorithms; algorithms callable by RevoR; embeddable in R scripts; 100% compatible with existing R scripts, functions and packages
      • RevoR: performance-enhanced R interpreter; based on open source R; adds a high-performance math library to speed up linear algebra functions
  • 20. R Packages: RHadoop and ParallelR. Toolkits for data scientists and numerical analysts to create custom parallel and distributed algorithms. ParallelR: parallel programming for multi-CPU servers and grids. RHadoop: map-reduce programming in the R language. Mainly useful for “embarrassingly parallel” problems, where parallel components work with small amounts of data. Big Data Predictive Analytics is mostly not embarrassingly parallel. 80+ pre-built “parallel external memory algorithms” are included with RevoScale. Azure ML Studio includes many ML algorithms.
  • 21. ScaleR – Parallel + “Big Data”. Data is streamed into RAM in blocks. “Big Data” can be any data size: we handle megabytes to gigabytes to terabytes… Our ScaleR algorithms work inside multiple cores / nodes in parallel at high speed. Interim results are collected and combined analytically to produce the output on the entire data set. The XDF file format is optimised to work with the ScaleR library and significantly speeds up iterative algorithm processing.
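The idea of streaming blocks and combining interim results analytically can be illustrated with a blockwise mean and variance. This is a minimal sketch and not the ScaleR implementation: the function names are hypothetical, the blocks are plain Python lists rather than XDF chunks, and the merge step uses the standard pairwise formula for combining (count, mean, sum of squared deviations). It assumes at least two values in total and no empty blocks.

```python
def summarize_block(block):
    """Interim result for one block: (count, mean, sum of squared deviations)."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return n, mean, m2


def combine(a, b):
    """Merge two interim results without revisiting the underlying data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def read_blocks(values, block_size):
    """Stand-in for streaming a large file: yield one block at a time."""
    for start in range(0, len(values), block_size):
        yield values[start:start + block_size]


def blockwise_mean_variance(blocks):
    """Return (mean, sample variance) with only one block in memory at a time."""
    total = None
    for block in blocks:
        part = summarize_block(block)
        total = part if total is None else combine(total, part)
    n, mean, m2 = total
    return mean, m2 / (n - 1)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(blockwise_mean_variance(read_blocks(data, block_size=3)))
```

Because `combine` is associative, the same interim results could be produced on different cores or nodes and merged in any order.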
  • 22. MRS and Hadoop architecture options (diagram): RStudio Server Pro and Microsoft R Server (ScaleR) in production; data handling options: 1. Copy, 2. Stream, 3. Send
  • 23. DistributedR - Hadoop Processing Methods. Method 1 (“Beside” or “Edge”, copy to local file): local (Linux) parallel processing using all cores on one node, copying data from HDFS to store in the local Linux file system. Method 2: local (Linux) parallel processing using all cores on one node, streaming data from / to HDFS. Diagram labels: compute context (Local Parallel or Hadoop); 1 edge node; 1:n data nodes; Linux (local) file system on 1:n disks; HDFS on 1:(n x number of nodes) disks; data as Csv, Xdf
  • 24. Method 3 (“inside”, R script sent to data nodes): Hadoop (Map-Reduce) parallel processing using all cores on n nodes, using HDFS data on each node. Diagram labels: compute context Hadoop; 1 edge node; 1:n nodes; HDFS on 1:(n x number of nodes) disks; data as Csv, Xdf; HDFS read / write
      • R model script sent to Master Node: 1. Starts a master process 2. Distributes work 3. Master tasks for each node 4. Master initiates distributed work (4.1 Hadoop schedules a mapper for each split; 4.2 Algorithm computes intermediate result; 4.3 Reducer combines intermediate results) 5. Master process evaluates completion 6. Iterates as required by the algorithm 7. Returns consolidated answer to script
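The master / mapper / reducer loop described for Method 3 can be sketched with a one-dimensional k-means, chosen here only because it is an iterative algorithm whose intermediate results are easy to combine. This is an illustrative single-process sketch, not Microsoft R Server code: the names `mapper`, `reducer` and `master`, the splits and the starting centers are all assumptions, and the mappers run sequentially instead of on separate data nodes.

```python
def mapper(split, centers):
    """Intermediate result for one split: per-center [count, sum] of nearest points."""
    partial = [[0, 0.0] for _ in centers]
    for x in split:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        partial[nearest][0] += 1
        partial[nearest][1] += x
    return partial


def reducer(partials, k):
    """Combine the intermediate results from all splits."""
    combined = [[0, 0.0] for _ in range(k)]
    for partial in partials:
        for i, (count, total) in enumerate(partial):
            combined[i][0] += count
            combined[i][1] += total
    return combined


def master(splits, centers, max_iterations=20, tolerance=1e-9):
    """Distribute work, evaluate completion and iterate until centers stop moving."""
    centers = list(centers)
    for _ in range(max_iterations):
        partials = [mapper(split, centers) for split in splits]  # one mapper per split
        combined = reducer(partials, len(centers))
        # A center with no assigned points keeps its previous position.
        new_centers = [total / count if count else old
                       for (count, total), old in zip(combined, centers)]
        converged = max(abs(a - b) for a, b in zip(new_centers, centers)) < tolerance
        centers = new_centers
        if converged:
            break
    return centers  # consolidated answer returned to the calling script


if __name__ == "__main__":
    # Two hypothetical splits, as if stored on two data nodes.
    print(master([[1.0, 2.0, 10.0], [11.0, 3.0, 12.0]], centers=[0.0, 5.0]))
```

Only the small per-center counts and sums travel from mappers to the reducer, which is the point of sending the script to the data instead of the data to the script.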
  • 25. DistributedR - What processing mode to use, when? Analytic data set size and processing complexity (e.g. simple summary statistics vs iterative algorithm) guide the choice between Methods 1 and 2 (edge node / server Linux local processing) and Method 3 (in-Hadoop processing). Chart axes: processing complexity (Low, Medium, High) against data size (Small Data < 10GB, Medium Data < 50GB, Bigger Data > 50GB). Legend: edge node Linux processing (local Linux file system); in-Hadoop processing (Hadoop file system)
  • 26. Parallelized Algorithms
      • Data Step: data import (Delimited, Fixed, SAS, SPSS, ODBC); variable creation & transformation; recode variables; factor variables; missing value handling; sort, merge, split; aggregate by category (means, sums)
      • Descriptive Statistics: Min / Max, Mean, Median (approx.); Quantiles (approx.); Standard Deviation; Variance; Correlation; Covariance; Sum of Squares (cross product matrix for set variables); Pairwise Cross tabs; Risk Ratio & Odds Ratio; Cross-Tabulation of Data (standard tables & long form); Marginal Summaries of Cross Tabulations
      • Statistical Tests: Chi Square Test; Kendall Rank Correlation; Fisher’s Exact Test; Student’s t-Test
      • Sampling: Subsample (observations & variables); Random Sampling
      • Predictive Models: Sum of Squares (cross product matrix for set variables); Multiple Linear Regression; Generalized Linear Models (GLM) with exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie), standard link functions (cauchit, identity, log, logit, probit) and user defined distributions & link functions; Covariance & Correlation Matrices; Logistic Regression; Classification & Regression Trees; Predictions/scoring for models; Residuals for all models
      • Cluster Analysis: K-Means
      • Classification: Decision Trees; Decision Forests; Stochastic Gradient Boosted Decision Trees
      • Variable Selection: Stepwise Regression (Linear, Logistic and GLM)
      • Simulation: Monte Carlo; Parallel Random Number Generation
      • Combination: using Revolution rxDataStep and rxExec functions to combine open source R with Revolution R; PEMA API
  • 27. Comparing functions: rxKmeans() vs kmeans()
  • 29. Performance Comparison • US flight data for 20 years • Linear Regression on Arrival Delay • Run on a 4-core laptop with 16GB RAM and a 500GB SSD. Microsoft R Server has no data size limits in relation to the size of available RAM. When open source R operates on data sets that exceed RAM, it will fail. In contrast, Microsoft R Server scales linearly well beyond RAM limits, and its parallel algorithms are much faster.