Pivotal Confidential–Internal Use Only
BUILT FOR THE SPEED OF BUSINESS
MADlib Architecture
MPP (Massively Parallel Processing): Shared-Nothing Database Architecture
• Master servers: query planning & dispatch
• Segment servers: query processing & data storage, linked by a network interconnect
• Interfaces: SQL, MapReduce
• External sources: loading, streaming, etc.
Architecture
• User Interface: SQL, generated per specification
• Functions for Inner Loops (implements ML logic): C++
• Low-level Abstraction Layer (array operations, C++ to DB type-bridge, …): C++
• C API (HAWQ, GPDB, PostgreSQL)
• RDBMS Built-in Functions

3. Lack of language support for linear algebra
• C++ Abstraction Layer uses Eigen
• (Dense) vectors and matrices: DOUBLE PRECISION[]
• Example:

    AnyType
    solve::run(AnyType& args) {
        // Map the two DOUBLE PRECISION[] arguments in place (no copy)
        MappedMatrix A = args[0].getAs<MappedMatrix>();
        MappedColumnVector b = args[1].getAs<MappedColumnVector>();

        // Allocate the result as a database array and solve A x = b via Eigen's QR
        MutableMappedColumnVector x = allocateArray<double>(A.cols());
        x = A.colPivHouseholderQr().solve(b);
        return x;
    }

Performance:
• No unnecessary copying
• No internal type conversion
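The "no unnecessary copying" point rests on reading the DOUBLE PRECISION[] payload in place instead of copying it into a fresh matrix. As a minimal sketch of that idea in plain C++ (no Eigen, no MADlib headers), the hypothetical `ColMajorView` below interprets an existing buffer as a column-major matrix, the default storage order in Eigen. It illustrates the concept only and is not MADlib's `MappedMatrix`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: interprets an existing contiguous buffer
// (e.g. the payload of a DOUBLE PRECISION[] value) as a column-major matrix.
// No element is copied; the view stores only a pointer and the shape.
class ColMajorView {
public:
    ColMajorView(const double* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    const double* data() const { return data_; }

    // Element (i, j) lives at offset i + j * rows in column-major order.
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }

private:
    const double* data_;
    std::size_t rows_;
    std::size_t cols_;
};

// Matrix-vector product computed directly on the mapped buffer.
std::vector<double> multiply(const ColMajorView& A, const std::vector<double>& x) {
    assert(x.size() == A.cols());
    std::vector<double> y(A.rows(), 0.0);
    for (std::size_t j = 0; j < A.cols(); ++j)
        for (std::size_t i = 0; i < A.rows(); ++i)
            y[i] += A(i, j) * x[j];
    return y;
}
```

The view and the original array share storage, so the only allocation is the result vector, which mirrors the role `allocateArray<double>` plays in the example above.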
How do we implement scalability?
Example: Linear Regression
• Finding linear dependencies between variables
y ≈ c0 + c1 · x1 + c2 · x2 + …?
y | x1 | …
-------+-------------
10.14 | 0 | …
11.93 | 0.69 | …
13.57 | 1.1 | …
14.17 | 1.39 | …
15.25 | 1.61 | …
16.15 | 1.79 | …
The x columns form the design matrix X (the predictors, e.g. x1); the y column is the vector of dependent variables y (the response, or regressand).
Challenges in computing OLS solution
X has rows (a, b), (c, d), (e, f), (g, h), which are split across Segment 1 and Segment 2. Computing XᵀX multiplies data that lives on different nodes:

$$X^\top X = \begin{pmatrix} a & c & e & g \\ b & d & f & h \end{pmatrix} \begin{pmatrix} a & b \\ c & d \\ e & f \\ g & h \end{pmatrix} = \begin{pmatrix} a^2+c^2+e^2+g^2 & ab+cd+ef+gh \\ ab+cd+ef+gh & b^2+d^2+f^2+h^2 \end{pmatrix}$$

Data across nodes are multiplied! But the result can be decomposed into a sum of per-row outer products:

$$X^\top X = \begin{pmatrix} a \\ b \end{pmatrix}\begin{pmatrix} a & b \end{pmatrix} + \begin{pmatrix} c \\ d \end{pmatrix}\begin{pmatrix} c & d \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix}\begin{pmatrix} e & f \end{pmatrix} + \begin{pmatrix} g \\ h \end{pmatrix}\begin{pmatrix} g & h \end{pmatrix}$$

Each segment sums the outer products of its own rows, and the per-segment 2×2 partial results are added.
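To check the decomposition numerically, the sketch below computes XᵀX for a two-column X once directly (column dot products) and once as the sum of per-segment partials. The type and function names are hypothetical. The point is only that combining segments requires adding small 2×2 matrices rather than moving rows between nodes.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Row = std::array<double, 2>;                  // one row of X, e.g. (a, b)
using Gram = std::array<std::array<double, 2>, 2>;  // a 2x2 matrix such as X^T X

// Partial X^T X over the rows stored on one segment:
// the sum of outer products r * r^T for each local row r.
Gram partialGram(const std::vector<Row>& segmentRows) {
    Gram g{};  // value-initialised to zeros
    for (const Row& r : segmentRows)
        for (int i = 0; i < 2; ++i)
            for (int j = 0; j < 2; ++j)
                g[i][j] += r[i] * r[j];
    return g;
}

// Combining segments needs only element-wise addition of the 2x2 partials.
Gram addGram(const Gram& a, const Gram& b) {
    Gram s{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            s[i][j] = a[i][j] + b[i][j];
    return s;
}

// Reference: entry (i, j) of X^T X as the dot product of columns i and j,
// computed as if all rows were on a single node.
Gram directGram(const std::vector<Row>& X) {
    Gram g{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) {
            double dot = 0.0;
            for (const Row& r : X) dot += r[i] * r[j];
            g[i][j] = dot;
        }
    return g;
}
```

With (a, b, …, h) = (1, …, 8) the diagonal entries are 84 and 120 and the off-diagonal entry is 100, whichever way the rows are split across segments.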
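Linear Regression: Streaming Algorithm
How to compute with a single table scan?

$$c^\ast = (X^\top X)^{-1} X^\top y, \qquad X^\top X = \sum_i x_i x_i^\top, \qquad X^\top y = \sum_i x_i\, y_i$$

where $x_i^\top$ is the i-th row of X. Both sums are accumulated row by row during one scan, and the per-segment partial sums are simply added.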
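A one-pass OLS computation maps naturally onto a parallel aggregate: a transition step folds each row into running XᵀX and Xᵀy sums on its segment, a merge step adds segment states, and a final step solves the normal equations. The sketch below assumes that transition/merge/final structure. `OlsState`, `transition`, `merge` and `finalCoef` are hypothetical names, and the final step uses Gaussian elimination with partial pivoting instead of forming the explicit inverse shown above (a production implementation might prefer a pseudo-inverse or another decomposition).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical aggregate state for one-pass OLS: running X^T X and X^T y.
struct OlsState {
    std::size_t p;            // number of coefficients (including the intercept)
    std::vector<double> xtx;  // p*p entries, row-major
    std::vector<double> xty;  // p entries
    std::size_t n = 0;        // rows folded in so far

    explicit OlsState(std::size_t numCoef)
        : p(numCoef), xtx(numCoef * numCoef, 0.0), xty(numCoef, 0.0) {}
};

// Transition: fold one row (x_i, y_i) into the state of its segment.
void transition(OlsState& s, const std::vector<double>& x, double y) {
    assert(x.size() == s.p);
    for (std::size_t i = 0; i < s.p; ++i) {
        s.xty[i] += x[i] * y;
        for (std::size_t j = 0; j < s.p; ++j) s.xtx[i * s.p + j] += x[i] * x[j];
    }
    ++s.n;
}

// Merge: combine two segment states by element-wise addition.
OlsState merge(const OlsState& a, const OlsState& b) {
    assert(a.p == b.p);
    OlsState out(a.p);
    for (std::size_t k = 0; k < a.xtx.size(); ++k) out.xtx[k] = a.xtx[k] + b.xtx[k];
    for (std::size_t k = 0; k < a.p; ++k) out.xty[k] = a.xty[k] + b.xty[k];
    out.n = a.n + b.n;
    return out;
}

// Final: solve (X^T X) c = X^T y by Gaussian elimination with partial pivoting.
std::vector<double> finalCoef(const OlsState& s) {
    const std::size_t p = s.p;
    std::vector<double> A = s.xtx, b = s.xty;
    for (std::size_t col = 0; col < p; ++col) {
        std::size_t piv = col;  // pick the largest pivot for stability
        for (std::size_t r = col + 1; r < p; ++r)
            if (std::fabs(A[r * p + col]) > std::fabs(A[piv * p + col])) piv = r;
        if (std::fabs(A[piv * p + col]) < 1e-12) throw std::runtime_error("X^T X is singular");
        if (piv != col) {
            for (std::size_t k = 0; k < p; ++k) std::swap(A[col * p + k], A[piv * p + k]);
            std::swap(b[col], b[piv]);
        }
        for (std::size_t r = col + 1; r < p; ++r) {  // eliminate below the pivot
            double f = A[r * p + col] / A[col * p + col];
            for (std::size_t k = col; k < p; ++k) A[r * p + k] -= f * A[col * p + k];
            b[r] -= f * b[col];
        }
    }
    std::vector<double> c(p);
    for (std::size_t i = p; i-- > 0;) {  // back substitution
        double sum = b[i];
        for (std::size_t k = i + 1; k < p; ++k) sum -= A[i * p + k] * c[k];
        c[i] = sum / A[i * p + i];
    }
    return c;
}
```

The state holds p² + p numbers regardless of how many rows were scanned, so only that small state has to cross the interconnect.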
Problem solved? … Not Yet
• Many ML solutions are iterative without analytical formulations
Initialize problem → Perform single step → Has converged?
  false: perform another single step
  true: return results
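The flowchart is a generic driver loop. A minimal C++ sketch of it, assuming the caller supplies the single step and the convergence test; the name `iterate` and the `maxIterations` safeguard are hypothetical additions for illustration, not part of the slide.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Outcome of a driver run: final state, steps taken, and whether the test passed.
template <typename State>
struct IterationResult {
    State state;
    std::size_t iterations;
    bool converged;
};

// Hypothetical driver mirroring the flowchart: start from the initialized
// problem, perform single steps until hasConverged(previous, next) is true,
// or stop after maxIterations steps.
template <typename State, typename Step, typename Converged>
IterationResult<State> iterate(State init, Step step, Converged hasConverged,
                               std::size_t maxIterations) {
    State current = init;
    for (std::size_t k = 1; k <= maxIterations; ++k) {
        State next = step(current);  // in-database, one step would be one parallel pass
        bool done = hasConverged(current, next);
        current = next;
        if (done) return {current, k, true};
    }
    return {current, maxIterations, false};
}
```

For example, Newton's iteration x ← (x + 2/x) / 2 started at 1.0 converges to √2 within a handful of steps. In the MADlib architecture, this outer loop corresponds to the iteration controller of the high-level iteration layer, while each step runs inside the database.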
In general, use a convex optimization framework
[Figure: the archetypical convex function f(x) = x²]

Application          | Objective
---------------------+----------------------------------------------------------
Least Squares        | $\sum_{(u,y)\in\Omega} (x^\top u - y)^2$
Lasso [38]           | $\sum_{(u,y)\in\Omega} (x^\top u - y)^2 + \mu \|x\|_1$
Logistic Regression  | $\sum_{(u,y)\in\Omega} \log(1 + \exp(-y\, x^\top u))$

Each step has an analytical formulation that can be performed in parallel.
Gradient Descent
Start at a random point
Repeat
Determine a descent direction
Choose a step size
Update the model
Until stopping criterion is satisfied
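The gradient-descent recipe above can be sketched for the least-squares objective Σ(xᵀu − y)² from the table. Its gradient, 2 Σ (xᵀu − y) u, is a sum over examples, which is what lets each step run in parallel: segments compute partial gradients and the partials are added. The fixed step size, the gradient-norm stopping rule and the caller-supplied start point (random in the slide) are assumptions for illustration; the slide leaves these choices open.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One training example (u, y) from the data set Omega.
struct Example {
    std::vector<double> u;  // feature vector
    double y;               // target value
};

// Gradient of sum_{(u,y)} (x^T u - y)^2 with respect to the model x:
// 2 * sum (x^T u - y) u. It is a plain sum over examples, so each segment
// could compute its own share and the shares would simply be added.
std::vector<double> gradient(const std::vector<double>& x, const std::vector<Example>& data) {
    std::vector<double> g(x.size(), 0.0);
    for (const Example& e : data) {
        assert(e.u.size() == x.size());
        double residual = -e.y;
        for (std::size_t k = 0; k < x.size(); ++k) residual += x[k] * e.u[k];
        for (std::size_t k = 0; k < x.size(); ++k) g[k] += 2.0 * residual * e.u[k];
    }
    return g;
}

// Gradient descent with a fixed step size; stops when the gradient norm drops
// below tol (the stopping criterion) or after maxIter steps.
std::vector<double> gradientDescent(std::vector<double> x, const std::vector<Example>& data,
                                    double stepSize, double tol, std::size_t maxIter) {
    for (std::size_t it = 0; it < maxIter; ++it) {
        std::vector<double> g = gradient(x, data);  // descent direction is -g
        double norm2 = 0.0;
        for (double gk : g) norm2 += gk * gk;
        if (std::sqrt(norm2) < tol) break;
        for (std::size_t k = 0; k < x.size(); ++k) x[k] -= stepSize * g[k];  // update the model
    }
    return x;
}
```

The step size must be small relative to the curvature of the objective; a line search or a decaying schedule are common alternatives to the constant used here.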
Architecture
• User Interface: SQL, generated per specification
• High-level Iteration Layer (iteration controller, …): Python
• Functions for Inner Loops (implements ML logic): C++
• Low-level Abstraction Layer (array operations, C++ to DB type-bridge, …): C++
• C API (Greenplum, PostgreSQL, HAWQ)
• RDBMS Built-in Functions
But not all data scientists
speak SQL …
Accessing scalability through R
Why R?
O’Reilly: Strata 2013 Data Science Salary Survey
“The preponderance of R and Python usage is more surprising …
two most commonly used individual tools, even above Excel. R and Python are likely
popular because they are easily accessible and effective open source tools.”
Survey excerpt: "That SQL/RDB is the top bar is no surprise: accessing data is the meat and potatoes of data analysis, and has not been displaced by other tools. The preponderance of R and Python usage is more surprising: operating systems aside, these were the two most commonly used individual tools, even above Excel, which for years has been the go-to …"
PivotalR: Bringing MADlib and HAWQ to a familiar R interface
• Challenge: harness the familiarity of R's interface together with the performance and scalability benefits of in-database analytics
PivotalR:

    d <- db.data.frame("houses")
    houses_linregr <- madlib.lm(price ~ tax + bath + size, data = d)

Equivalent SQL code:

    SELECT madlib.linregr_train('houses',
                                'houses_linregr',
                                'price',
                                'ARRAY[1, tax, bath, size]');
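To make the R-to-SQL correspondence concrete, the toy function below assembles the same SQL statement from a response name and a list of predictors. It is a hypothetical helper, far simpler than PivotalR's actual translation (no formula parsing, no factor handling, no identifier escaping), and only mirrors the call shown above.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the MADlib training call that corresponds to
// madlib.lm(response ~ p1 + p2 + ..., data = sourceTable). The leading 1 in the
// ARRAY adds the intercept term. Inputs are assumed to be trusted identifiers;
// nothing is quoted or escaped.
std::string linregrTrainSql(const std::string& sourceTable,
                            const std::string& outTable,
                            const std::string& response,
                            const std::vector<std::string>& predictors) {
    std::ostringstream sql;
    sql << "SELECT madlib.linregr_train('" << sourceTable << "', '" << outTable
        << "', '" << response << "', 'ARRAY[1";
    for (const std::string& p : predictors) sql << ", " << p;
    sql << "]');";
    return sql.str();
}
```

Called with ("houses", "houses_linregr", "price", {"tax", "bath", "size"}), it returns the single-line form of the SQL statement above.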
PivotalR Design Overview
1. R → SQL (PivotalR, on the R client)
2. SQL to execute (sent through RPostgreSQL)
3. Computation results (returned to R)
R client with PivotalR and RPostgreSQL: no data here
Database/HAWQ w/ MADlib: data lives here
• Syntax is analogous to native R functions
• Data doesn't need to leave the database
• All heavy lifting, including model estimation & computation, is done in the database
Demo
    # Load the library
    library(PivotalR)
    # Connect to the database "madlib" on port 14526
    db.connect(port = 14526, dbname = "madlib")
    # List all the tables in the active connection
    db.objects()
    # Create an R object that references a table in the database
    x <- db.data.frame("madlibtestdata.dt_abalone")
    # Report the number of rows and columns in the table
    dim(x)
    # Column names within the table
    names(x)
    # Database query object representing "select rings from madlibtestdata.dt_abalone"
    x$rings
    # Pull 10 rows of data from the table back into the R environment
    lookat(x, 10)
    # Query object representing "select avg(rings) from madlibtestdata.dt_abalone"
    mean(x$rings)
    # Execute the query and report back the result
    lookat(mean(x$rings))
    # Run a linear regression (grouped by sex) within the database and return a model object
    fit <- madlib.lm(rings ~ . - id | sex, data = x)
    # Create a query object representing scoring the model in the database
    predict(fit, x)
    # Query object calculating the mean squared error of the model
    mean((x$rings - predict(fit, x))^2)
    # Add a calculated factor column to the database query object
    x$sex <- as.factor(x$sex)
    # Calculate a logistic regression model
    m0 <- madlib.glm(resp ~ age, family = "binomial", data = dbbank)
    # Perform stepwise feature selection
    mstep <- step(m0, scope = list(
        lower = ~age,
        upper = ~age + factor(marital) + factor(education) +
                factor(housing) + factor(loan) + factor(job)))
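Class hierarchy
• db.obj
  – db.data.frame: wrapper of objects in the database, e.g. x = db.data.frame("table")
    – db.table
    – db.view
  – db.Rquery: resides in R only, e.g. x[,1:2], merge(x, y, by="column")
• Operations and MADlib functions on a db.data.frame produce a db.Rquery; lookat brings results into R; as.db.data.frame turns a db.Rquery into a db.data.frame in the database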
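The key property of this hierarchy is laziness: a db.Rquery is just SQL text held on the R side, and nothing runs until a function such as lookat or as.db.data.frame needs the rows. A hypothetical C++ analogue that only composes SQL strings (it never connects to a database, and the class names merely echo PivotalR's) might look like this.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Base of the hierarchy: every object can describe its rows as SQL.
class DbObj {
public:
    virtual ~DbObj() = default;
    virtual std::string sql() const = 0;
};

// Wrapper of an object that already exists in the database (table or view).
class DbDataFrame : public DbObj {
public:
    explicit DbDataFrame(std::string name) : name_(std::move(name)) {}
    std::string sql() const override { return "SELECT * FROM " + name_; }
private:
    std::string name_;
};

// Query object that resides only on the client; it is never executed here.
class DbRQuery : public DbObj {
public:
    explicit DbRQuery(std::string text) : text_(std::move(text)) {}
    std::string sql() const override { return text_; }
private:
    std::string text_;
};

// Operations wrap the source's SQL in a new query object; no data moves.
DbRQuery selectColumn(const DbObj& src, const std::string& column) {
    return DbRQuery("SELECT " + column + " FROM (" + src.sql() + ") AS s");
}

DbRQuery average(const DbObj& src, const std::string& column) {
    return DbRQuery("SELECT avg(" + column + ") FROM (" + src.sql() + ") AS s");
}
```

Because operations accept any `DbObj`, query objects compose: applying `average` to the result of `selectColumn` simply nests one subquery inside another, much as chained PivotalR operations build a larger SQL statement before anything is sent to the database.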
Some of the current features
A wrapper of MADlib
• Generalized linear models (lm, glm)
• Elastic Net (elnet)
• Cross validation (generic.cv)
• ARIMA
• Tree methods (rpart, randomforest)
• Table summary
• $ [ [[ $<- [<- [[<-
• is.na
• + - * / %% %/% ^
• & | !
• == != > < >= <=
• merge
• by
• db.data.frame
• as.db.data.frame
• preview
• sort
• c mean sum sd var min max length colMeans colSums
• db.connect db.disconnect db.list db.objects db.existsObject delete
• dim
• names
• as.factor()
• content
And more ... (SQL wrapper)
• predict
We’re looking for contributors
• Browse our help pages
– Start page: madlib.net
– GitHub pages
• github.com/apache/incubator-madlib (SQL)
• github.com/pivotalsoftware/pivotalr (R)
• github.com/pivotalsoftware/pymadlib (Python)
• Use our product and report issues:
• https://issues.apache.org/jira/browse/MADLIB (Issue tracker)
• user@madlib.incubator.apache.org (User forum)
• dev@madlib.incubator.apache.org (Developer forum)
Credits
Leaders and contributors:
Gavin Sherry
Caleb Welton
Joseph Hellerstein
Christopher Ré
Zhe Wang
Florian Schoppmann
Hai Qian
Shengwen Yang
Xixuan Feng
and many others
…
Thank you for your attention
Important links:
Product email: user@madlib.net
Product site: madlib.net
Backup slides
Performing a linear regression on 10 million rows in seconds
Hellerstein et al. "The MADlib analytics library: or MAD skills, the SQL." Proceedings of the VLDB
Endowment 5.12 (2012): 1700-1711.
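Reminder: Linear-Regression Model
• $y_i = c^\top x_i + \varepsilon_i$, i.e. $y = Xc + \varepsilon$
• If residuals i.i.d. Gaussians with standard deviation σ:
  – max likelihood ⇔ min sum of squared residuals
• First-order conditions for the following quadratic objective (in c)
  $$f(c) = \|y - Xc\|^2 = \sum_i (y_i - c^\top x_i)^2, \qquad \nabla f(c) = -2X^\top (y - Xc) = 0$$
  yield the minimizer
  $$c^\ast = (X^\top X)^{-1} X^\top y$$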
Linear Regression: Streaming Algorithm
How to compute with a single table scan?
$$c^\ast = (X^\top X)^{-1} X^\top y, \qquad X^\top X = \sum_i x_i x_i^\top, \qquad X^\top y = \sum_i x_i\, y_i$$
PivotalR Architecture
PL/X Procedural Languages
PivotalR vs PL/R
PivotalR
• Interface is R client
• Execution is in database
• Parallelism handled by PivotalR
• Supports a portion of R

    R> x = db.data.frame("t1")
    R> l = madlib.lm(interlocks ~ assets + nation, data = x)

PL/R
• Interface is SQL client
• Execution is in R
• Parallelism via SQL function invocation
• Supports all of R

    psql> CREATE FUNCTION lregr() … LANGUAGE PLR;
    psql> SELECT lregr( array_agg(interlocks),
                        array_agg(assets),
                        array_agg(nation) )
          FROM t1;
Parallelized R in Pivotal via PL/R: An Example
SQL & R
• R piggy-backs on Pivotal's parallel architecture
• Minimize data movement
• Build predictive model for each state in parallel
[Diagram: per-state data (TN, CA, NY, PA, TX, CT, NJ, IL, MA, WA), each feeding its own per-state model]
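The per-state pattern is "group, then fit each group independently". The sketch below imitates it in a single process, with `std::async` threads standing in for database segments and a closed-form simple linear regression standing in for whatever R model PL/R would fit. The types, function names and the choice of model are assumptions for illustration, not PL/R's API.

```cpp
#include <cassert>
#include <cmath>
#include <future>
#include <map>
#include <string>
#include <vector>

struct Point { double x; double y; };
struct LineModel { double intercept; double slope; };

// Closed-form simple linear regression y = intercept + slope * x for one group.
LineModel fitLine(const std::vector<Point>& pts) {
    assert(pts.size() >= 2);
    double n = static_cast<double>(pts.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (const Point& p : pts) { sx += p.x; sy += p.y; sxx += p.x * p.x; sxy += p.x * p.y; }
    double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // requires at least two distinct x values
    double slope = (n * sxy - sx * sy) / denom;
    return {(sy - slope * sx) / n, slope};
}

// Fit one model per group concurrently; each task reads only its own group,
// so no data is exchanged between tasks (the analogue of minimal data movement).
std::map<std::string, LineModel> fitPerGroup(const std::map<std::string, std::vector<Point>>& groups) {
    std::map<std::string, std::future<LineModel>> pending;
    for (const auto& entry : groups) {
        const std::vector<Point>* pts = &entry.second;
        pending.emplace(entry.first, std::async(std::launch::async, [pts] { return fitLine(*pts); }));
    }
    std::map<std::string, LineModel> models;
    for (auto& [state, fut] : pending) models.emplace(state, fut.get());
    return models;
}
```

In the database version the grouping would come from the SQL query (for example a GROUP BY on the state column), and each segment would run the R function on the groups it stores.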
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
#GeodeSummit - Integration & Future Direction for Spring Cloud Data Flow & Geode
 
#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future#GeodeSummit - Spring Data GemFire API Current and Future
#GeodeSummit - Spring Data GemFire API Current and Future
 
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
 
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
#GeodeSummit - Large Scale Fraud Detection using GemFire Integrated with Gree...
 
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)
 
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
#GeodeSummit: Architecting Data-Driven, Smarter Cloud Native Apps with Real-T...
 
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
#GeodeSummit - Apex & Geode: In-memory streaming, storage & analytics
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
 
#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems#GeodeSummit - Design Tradeoffs in Distributed Systems
#GeodeSummit - Design Tradeoffs in Distributed Systems
 
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
#GeodeSummit - Wall St. Derivative Risk Solutions Using Geode
 
Building Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache GeodeBuilding Apps with Distributed In-Memory Computing Using Apache Geode
Building Apps with Distributed In-Memory Computing Using Apache Geode
 

Dernier

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 

Dernier (20)

Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls Palakkad Escorts ☎️9352988975 Two shot with one girl...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 

MADlib Architecture and Functional Demo on How to Use MADlib/PivotalR

  • 1. Pivotal Confidential–Internal Use Only. BUILT FOR THE SPEED OF BUSINESS
  • 2. MADlib Architecture
  • 3. MPP (Massively Parallel Processing): Shared-Nothing Database Architecture
    – Master servers: query planning and dispatch.
    – Segment servers: query processing and data storage.
    – Master and segments communicate over a network interconnect.
    – Work arrives as SQL or MapReduce; external sources feed data in (loading, streaming, etc.).
  • 4. Architecture
    – User Interface: SQL, generated per specification
    – Functions for Inner Loops (implements ML logic): C++
    – Low-level Abstraction Layer (array operations, C++ to DB type-bridge, …)
    – C API (HAWQ, GPDB, PostgreSQL) and RDBMS built-in functions
    – Inset, "3. Lack of language support for linear algebra":
        – The C++ abstraction layer uses Eigen.
        – (Dense) vectors and matrices are stored as DOUBLE PRECISION[].
        – Example:
            AnyType
            solve::run(AnyType& args) {
                MappedMatrix A = args[0].getAs<MappedMatrix>();
                MappedColumnVector b = args[1].getAs<MappedColumnVector>();

                MutableMappedColumnVector x = allocateArray<double>(A.cols());
                x = A.colPivHouseholderQr().solve(b);
                return x;
            }
        – Performance: no unnecessary copying, no internal type conversion.
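The "no unnecessary copying" point rests on mapping an array's existing memory as a matrix instead of copying it into a new one (Eigen provides `Eigen::Map` for this). A minimal, dependency-free sketch of the idea follows; `MatrixView` is a hypothetical name, not MADlib's `MappedMatrix`, and column-major storage is an assumption chosen because it is Eigen's default.

```cpp
#include <cstddef>

// Hypothetical illustration of a zero-copy matrix view (not MADlib's
// MappedMatrix): it stores only a pointer to memory owned elsewhere, such as
// the payload of a DOUBLE PRECISION[] value, plus the dimensions.
class MatrixView {
public:
    MatrixView(const double* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Element (i, j) in column-major order (assumed here; Eigen's default).
    double operator()(std::size_t i, std::size_t j) const {
        return data_[j * rows_ + i];
    }

    // Exposes the underlying buffer to show that nothing was copied.
    const double* data() const { return data_; }

private:
    const double* data_;  // not owned: the caller keeps the buffer alive
    std::size_t rows_;
    std::size_t cols_;
};
```

Because the view never owns the buffer, it must not outlive it; `Eigen::Map` follows the same rule.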
  • 5. How do we implement scalability? Example: Linear Regression
    – Finding linear dependencies between variables: y ≈ c0 + c1 · x1 + c2 · x2 + … ?
        y     | x1   | …
        ------+------+---
        10.14 | 0    | …
        11.93 | 0.69 | …
        13.57 | 1.1  | …
        14.17 | 1.39 | …
        15.25 | 1.61 | …
        16.15 | 1.79 | …
    – The predictor columns (x1, …) form the design matrix X; the response column y is the vector of dependent variables.
  • 6. Challenges in computing OLS solution: the design matrix $X$ has rows $(a, b)$ and $(c, d)$ on Segment 1 and rows $(e, f)$ and $(g, h)$ on Segment 2.
  • 7. (cont.) Its transpose $X^T = \begin{pmatrix} a & c & e & g \\ b & d & f & h \end{pmatrix}$ is split the same way: columns $(a, b)$, $(c, d)$ on Segment 1 and $(e, f)$, $(g, h)$ on Segment 2.
  • 8. (cont.) The top-left entry of $X^T X$ is $a^2 + c^2 + e^2 + g^2$: computing it needs data from both segments.
  • 9. (cont.) The off-diagonal entry is $ab + cd + ef + gh$: again, data from both segments are combined.
  • 10. (cont.) Looks like the result can be decomposed:
    $X^T X = \begin{pmatrix} a^2+c^2+e^2+g^2 & ab+cd+ef+gh \\ ab+cd+ef+gh & b^2+d^2+f^2+h^2 \end{pmatrix}$
  • 11. (cont.) Every entry is a sum over rows, so each segment multiplies only its own data and only the partial results are added:
    $X^T X = \begin{pmatrix} a & c \\ b & d \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} + \begin{pmatrix} e & g \\ f & h \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix}$
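The decomposition can be checked numerically. This minimal sketch computes the three distinct entries of $X^T X$ for a two-column $X$, once over all rows and once per segment followed by a sum; the names `Row`, `Gram2`, `partial_gram` and `add` are illustrative, not MADlib code.

```cpp
#include <array>
#include <vector>

// A row of the two-column design matrix X, e.g. (a, b).
using Row = std::array<double, 2>;

// The three distinct entries of the symmetric 2x2 matrix X^T X.
struct Gram2 {
    double s00 = 0.0;  // column 0 squared, e.g. a^2 + c^2 + ...
    double s01 = 0.0;  // cross products, e.g. ab + cd + ...
    double s11 = 0.0;  // column 1 squared, e.g. b^2 + d^2 + ...
};

// X^T X restricted to the rows stored on one segment.
Gram2 partial_gram(const std::vector<Row>& rows) {
    Gram2 g;
    for (const Row& r : rows) {
        g.s00 += r[0] * r[0];
        g.s01 += r[0] * r[1];
        g.s11 += r[1] * r[1];
    }
    return g;
}

// Combining two segments needs only their 2x2 partial results, not their rows.
Gram2 add(const Gram2& x, const Gram2& y) {
    return Gram2{x.s00 + y.s00, x.s01 + y.s01, x.s11 + y.s11};
}
```

With a..h set to 1..8, the per-segment sum and the full computation agree entry by entry.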
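  • 12. Linear Regression: Streaming Algorithm. How to compute with a single table scan? Accumulate $X^T X = \sum_i x_i x_i^T$ and $X^T y = \sum_i x_i y_i$ while scanning the rows (each segment sums its own rows, and the partial sums are added), then compute $c = (X^T X)^{-1} X^T y$.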
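A one-pass fit can be sketched with the transition / merge / final shape that database aggregates use. This is a sketch under assumptions: `OlsState`, `transition`, `merge` and `final_solve` are hypothetical names, and the final step swaps in Gaussian elimination with partial pivoting instead of the QR solver from the Eigen example, to stay dependency-free.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical aggregate state: running sums of X^T X (k x k, row-major)
// and X^T y (length k). Its size depends on k only, not on the row count.
struct OlsState {
    std::size_t k;
    std::vector<double> xtx;
    std::vector<double> xty;
    explicit OlsState(std::size_t k_) : k(k_), xtx(k_ * k_, 0.0), xty(k_, 0.0) {}
};

// Transition: fold one row (x, y) into the state.
void transition(OlsState& s, const std::vector<double>& x, double y) {
    for (std::size_t i = 0; i < s.k; ++i) {
        for (std::size_t j = 0; j < s.k; ++j) s.xtx[i * s.k + j] += x[i] * x[j];
        s.xty[i] += x[i] * y;
    }
}

// Merge: combine the states of two segments by adding their sums.
void merge(OlsState& into, const OlsState& other) {
    for (std::size_t i = 0; i < into.xtx.size(); ++i) into.xtx[i] += other.xtx[i];
    for (std::size_t i = 0; i < into.k; ++i) into.xty[i] += other.xty[i];
}

// Final: solve (X^T X) c = X^T y by Gaussian elimination with partial pivoting.
std::vector<double> final_solve(OlsState s) {
    const std::size_t k = s.k;
    std::vector<double>& a = s.xtx;
    std::vector<double>& b = s.xty;
    for (std::size_t col = 0; col < k; ++col) {
        std::size_t piv = col;  // row with the largest pivot candidate
        for (std::size_t r = col + 1; r < k; ++r)
            if (std::fabs(a[r * k + col]) > std::fabs(a[piv * k + col])) piv = r;
        if (std::fabs(a[piv * k + col]) < 1e-12) throw std::runtime_error("X^T X is singular");
        if (piv != col) {
            for (std::size_t j = 0; j < k; ++j) std::swap(a[col * k + j], a[piv * k + j]);
            std::swap(b[col], b[piv]);
        }
        for (std::size_t r = col + 1; r < k; ++r) {  // eliminate below the pivot
            double f = a[r * k + col] / a[col * k + col];
            for (std::size_t j = col; j < k; ++j) a[r * k + j] -= f * a[col * k + j];
            b[r] -= f * b[col];
        }
    }
    std::vector<double> c(k);
    for (std::size_t i = k; i-- > 0;) {  // back substitution
        double sum = b[i];
        for (std::size_t j = i + 1; j < k; ++j) sum -= a[i * k + j] * c[j];
        c[i] = sum / a[i * k + i];
    }
    return c;
}
```

The state holds $k^2 + k$ numbers however many rows are scanned, which is why one pass suffices and segments only ship small states. Forming $X^T X$ explicitly squares the condition number of $X$, so a production solver would favor a more robust factorization.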
  • 13. Problem solved? … Not yet. Many ML methods are iterative and have no closed-form (analytical) solution. The loop: initialize the problem, perform a single step, and check whether it has converged; if not, perform another step, otherwise return the results.
  • 14. In general, use a convex optimization framework
    – Figure: the archetypical convex function $f(x) = x^2$.
    – Example objectives, with model $x$ and training examples $(u, y) \in \Omega$:
        Least squares: $\sum_{(u,y)\in\Omega} (x^T u - y)^2$
        Lasso: $\sum_{(u,y)\in\Omega} (x^T u - y)^2 + \mu \lVert x \rVert_1$
        Logistic regression: $\sum_{(u,y)\in\Omega} \log(1 + \exp(-y\, x^T u))$
    – Each step has an analytical formulation that can be performed in parallel.
    – Gradient descent: start at a random point; repeat (determine a descent direction, choose a step size, update the model) until the stopping criterion is satisfied.
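The gradient-descent loop can be sketched for the least-squares objective above. Its gradient $\sum 2(x^T u - y)\,u$ is a sum over examples, so each segment computes its part and only the partial gradients are combined. Assumptions for this sketch: a fixed, hand-picked step size, a start at zero instead of a random point, and stopping once the gradient norm falls below a tolerance; `Example`, `Segment`, `partial_gradient` and `gradient_descent` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One training example (u, y) in the slide's notation: features u, label y.
struct Example {
    std::vector<double> u;
    double y;
};
using Segment = std::vector<Example>;

// Partial gradient of sum (x^T u - y)^2 over the examples on one segment,
// i.e. sum 2 (x^T u - y) u. Each segment can compute this independently.
std::vector<double> partial_gradient(const Segment& seg, const std::vector<double>& x) {
    std::vector<double> g(x.size(), 0.0);
    for (const Example& e : seg) {
        double r = -e.y;  // residual x^T u - y
        for (std::size_t j = 0; j < x.size(); ++j) r += x[j] * e.u[j];
        for (std::size_t j = 0; j < x.size(); ++j) g[j] += 2.0 * r * e.u[j];
    }
    return g;
}

// Driver loop: sum the partial gradients, take a fixed-size step,
// stop when the gradient norm is below tol or after max_iter steps.
std::vector<double> gradient_descent(const std::vector<Segment>& segments,
                                     std::size_t dim, double step,
                                     double tol, int max_iter) {
    std::vector<double> x(dim, 0.0);  // start at the origin rather than a random point
    for (int it = 0; it < max_iter; ++it) {
        std::vector<double> g(dim, 0.0);
        for (const Segment& seg : segments) {
            std::vector<double> pg = partial_gradient(seg, x);
            for (std::size_t j = 0; j < dim; ++j) g[j] += pg[j];
        }
        double norm2 = 0.0;
        for (double gj : g) norm2 += gj * gj;
        if (std::sqrt(norm2) < tol) break;
        for (std::size_t j = 0; j < dim; ++j) x[j] -= step * g[j];
    }
    return x;
}
```

A fixed step only converges if it is small enough for the data (here, below 2 divided by the largest eigenvalue of $2X^TX$); a real implementation would choose it adaptively, which is the "choose a step size" line on the slide.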
  • 15. Architecture (same diagram and Eigen inset as slide 4)
  • 16. Architecture, extended with a high-level iteration layer
    – User Interface: SQL, generated per specification
    – High-level Iteration Layer (iteration controller, …): Python
    – Functions for Inner Loops (implements ML logic): C++
    – Low-level Abstraction Layer (array operations, C++ to DB type-bridge, …)
    – C API (Greenplum, PostgreSQL, HAWQ) and RDBMS built-in functions
    – Inset: the same Eigen-based "Lack of language support for linear algebra" example as slide 4
  • 17. But not all data scientists speak SQL … Accessing scalability through R
  • 18. Why R? O’Reilly: Strata 2013 Data Science Salary Survey
    – “The preponderance of R and Python usage is more surprising … two most commonly used individual tools, even above Excel. R and Python are likely popular because they are easily accessible and effective open source tools.”
    – Excerpt: “That SQL/RDB is the top bar is no surprise: accessing data is the meat and potatoes of data analysis, and has not been displaced by other tools. The preponderance of R and Python usage is more surprising: operating systems aside, these were the two most commonly used individual tools, even above Excel, which for years has been the go-to …”
  • 19. PivotalR: Bringing MADlib and HAWQ to a familiar R interface
    – Challenge: harness the familiarity of R’s interface together with the performance and scalability benefits of in-database analytics.
    – PivotalR:
        d <- db.data.frame("houses")
        houses_linregr <- madlib.lm(price ~ tax + bath + size, data = d)
    – SQL code:
        SELECT madlib.linregr_train('houses', 'houses_linregr', 'price', 'ARRAY[1, tax, bath, size]');
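To make the R-to-SQL mapping concrete, here is a hedged C++ analogue of the translation on slide 19. PivotalR itself is written in R, and `linregr_train_sql` is a hypothetical helper; it only builds the SQL text, with the leading `1` in the ARRAY acting as the intercept column.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: turn a model spec (source table, output table,
// response, predictors) into one madlib.linregr_train call.
std::string linregr_train_sql(const std::string& source_table,
                              const std::string& out_table,
                              const std::string& response,
                              const std::vector<std::string>& predictors) {
    std::string indep = "ARRAY[1";  // the constant 1 adds the intercept term
    for (const std::string& p : predictors) indep += ", " + p;
    indep += "]";
    return "SELECT madlib.linregr_train('" + source_table + "', '" + out_table +
           "', '" + response + "', '" + indep + "');";
}
```

No quoting or escaping is done, so the inputs are assumed to be trusted identifiers.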
  • 20. PivotalR Design Overview
    – 1. R → SQL: PivotalR translates R calls into SQL and sends it through RPostgreSQL; no data lives on the R side.
    – 2. SQL to execute: the database/HAWQ with MADlib, where the data lives, runs the SQL.
    – 3. Computation results are returned to R.
    – Syntax is analogous to native R functions.
    – Data doesn't need to leave the database.
    – All heavy lifting, including model estimation and computation, is done in the database.
  • 21. Demo
  • 22. Demonstration
        library(PivotalR)                                # load the library
        db.connect(port = 14526, dbname = "madlib")      # connect to database "madlib" on port 14526
        db.objects()                                     # list all tables in the active connection
        x <- db.data.frame("madlibtestdata.dt_abalone")  # R object referencing a table in the database
        dim(x)                                           # number of rows and columns in the table
        names(x)                                         # column names within the table
        x$rings                # query object: "select rings from madlibtestdata.dt_abalone"
        lookat(x, 10)          # pull 10 rows of the table back into the R environment
        mean(x$rings)          # query object: "select avg(rings) from madlibtestdata.dt_abalone"
        lookat(mean(x$rings))  # execute the query and report back the result
        # Run a linear regression within the database and return a model object
        fit <- madlib.lm(rings ~ . - id | sex, data = x)
        predict(fit, x)                       # query object that scores the model in the database
        mean((x$rings - predict(fit, x))^2)   # query object computing the model's mean squared error
        x$sex <- as.factor(x$sex)             # add a calculated factor column to the query object
        # Calculate a logistic regression model, then perform stepwise feature selection
        m0 <- madlib.glm(resp ~ age, family = "binomial", data = dbbank)
        mstep <- step(m0, scope = list(lower = ~age,
                                       upper = ~age + factor(marital) + factor(education) +
                                               factor(housing) + factor(loan) + factor(job)))
  • 23. Class hierarchy
    – db.obj
        – db.data.frame: wrapper of objects in the database, e.g. x = db.data.frame("table"); subclasses db.table and db.view
        – db.Rquery: resides in R only, e.g. x[,1:2], merge(x, y, by="column")
    – Operations and MADlib functions applied to a db.data.frame produce a db.Rquery; lookat pulls its result into R, and as.db.data.frame turns it back into a db.data.frame in the database.
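The db.data.frame / db.Rquery split amounts to lazy evaluation: operations only build SQL, and something like lookat runs it. A simplified C++ analogue follows; the `Query` class and its methods are hypothetical, not PivotalR's API, and it only reproduces the SQL strings quoted in the demo.

```cpp
#include <string>
#include <utility>

// Hypothetical lazy query object: it holds SQL pieces and never executes.
class Query {
public:
    // Like db.data.frame("table"): a reference to an existing table.
    static Query table(const std::string& name) { return Query("*", name); }

    // Like x$rings: select one column (still lazy).
    Query column(const std::string& col) const { return Query(col, from_); }

    // Like mean(x$rings): wrap the selected expression in avg() (still lazy).
    Query mean() const { return Query("avg(" + select_ + ")", from_); }

    // The SQL text that a call such as lookat() would send to the database.
    std::string sql() const { return "select " + select_ + " from " + from_; }

private:
    Query(std::string select, std::string from)
        : select_(std::move(select)), from_(std::move(from)) {}
    std::string select_;
    std::string from_;
};
```

Keeping queries lazy lets several operations compose into a single statement, so only the final result crosses the network.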
  • 24. Some of the current features
    – A wrapper of MADlib: generalized linear models (lm, glm); Elastic Net (elnet); cross validation (generic.cv); ARIMA; tree methods (rpart, randomforest); table summary; predict
    – SQL wrapper: $ [ [[ $<- [<- [[<-; is.na; + - * / %% %/% ^; & | !; == != > < >= <=; merge; by; db.data.frame; as.db.data.frame; preview; sort; c, mean, sum, sd, var, min, max, length, colMeans, colSums; db.connect, db.disconnect, db.list, db.objects, db.existsObject, delete; dim; names; as.factor(); content
    – And more ...
  • 25. We’re looking for contributors
    – Browse our help pages
        – Start page: madlib.net
        – GitHub pages:
            github.com/apache/incubator-madlib (SQL)
            github.com/pivotalsoftware/pivotalr (R)
            github.com/pivotalsoftware/pymadlib (Python)
    – Use our product and report issues:
        https://issues.apache.org/jira/browse/MADLIB (issue tracker)
        user@madlib.incubator.apache.org (user forum)
        dev@madlib.incubator.apache.org (developer forum)
  • 26. Credits. Leaders and contributors: Gavin Sherry, Caleb Welton, Joseph Hellerstein, Christopher Ré, Zhe Wang, Florian Schoppmann, Hai Qian, Shengwen Yang, Xixuan Feng, and many others …
  • 27. Thank you for your attention. Important links: product email user@madlib.net; product site madlib.net
  • 28. Backup slides
  • 29. Performing a linear regression on 10 million rows in seconds. Source: Hellerstein et al., “The MADlib analytics library: or MAD skills, the SQL.” Proceedings of the VLDB Endowment 5.12 (2012): 1700-1711.
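  • 30. Reminder: Linear-Regression Model
    – Model: $y_i = c^T x_i + \varepsilon_i$ for each row $i$, where $x_i$ is the $i$-th row of the design matrix $X$.
    – If the residuals $\varepsilon_i$ are i.i.d. Gaussian with standard deviation $\sigma$: maximum likelihood ⇔ minimum sum of squared residuals.
    – First-order conditions for the quadratic objective $\lVert y - Xc \rVert^2$ (in $c$) yield the minimizer $c = (X^T X)^{-1} X^T y$.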
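The step from the quadratic objective to the minimizer, written out under the assumption that $X$ has full column rank so that $X^T X$ is invertible:

```latex
\begin{aligned}
f(c) &= \lVert y - Xc \rVert^2 = (y - Xc)^T (y - Xc) \\
\nabla_c f(c) &= -2\, X^T (y - Xc) = 0 \\
\Longrightarrow\quad X^T X\, c &= X^T y \\
\Longrightarrow\quad c &= (X^T X)^{-1} X^T y,
\qquad X^T X = \sum_i x_i x_i^T,\quad X^T y = \sum_i x_i y_i
\end{aligned}
```

The last line is what makes the single-scan algorithm work: both matrices on the right are sums over rows.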
  • 31. Linear Regression: Streaming Algorithm. How to compute with a single table scan? Accumulate $X^T X$ and $X^T y$ in one pass, then compute $c = (X^T X)^{-1} X^T y$.
  • 32. PivotalR Architecture
  • 34. PL/X Procedural Languages
  • 35. PivotalR vs PL/R
    – PivotalR: interface is an R client; execution is in the database; parallelism handled by PivotalR; supports a portion of R.
        R> x = db.data.frame("t1")
        R> l = madlib.lm(interlocks ~ assets + nation, data = x)
    – PL/R: interface is a SQL client; execution is in R; parallelism via SQL function invocation; supports all of R.
        psql> CREATE FUNCTION lregr() … LANGUAGE PLR;
        psql> SELECT lregr(array_agg(interlocks), array_agg(assets), array_agg(nation)) FROM t1;
  • 36. Parallelized R in Pivotal via PL/R: An Example
    – SQL & R: R piggy-backs on Pivotal’s parallel architecture.
    – Minimize data movement.
    – Build a predictive model for each state in parallel: the data for each state (TN, CA, NY, PA, TX, CT, NJ, IL, MA, WA) produces that state's own model.
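The per-state pattern is "partition by key, fit each partition independently". A minimal C++ sketch, assuming one predictor per row and a closed-form simple linear regression as the per-group model (the slide does not say which model is fitted); `Obs`, `Model`, `fit_simple` and `fit_per_group` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

// One input row: a group key (e.g. a state code), a predictor and a response.
struct Obs {
    std::string state;
    double x;
    double y;
};

// A per-group model y ≈ c0 + c1 * x.
struct Model {
    double c0;
    double c1;
};

// Closed-form simple linear regression on one group's rows.
Model fit_simple(const std::vector<Obs>& rows) {
    double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (const Obs& o : rows) {
        n += 1; sx += o.x; sy += o.y; sxx += o.x * o.x; sxy += o.x * o.y;
    }
    double c1 = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    double c0 = (sy - c1 * sx) / n;
    return Model{c0, c1};
}

// Partition rows by group, then fit each group on its own. No group needs
// another group's rows, so in a database each fit could run on a different
// segment; this loop is sequential for simplicity.
std::map<std::string, Model> fit_per_group(const std::vector<Obs>& data) {
    std::map<std::string, std::vector<Obs>> groups;
    for (const Obs& o : data) groups[o.state].push_back(o);
    std::map<std::string, Model> models;
    for (const auto& kv : groups) models[kv.first] = fit_simple(kv.second);
    return models;
}
```

Each group needs at least two distinct x values; otherwise the slope's denominator is zero.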

Speaker notes

  1. For a table of 10 million rows. The number of independent variables (x axis) equals the number of columns in the table, so a vertical slice of the chart shows scalability: execution time scales roughly linearly with the number of segments. For example, 6 segments take approx. 200 s and 24 segments approx. 50 s (4× faster).
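The note's figures give a speedup of 200 / 50 = 4 for going from 6 to 24 segments, which is also 4× the segments, so the parallel efficiency is 1.0; that is what "roughly linear" means here. A small helper (hypothetical names) makes the arithmetic explicit:

```cpp
// Speedup and parallel efficiency from two timings of the same job.
// efficiency = speedup / (ratio of segment counts); 1.0 means linear scaling.
struct Scaling {
    double speedup;
    double efficiency;
};

Scaling scaling(double segments_a, double seconds_a,
                double segments_b, double seconds_b) {
    double speedup = seconds_a / seconds_b;           // how much faster run b is
    double resource_ratio = segments_b / segments_a;  // how many more segments run b used
    return Scaling{speedup, speedup / resource_ratio};
}
```

Efficiency below 1.0 would indicate overhead such as coordination on the master or merging partial states.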