Practical data analysis with wine 
  
December 2014 
Toshifumi Kuga CEO of TOSHI STATS SDN. BHD. 
beta version 
Today’s menu
1. Formula for predicting wine prices
2. Data handling (vectors & matrices)
3. Linear regression model with R
1. Formula of price prediction

The formula for predicting wine prices is public
• Dr. Orley Ashenfelter
• He is a professor of economics at Princeton University and was president of the American Economic Association in 2011
• The formula was made public in 1990
• Data is available here: http://www.liquidasset.com/winedata.html
Dr. Orley Ashenfelter’s formula
wine price = -12.145 + 0.00117 × amount of rain in winter + 0.6164 × average temperature - 0.00386 × amount of rain in harvest + 0.02385 × years from 1983
• parameters: θ = [-12.145, 0.00117, 0.6164, -0.00386, 0.02385]
• input variables: X = [1, rain in winter, average temperature, rain in harvest, years from 1983 (1983 minus the vintage year)]
• wine price: Y = θ0 + θ1×X1 + θ2×X2 + θ3×X3 + θ4×X4 (the leading 1 in X multiplies θ0)
• so the wine price can be written compactly as "Y = θX" (multiply matching elements of θ and X and add them up)
※ 'wine price': the log of the ratio of the year's average price to the average price of 1961
(The explanation above is simplified.)
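As a quick check, the formula can be evaluated for the 1952 vintage from the data slide (WRAIN 600, DEGREES 17.1167, HRAIN 160, TIME_SV 31). This is a minimal sketch that assumes the coefficients printed by lm in section 3; `sum(theta * x)` computes the inner product θX.

```r
# Coefficients as reported by lm() in section 3:
# intercept, WRAIN, DEGREES, HRAIN, TIME_SV
theta <- c(-12.145007, 0.001167, 0.616365, -0.003861, 0.023850)

# Input vector for the 1952 vintage; the leading 1 multiplies the intercept
x_1952 <- c(1, 600, 17.1167, 160, 31)

# Y = θX as an inner product: multiply matching elements and add them up
pred_1952 <- sum(theta * x_1952)
print(round(pred_1952, 3))   # about -0.773; the observed LPRICE2 is -0.99868
```

The prediction is close to, but not equal to, the observed value: the formula is a fitted model, not an exact rule.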
Steps for predicting the wine price
• wine price: Y = θX
• Y: the value to be predicted (here, the future wine price; unknown)
• X: known values (past temperatures, for example, are already known)
• The parameters θ are unknown
    → If θ is obtained, the future wine price Y can be obtained, too!
• Past values of Y are also known (past wine prices are known)
    → X and Y from the past are available as pairs → θ can be estimated from them
Data used in the analysis
OBS  VINT  Y: LPRICE2  X1: WRAIN  X2: DEGREES  X3: HRAIN  X4: TIME_SV
1    1952  -0.99868    600        17.1167      160        31
2    1953  -0.4544     690        16.7333      80         30
3    1954  –           430        15.3833      180        29
4    1955  -0.80796    502        17.15        130        28
5    1956  –           440        15.65        140        27
…    …     …           …          …            …          …
35   1986  –           563        16.2833      171        -3
36   1987  –           452        16.9833      115        -4
37   1988  –           808        17.1         59         -5
38   1989  –           443        –            82         -6
(–: no value shown on the slide. Y is the column to be predicted; X1 to X4 are the inputs: rain in winter, average temperature, rain in harvest, years from 1983.)
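A minimal sketch of how the rows shown above might be held in R and split into "X and Y both known" versus "only X known". The data frame name `wine_sample` is hypothetical, and only rows whose inputs are fully shown on the slide are typed in; the real data come from the URL on the earlier slide.

```r
# Rows copied from the table above (1989 is left out because its DEGREES
# value is not shown); NA marks a missing LPRICE2
wine_sample <- data.frame(
  VINT    = c(1952, 1953, 1954, 1955, 1956, 1986, 1987, 1988),
  LPRICE2 = c(-0.99868, -0.4544, NA, -0.80796, NA, NA, NA, NA),
  WRAIN   = c(600, 690, 430, 502, 440, 563, 452, 808),
  DEGREES = c(17.1167, 16.7333, 15.3833, 17.15, 15.65, 16.2833, 16.9833, 17.1),
  HRAIN   = c(160, 80, 180, 130, 140, 171, 115, 59),
  TIME_SV = c(31, 30, 29, 28, 27, -3, -4, -5)
)

# Past rows where both X and Y are known: usable for estimating θ
known <- wine_sample[!is.na(wine_sample$LPRICE2), ]

# Rows where only X is known (no price in the table): candidates for prediction
to_predict <- wine_sample[is.na(wine_sample$LPRICE2), ]

print(known$VINT)       # 1952 1953 1955
print(to_predict$VINT)  # 1954 1956 1986 1987 1988
```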
How to obtain θ: the least squares method
• Compare the model's predictions with observed (past) values; θ is chosen so that the sum of the squared differences is minimized
• Computer programs (algorithms) carry out these calculations automatically
• In practice, we rarely calculate the parameters by hand (with real data it is impractical to do so)
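To make "minimize the squared differences" concrete, here is a small sketch with made-up numbers (not the wine data): two candidate parameter vectors are compared by their sum of squared differences, and the one with the smaller value fits better. The helper name `sse` and the candidate values are hypothetical.

```r
# Made-up past data: the first column of X is the constant 1 (intercept)
X <- cbind(1, c(1, 2, 3, 4))
y <- c(3.1, 4.9, 7.2, 8.8)

# Sum of squared differences between predictions X %*% theta and observed y
sse <- function(theta, X, y) sum((X %*% theta - y)^2)

theta_a <- c(1, 2)   # candidate A: y = 1 + 2x
theta_b <- c(0, 2)   # candidate B: y = 0 + 2x

sse(theta_a, X, y)   # 0.10: candidate A fits the data better
sse(theta_b, X, y)   # 4.10
```

Least squares searches over all possible θ, not just two candidates, for the one with the smallest value.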
Parameter calculations by computer
[Figure: past values of Y and X → parameter calculation → θ → price prediction model Y = θX]
θ and X are not "just a number"
• Each is a collection of numbers
• In math, they can be represented as vectors and matrices
• Massive amounts of data can be represented by vectors and matrices with ease!
• Computers can handle data as vectors and matrices
• Major programming languages such as R, MATLAB and Python can create vectors and matrices and manipulate them efficiently
Get familiar with vectors and matrices!
• You can handle data as you like
• You can write the programs yourself
• This is the first step toward practical data analysis
2. Data handling (vectors & matrices)

High school math is important!
• Mainly arithmetic is explained here
• Nothing more than +, -, ×, /
• Practice by hand until you are familiar with vectors and matrices
• Let us verify the results using R
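Vector: a single line of numbers
• either horizontal (row vector) or vertical (column vector)
  row vectors: [1 3 7], [5 13]
  column vector: [1; 5] (";" separates rows: 1 on top, 5 below)
In R (shown in blue on the slides: verify the results with R; c() itself does not distinguish rows from columns):
a=c(1,3,7)
b=c(5,13)
d=c(1,5)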
Vector: addition
[1 3 7] + [2 4 1] = [3 7 8]
[1; 5] + [6; 2] = [7; 7]   (column vectors; ";" separates rows)
a=c(1,3,7)
b=c(2,4,1)
a+b   # 3 7 8
a=c(1,5)
b=c(6,2)
a+b   # 7 7
Vector: subtraction
[1 4 7] - [2 3 1] = [-1 1 6]
[1; 5] - [6; 2] = [-5; 3]   (column vectors; ";" separates rows)
a=c(1,4,7)
b=c(2,3,1)
a-b   # -1 1 6
a=c(1,5)
b=c(6,2)
a-b   # -5 3
Vector: scalar multiplication
3 × [2 4 1] = [6 12 3]
2 × [6; 2] = [12; 4]   (column vector; ";" separates rows)
a=c(2,4,1)
3*a   # 6 12 3
b=c(6,2)
2*b   # 12 4
Vector: multiplication (inner product)
[2 4 1] × [3; 6; 2] = 2×3 + 4×6 + 1×2 = 32   (row vector × column vector)
a=c(2,4,1)
b=c(3,6,2)
a%*%b   # 32 (returned as a 1×1 matrix)
Matrix: rectangular shape
• dimension: number of rows × number of columns (m × n); the examples on the slide are 2×2, 2×2 and 3×2
a=matrix(c(1,3,2,4),2,2)   # 2×2 matrix with rows [1 2] and [3 4]; matrix() fills column by column
Matrix: elements
• elements (entries) of the matrix with rows [1 2] and [3 4]:
  first row, first column: 1
  second row, first column: 3
  first row, second column: 2
  second row, second column: 4
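The same matrix can be inspected in R: `dim()` returns the dimension (rows, columns) and `a[i, j]` picks the element in row i, column j. A short sketch using the deck's own `a`:

```r
# matrix() fills column by column, so c(1,3,2,4) gives rows [1 2] and [3 4]
a <- matrix(c(1, 3, 2, 4), 2, 2)

dim(a)    # 2 2  (rows, columns)
a[1, 1]   # first row, first column: 1
a[2, 1]   # second row, first column: 3
a[1, 2]   # first row, second column: 2
a[2, 2]   # second row, second column: 4
```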
Matrix: addition
[Equations not preserved: two matrices added element by element]
Matrix: subtraction
[Equations not preserved: one matrix subtracted from another element by element]
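The example matrices for addition and subtraction were not preserved, so this sketch reuses `a` and `b` from the multiplication example later in the deck. Both operations work element by element, so the two matrices must have the same dimensions.

```r
a <- matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
b <- matrix(c(2, 5, 9, 4), 2, 2)   # rows [2 9] and [5 4]

a + b   # rows [3 11] and [8 8]
a - b   # rows [-1 -7] and [-2 0]
```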
Matrix: scalar multiplication/division
[Equations not preserved: a matrix multiplied by 2, and a matrix divided by 2, which is the same as multiplying it by 1/2]
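A short sketch of scalar multiplication and division in R, using the deck's matrix `a` as an assumed example: every element is multiplied (or divided) by the same number.

```r
a <- matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]

2 * a   # every element doubled: rows [2 4] and [6 8]
a / 2   # same as (1/2) * a: rows [0.5 1] and [1.5 2]
```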
Matrix: multiplication
[Equations not preserved: examples of matrix multiplication]
A little complicated?
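Let us look at it in more detail!
[1 2; 3 4] × [2 9; 5 4] = [12 17; 26 43]   (";" separates rows)
  first row × first column: 1×2 + 2×5 = 12
  first row × second column: 1×9 + 2×4 = 17
  second row × first column: 3×2 + 4×5 = 26
  second row × second column: 3×9 + 4×4 = 43
a=matrix(c(1,3,2,4),2,2)
b=matrix(c(2,5,9,4),2,2)
a%*%b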
Matrix multiplication: not commutative
• In general, A × B ≠ B × A
[Equations not preserved: an example where swapping the order of two matrices changes the product]
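Using the matrices `a` and `b` from the detailed multiplication example, a short R check shows that swapping the order changes the product:

```r
a <- matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
b <- matrix(c(2, 5, 9, 4), 2, 2)   # rows [2 9] and [5 4]

a %*% b   # rows [12 17] and [26 43]
b %*% a   # rows [29 40] and [17 26]: a different result
```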
Vector: multiplication 2
[3; 6] × [2 4] = [6 12; 12 24]   (column vector × row vector gives a matrix; ";" separates rows)
a=matrix(c(3,6),2,1)   # 2×1 column vector
b=c(2,4)
a%*%b
Identity matrix
• Diagonal elements are 1
• All other elements are 0
• Multiplying by the identity matrix changes nothing: I × A = A × I = A
diag(2)   # 2×2 identity matrix
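A quick R check of the identity property, using the deck's matrix `a`. The variable name `I2` is chosen here for illustration.

```r
a  <- matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
I2 <- diag(2)                       # 2×2 identity matrix: rows [1 0] and [0 1]

all(I2 %*% a == a)   # TRUE
all(a %*% I2 == a)   # TRUE
```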
Inverse matrix
• If A is an m×m matrix that has an inverse matrix $A^{-1}$, then $AA^{-1} = A^{-1}A = I$ (I: identity matrix)
> a=matrix(c(1,3,2,4),2,2)
> a
     [,1] [,2]
[1,]    1    2
[2,]    3    4
> inv=solve(a)
> inv
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5
So for A = [1 2; 3 4], $A^{-1}$ = [-2 1; 1.5 -0.5]
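To confirm that `solve(a)` really returns the inverse, both products can be compared with the identity matrix. `all.equal()` is used instead of `==` to allow for tiny floating-point rounding errors. Note that `solve()` stops with an error when the matrix has no inverse, for example `matrix(c(1,2,2,4),2,2)`.

```r
a   <- matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
inv <- solve(a)                      # inverse of a

# Both products should give the identity matrix
all.equal(a %*% inv, diag(2))   # TRUE
all.equal(inv %*% a, diag(2))   # TRUE
```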
Transpose matrix
• Exchange rows and columns: the first row becomes the first column, and so on
• For A = [1 2; 3 4], $A^{T}$ = [1 3; 2 4]
a=matrix(c(1,3,2,4),2,2)
t(a)
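A short sketch showing `t()` on the deck's square matrix and on a non-square one (the 3×2 matrix `m` is a made-up example), where transposing also swaps the dimension:

```r
a <- matrix(c(1, 3, 2, 4), 2, 2)   # rows [1 2] and [3 4]
t(a)                               # rows [1 3] and [2 4]

m <- matrix(1:6, 3, 2)             # a 3×2 matrix
dim(t(m))                          # 2 3: rows and columns are swapped
```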
Least squares estimation
• Vectors and matrices are used when programming least squares estimation
• Cost function (squared error function): $J(\theta) = \frac{1}{2m}(X\theta - Y)^{T}(X\theta - Y)$
• m: number of samples
• X is a matrix, Y is a vector, θ is a parameter vector
• T means transpose
• θ is chosen so that J is minimized (the difference between predictions and actual values is minimized)
  → Least squares estimation
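A minimal sketch of the cost function J written with R's matrix operations, on made-up data (not the wine data). It also solves for θ with the normal equations, XᵀXθ = XᵀY, one standard closed-form way to minimize J; this is an illustration, not necessarily how the deck or `lm` computes it. The function name `cost` is hypothetical.

```r
# Made-up data that lie exactly on y = 1 + 2x
X <- cbind(1, c(1, 2, 3, 4))   # m x 2 matrix; first column is the intercept
Y <- c(3, 5, 7, 9)
m <- nrow(X)                   # number of samples

# J = 1/(2m) * t(X theta - Y) %*% (X theta - Y)
cost <- function(theta) {
  r <- X %*% theta - Y
  as.numeric(t(r) %*% r) / (2 * m)
}

# Normal equations: solve t(X) X theta = t(X) Y for theta
theta_hat <- solve(t(X) %*% X, t(X) %*% Y)

theta_hat          # intercept 1, slope 2
cost(theta_hat)    # 0 (up to rounding), since the fit is exact
cost(c(0, 2))      # 0.5: larger than at theta_hat
```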
3. Linear regression with R

Analysis with the linear regression model "lm"
> wineprice=lm(LPRICE2~WRAIN+DEGREES+HRAIN+TIME_SV, data=wine)
> wineprice
• Inside lm(), write the variable to be predicted, then "~" and the input variables joined by "+", then data = the name of the data frame:
> ▼▼▼=lm(◯◯◯~△△△+■■■, data=◎◎◎)
> ▼▼▼
  (▼▼▼: a name for the result; ◯◯◯: the variable to be predicted; △△△, ■■■: input variables; ◎◎◎: the data)
• Data is available here: http://www.liquidasset.com/winedata.html
Parameters can be obtained!
Call:
lm(formula = LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = wine)

Coefficients:
(Intercept)        WRAIN      DEGREES        HRAIN      TIME_SV
 -12.145007     0.001167     0.616365    -0.003861     0.023850
Let us compare them with the formula for predicting wine prices.
[Screenshot: RStudio (see the "Website of R and RStudio" slide)]
[Figure: predicted prices (□) compared with real prices (◯)]
predict(wineprice,data.frame(wine))
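Since the wine data file is not reproduced here, this sketch uses a small made-up data frame to show the same pattern: fit with `lm()`, then call `predict()` with a data frame of new input values passed as `newdata`. The names `train`, `new_x`, `x` and `y` are hypothetical.

```r
# Made-up training data that lie exactly on y = 1 + 2x
train <- data.frame(x = c(1, 2, 3, 4), y = c(3, 5, 7, 9))
fit <- lm(y ~ x, data = train)

# New inputs whose y is unknown; column names must match the formula
new_x <- data.frame(x = c(5, 6))

predict(fit, newdata = new_x)   # about 11 and 13
```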
Analyzing data automatically with functions
• With the function 'lm', the parameters can be obtained with a one-line command
• R has many functions, so we can analyze data with them without writing the functions ourselves
• However, we should broadly understand how the calculations inside these functions are done. A black-box approach is not recommended
• The better we understand functions, the better we can choose the right function for the particular problem to solve
• Let us get familiar with 'lm'. Then you can understand other functions with ease
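In the spirit of not treating `lm` as a black box, this sketch (made-up data, not the wine set) checks that `lm()`'s coefficients match θ computed directly from the normal equations with the matrix operations from section 2. `lm()` itself uses a QR decomposition internally, which is numerically safer, but for well-behaved data the two should agree.

```r
set.seed(1)
# Made-up data: y depends on two inputs plus noise
x1 <- runif(20, 0, 10)
x2 <- runif(20, 0, 5)
y  <- 2 + 0.5 * x1 - 1.5 * x2 + rnorm(20, sd = 0.3)

# Coefficients from lm()
fit <- lm(y ~ x1 + x2)
coef(fit)

# Same coefficients from the normal equations: theta = (t(X) X)^-1 t(X) y
X <- cbind(1, x1, x2)   # first column of 1s for the intercept
theta <- solve(t(X) %*% X, t(X) %*% y)
theta
```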
Recommender systems
• amazon.com and Netflix are famous for their recommendations
• There are a variety of recommendation approaches
• Recommend the most popular product
  → the same recommendation for everyone
• Recommend the best products for each individual customer
  → a personalization method is needed!
Personalization
• An example of a method for personalized recommendations
• θ: the customer's preferences (did they click the product or not? did they give a rating or not?)
• X: item features (for movies: horror? romance? SF? Who is the director, actor, actress? When and where was it made?)
• Obtain probabilities from θX with a logistic regression model
• If the probability is high, the item is recommended to the customer
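A minimal sketch of turning θX into a probability with the logistic (sigmoid) function, 1 / (1 + exp(-θX)). The feature encoding (1 if the movie has the genre, 0 otherwise), the θ values and the 0.5 threshold are all hypothetical; in practice θ would be estimated from each customer's clicks or ratings, for example with `glm(..., family = binomial)` in R.

```r
# Logistic (sigmoid) function: maps any real number into (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical item features X: [intercept, horror, romance, SF]
movie_sf      <- c(1, 0, 0, 1)
movie_romance <- c(1, 0, 1, 0)

# Hypothetical preferences θ of one customer who likes SF but not romance
theta <- c(-0.5, 0.2, -1.0, 2.0)

p_sf      <- sigmoid(sum(theta * movie_sf))       # sigmoid(1.5), about 0.82
p_romance <- sigmoid(sum(theta * movie_romance))  # sigmoid(-1.5), about 0.18

# Recommend the item when the probability is high (threshold is an assumption)
threshold <- 0.5
c(sf = p_sf > threshold, romance = p_romance > threshold)
```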
Quandl: data source
• Over 10M datasets are available for free
• Data can be downloaded directly into R, MATLAB and Python
https://www.quandl.com
Website of R and RStudio
• R is a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org
• I have prepared a short video on how to use R: http://www.toshistats.net/introduction-to-r-language/
• RStudio is one of the best IDEs for R: http://www.rstudio.com/products/rstudio/download/
Thanks for your attention
• TOSHI STATS SDN. BHD., Digital-learning center for statistical computing in Asia
• CEO: Toshifumi Kuga, Certified financial services auditor
• Company website: www.toshistats.net
• Company FB page: www.facebook.com/toshistatsco
• Company blog: http://toshistats.wordpress.com/aboutme/
• The company blog is updated at 10:00 AM every Thursday with the latest information about data analysis! Please have a look at the blog or the company website.
Disclaimer
• TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. and me to correct any errors or defects in the code and the software.
© 2014 TOSHI STATS SDN. BHD. All rights reserved

Contenu connexe

Tendances (20)

M a t r i k s
M a t r i k sM a t r i k s
M a t r i k s
 
Matlab
MatlabMatlab
Matlab
 
matrix algebra
matrix algebramatrix algebra
matrix algebra
 
MATRICES
MATRICESMATRICES
MATRICES
 
Matrices
Matrices Matrices
Matrices
 
matrices and determinantes
matrices and determinantes matrices and determinantes
matrices and determinantes
 
Digital Differential Analyzer Line Drawing Algorithm
Digital Differential Analyzer Line Drawing AlgorithmDigital Differential Analyzer Line Drawing Algorithm
Digital Differential Analyzer Line Drawing Algorithm
 
Ppt presentasi matrix algebra
Ppt presentasi matrix algebraPpt presentasi matrix algebra
Ppt presentasi matrix algebra
 
Introduction to Logarithm
Introduction to LogarithmIntroduction to Logarithm
Introduction to Logarithm
 
Polynomials and Curve Fitting in MATLAB
Polynomials and Curve Fitting in MATLABPolynomials and Curve Fitting in MATLAB
Polynomials and Curve Fitting in MATLAB
 
Matlab tutorial 2
Matlab tutorial 2Matlab tutorial 2
Matlab tutorial 2
 
Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -Bba i-bm-u-2- matrix -
Bba i-bm-u-2- matrix -
 
2. determinantes
2. determinantes2. determinantes
2. determinantes
 
Matrices - Mathematics
Matrices - MathematicsMatrices - Mathematics
Matrices - Mathematics
 
Computer graphics presentation
Computer graphics presentationComputer graphics presentation
Computer graphics presentation
 
Mat lab
Mat labMat lab
Mat lab
 
Applied numerical methods lec8
Applied numerical methods lec8Applied numerical methods lec8
Applied numerical methods lec8
 
5HBC: How to Graph Implicit Relations Intro Packet!
5HBC: How to Graph Implicit Relations Intro Packet!5HBC: How to Graph Implicit Relations Intro Packet!
5HBC: How to Graph Implicit Relations Intro Packet!
 
digital systems and information
digital systems and informationdigital systems and information
digital systems and information
 
Matrices and determinants-1
Matrices and determinants-1Matrices and determinants-1
Matrices and determinants-1
 

En vedette

Charting Your Career Path with Globibo: Opportunities for Growth and Development
Charting Your Career Path with Globibo: Opportunities for Growth and DevelopmentCharting Your Career Path with Globibo: Opportunities for Growth and Development
Charting Your Career Path with Globibo: Opportunities for Growth and Developmentglobibo
 
Cluster analysis - Wholesale customers data set
Cluster analysis - Wholesale customers data setCluster analysis - Wholesale customers data set
Cluster analysis - Wholesale customers data setDivya Ganjoo, PMP® CSM®
 
Predicting Wine Quality Using Different Implementations of Decision Tree Algo...
Predicting Wine Quality Using Different Implementations of Decision Tree Algo...Predicting Wine Quality Using Different Implementations of Decision Tree Algo...
Predicting Wine Quality Using Different Implementations of Decision Tree Algo...Mohammed Al Hamadi
 

En vedette (6)

Wine Analytics
Wine AnalyticsWine Analytics
Wine Analytics
 
Charting Your Career Path with Globibo: Opportunities for Growth and Development
Charting Your Career Path with Globibo: Opportunities for Growth and DevelopmentCharting Your Career Path with Globibo: Opportunities for Growth and Development
Charting Your Career Path with Globibo: Opportunities for Growth and Development
 
Cluster analysis - Wholesale customers data set
Cluster analysis - Wholesale customers data setCluster analysis - Wholesale customers data set
Cluster analysis - Wholesale customers data set
 
Predicting Wine Quality Using Different Implementations of Decision Tree Algo...
Predicting Wine Quality Using Different Implementations of Decision Tree Algo...Predicting Wine Quality Using Different Implementations of Decision Tree Algo...
Predicting Wine Quality Using Different Implementations of Decision Tree Algo...
 
Wine quality Analysis
Wine quality AnalysisWine quality Analysis
Wine quality Analysis
 
IDS 570 project presentation
IDS 570 project presentationIDS 570 project presentation
IDS 570 project presentation
 

Similaire à Practical data analysis with wine

Introduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searchingIntroduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searchingMvenkatarao
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smallerTony Tran
 
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraLinaCovington707
 
CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners Jen Stirrup
 
The fundamentals of regression
The fundamentals of regressionThe fundamentals of regression
The fundamentals of regressionStephanie Locke
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine LearningAmanBhalla14
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataWeCloudData
 
Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1aravindangc
 
R programming slides
R  programming slidesR  programming slides
R programming slidesPankaj Saini
 
Size Measurement and Estimation
Size Measurement and EstimationSize Measurement and Estimation
Size Measurement and EstimationLouis A. Poulin
 
Lab lecture 1 line_algo
Lab lecture 1 line_algoLab lecture 1 line_algo
Lab lecture 1 line_algosimpleok
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdfRohanBorgalli
 
chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................nourhandardeer3
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in AgricultureAman Vasisht
 

Similaire à Practical data analysis with wine (20)

Cgm Lab Manual
Cgm Lab ManualCgm Lab Manual
Cgm Lab Manual
 
Introduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searchingIntroduction to Data Structures Sorting and searching
Introduction to Data Structures Sorting and searching
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smaller
 
Big datacourse
Big datacourseBig datacourse
Big datacourse
 
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
 
R programmingmilano
R programmingmilanoR programmingmilano
R programmingmilano
 
CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners CuRious about R in Power BI? End to end R in Power BI for beginners
CuRious about R in Power BI? End to end R in Power BI for beginners
 
3 analysis.gtm
3 analysis.gtm3 analysis.gtm
3 analysis.gtm
 
The fundamentals of regression
The fundamentals of regressionThe fundamentals of regression
The fundamentals of regression
 
R programming & Machine Learning
R programming & Machine LearningR programming & Machine Learning
R programming & Machine Learning
 
Deep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudDataDeep Learning Introduction - WeCloudData
Deep Learning Introduction - WeCloudData
 
R Programming Intro
R Programming IntroR Programming Intro
R Programming Intro
 
Computer Graphics Unit 1
Computer Graphics Unit 1Computer Graphics Unit 1
Computer Graphics Unit 1
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
Cgm Lab Manual
Cgm Lab ManualCgm Lab Manual
Cgm Lab Manual
 
Size Measurement and Estimation
Size Measurement and EstimationSize Measurement and Estimation
Size Measurement and Estimation
 
Lab lecture 1 line_algo
Lab lecture 1 line_algoLab lecture 1 line_algo
Lab lecture 1 line_algo
 
R Programming - part 1.pdf
R Programming - part 1.pdfR Programming - part 1.pdf
R Programming - part 1.pdf
 
chapter1.pdf ......................................
chapter1.pdf ......................................chapter1.pdf ......................................
chapter1.pdf ......................................
 
Application of Machine Learning in Agriculture
Application of Machine  Learning in AgricultureApplication of Machine  Learning in Agriculture
Application of Machine Learning in Agriculture
 

Plus de TOSHI STATS Co.,Ltd.

ビジネスマネージャとデータ分析
ビジネスマネージャとデータ分析ビジネスマネージャとデータ分析
ビジネスマネージャとデータ分析TOSHI STATS Co.,Ltd.
 
Introduction to credit risk management
Introduction to credit risk managementIntroduction to credit risk management
Introduction to credit risk managementTOSHI STATS Co.,Ltd.
 
Basic of computational economics with MATLAB program
Basic of computational economics with MATLAB programBasic of computational economics with MATLAB program
Basic of computational economics with MATLAB programTOSHI STATS Co.,Ltd.
 

Plus de TOSHI STATS Co.,Ltd. (6)

実践データ分析基礎
実践データ分析基礎実践データ分析基礎
実践データ分析基礎
 
ビジネスマネージャとデータ分析
ビジネスマネージャとデータ分析ビジネスマネージャとデータ分析
ビジネスマネージャとデータ分析
 
How to be data savvy manager
How to be data savvy managerHow to be data savvy manager
How to be data savvy manager
 
Introduction to credit risk management
Introduction to credit risk managementIntroduction to credit risk management
Introduction to credit risk management
 
Introduction to VaR
Introduction to VaRIntroduction to VaR
Introduction to VaR
 
Basic of computational economics with MATLAB program
Basic of computational economics with MATLAB programBasic of computational economics with MATLAB program
Basic of computational economics with MATLAB program
 

Dernier

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 

Dernier (20)

Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 

Practical data analysis with wine

  • 1. Practical data analysis with wine   December 2014 Toshifumi Kuga CEO of TOSHI STATS SDN. BHD. beta version 1
  • 2. Today’s menu 1. formula for prediction of wine price 2. data handling (vector & matrix) 3. liner regression model with R 2
  • 3.   Formula for prediction of wine price is public • Dr. Orley Ashenfelter  • He is a professor of economics at Princeton University and was a president American Economic Association in 2011 • The formula was public in 1990 3 1. formula of price prediction http://www.liquidasset.com/winedata.html Data is available here
  • 4. Dr. Orley Ashenfelter’s formular wine price=-12.145+0.00117×amount of rain in winter+ 0.06163×average temperature- 0.00386×amount of rain in harvest+ 0.02385×years from 1983 • parameters:θ=[ -12.145, 0.00117, 0.06163, -0.00386, 0.02385 ] • input variables:X=[1, rain winter, average temp, rain harvest, years] • wine price:Y=θ0+θ1×X1+θ2×X2+θ3×X3+θ4×X4 • wine price can be represented as「Y=θX」 ※ ‘wine price’ : ratio of average price of the year against the average price of 1961, and take log of the ratio 4 1. formula of price prediction simplified in the explanation above
  • 5.    Step for prediction of wine price • wine price:Y=θX • Y : value to be predicted(Future wine price in this case、unknown value) • X : known value(temperature in the past are known now) • Parametersθis unknown     → Ifθis obtained 、future wine price Y can be obtained, too! • Y in the past is also known(wine price in the past is known)    → X and Y in the past are available as a set → θcan be obtained 5 1. formula of price prediction
  • 6.   Data used in the analysis OBS VINT Y:LPRICE2 X1:WRAIN X2:DEGREES X3:HRAIN X4:TIME_SV 1 1952 -0.99868 600 17.1167 160 31 2 1953 -0.4544 690 16.7333 80 30 3 1954 430 15.3833 180 29 4 1955 -0.80796 502 17.15 130 28 5 1956 440 15.65 140 27 … … … … … … … 35 1986 563 16.2833 171 -3 36 1987 452 16.9833 115 -4 37 1988 808 17.1 59 -5 38 1989 443 82 -6 Y X 6 1. formula of price prediction
  • 7. How to obtain θ: the least squares method • Predictions are compared with observed values (values from the past), and θ is chosen so that the sum of the squared differences is minimized • Computer programs (algorithms) carry out this calculation automatically • In practice we rarely calculate the parameters by hand (with realistic amounts of data it is impractical)
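A small R sketch of the least squares idea on hypothetical toy data (`x`, `y` and the helper `sse` are illustrative, not from the slides): lm() picks θ, and moving either parameter away from that value makes the sum of squared differences larger.

```r
# Hypothetical toy data (not from the slides): one input x and observed y
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Sum of squared differences between predictions theta0 + theta1 * x and y
sse <- function(theta) sum((theta[1] + theta[2] * x - y)^2)

# lm() returns the least squares estimate of (theta0, theta1)
theta_ls <- unname(coef(lm(y ~ x)))

# Moving either parameter away from the estimate increases the error
print(sse(theta_ls))
print(sse(theta_ls + c(0.1, 0)))
print(sse(theta_ls + c(0, -0.05)))
```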
  • 8. Parameter calculation by computer: past values of X and Y → parameter calculation → θ → price prediction model Y = θX
  • 9. θ and X are not “just a number” • they are collections of numbers • in math they can be represented as vectors and matrices • vectors and matrices make it easy to represent massive amounts of data • computers can handle data as vectors and matrices • major programming languages such as R, MATLAB and Python can create vectors and matrices and manipulate them efficiently
  • 10. Get familiar with vectors and matrices! • You can handle data as you like • You can write programs yourself • It is the first step toward practical data analysis
  • 11. 2. Data handling (vector & matrix). High school math is important! • Mainly arithmetic is explained • Nothing more than +, -, ×, / • Practice by hand until you are familiar with vectors and matrices • Let us verify the results with R
  • 12. Vector: numbers in one line, either horizontal (e.g. [1 3 7]) or vertical (e.g. the vertical vectors with entries 5, 13 and 1, 5). R code (shown in blue on the slides) to verify the results: a=c(1,3,7), b=c(5,13), d=c(1,5)
  • 13. Vector: addition • [1 3 7] + [2 4 1] = [3 7 8] • vertical: (1, 5) + (6, 2) = (7, 7) • R: a=c(1,3,7); b=c(2,4,1); a+b and a=c(1,5); b=c(6,2); a+b
  • 14. Vector: subtraction • [1 4 7] - [2 3 1] = [-1 1 6] • vertical: (1, 5) - (6, 2) = (-5, 3) • R: a=c(1,4,7); b=c(2,3,1); a-b and a=c(1,5); b=c(6,2); a-b
  • 15. Vector: scalar multiplication • 3 × [2 4 1] = [6 12 3] • 2 × (6, 2) = (12, 4) • R: a=c(2,4,1); 3*a and b=c(6,2); 2*b
  • 16. Vector: multiplication (inner product) • [2 4 1] × (3, 6, 2) = 2×3 + 4×6 + 1×2 = 32 • R: a=c(2,4,1); b=c(3,6,2); a%*%b
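The vector results on slides 13 to 16 can be checked together in one R script. Apart from a and b, the slides' variables are renamed here (a2, b2, p, q) so the examples do not overwrite each other.

```r
# Slide 13: addition, element by element
a <- c(1, 3, 7); b <- c(2, 4, 1)
print(a + b)            # 3 7 8

# Slide 14: subtraction, element by element
a2 <- c(1, 4, 7); b2 <- c(2, 3, 1)
print(a2 - b2)          # -1 1 6

# Slide 15: scalar multiplication multiplies every element
print(3 * c(2, 4, 1))   # 6 12 3
print(2 * c(6, 2))      # 12 4

# Slide 16: inner product, 2*3 + 4*6 + 1*2
p <- c(2, 4, 1); q <- c(3, 6, 2)
print(p %*% q)          # a 1x1 matrix containing 32
print(sum(p * q))       # the same value as a plain number: 32
```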
  • 17. Matrix: a rectangular array of numbers, e.g. a=matrix(c(1,3,2,4),2,2) • dimension: number of rows × number of columns (m×n); the examples on the slide are 2×2, 2×2 and 3×2
  • 18. Matrix: elements • the elements (entries) of the example matrix: first row, first column: 1; second row, first column: 3; first row, second column: 2; second row, second column: 4
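A short R sketch of why a=matrix(c(1,3,2,4),2,2) has rows [1 2] and [3 4]: matrix() fills its values column by column unless byrow = TRUE is given. The name `a_rows` is illustrative.

```r
# matrix() fills column by column, so c(1, 3, 2, 4) gives rows (1, 2) and (3, 4)
a <- matrix(c(1, 3, 2, 4), nrow = 2, ncol = 2)
print(a)
print(dim(a))    # rows x columns
print(a[2, 1])   # second row, first column: 3
print(a[1, 2])   # first row, second column: 2

# byrow = TRUE fills row by row instead, giving the same matrix from c(1, 2, 3, 4)
a_rows <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)
print(identical(a, a_rows))
```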
  • 19. Matrix: addition (figure: worked example)
  • 20. Matrix: subtraction (figure: worked example)
  • 21. Matrix: scalar multiplication/division (figure: worked examples of multiplying by 2 and dividing by 2, i.e. multiplying by 1/2)
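Slides 19 to 21 show their examples only as images, so this R sketch assumes the two matrices from slide 23 as inputs. Between matrices of the same size, + and - work element by element, and a scalar multiplies or divides every element.

```r
a <- matrix(c(1, 3, 2, 4), 2, 2)  # rows (1, 2) and (3, 4)
b <- matrix(c(2, 5, 9, 4), 2, 2)  # rows (2, 9) and (5, 4)

print(a + b)   # element by element: rows (3, 11) and (8, 8)
print(a - b)   # element by element: rows (-1, -7) and (-2, 0)
print(2 * a)   # every element doubled: rows (2, 4) and (6, 8)
print(a / 2)   # every element halved, the same as (1/2) * a
```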
  • 22. Matrix: multiplication (figure: worked example). A little complicated?
  • 23. Let us look at it in more detail! a=matrix(c(1,3,2,4),2,2) has rows [1 2] and [3 4]; b=matrix(c(2,5,9,4),2,2) has rows [2 9] and [5 4]. Each element of a%*%b is a row of a times a column of b: 1×2 + 2×5 = 12, 1×9 + 2×4 = 17, 3×2 + 4×5 = 26, 3×9 + 4×4 = 43, so a%*%b has rows [12 17] and [26 43]
  • 24. Matrix multiplication is not commutative: in general A × B ≠ B × A (figure: the two orders give different results)
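An R check of slides 23 and 24, using the same a and b as slide 23: %*% gives the matrix product, and swapping the order gives a different matrix. The names `ab` and `ba` are illustrative.

```r
a <- matrix(c(1, 3, 2, 4), 2, 2)  # rows (1, 2) and (3, 4)
b <- matrix(c(2, 5, 9, 4), 2, 2)  # rows (2, 9) and (5, 4)

# %*% is the matrix product; a * b would multiply element by element instead
ab <- a %*% b   # rows (12, 17) and (26, 43), as on slide 23
ba <- b %*% a   # rows (29, 40) and (17, 26)
print(ab)
print(ba)

# Element (i, j) is row i of the left matrix times column j of the right matrix
print(sum(a[2, ] * b[, 2]))  # 3*9 + 4*4 = 43, the same as ab[2, 2]

print(identical(ab, ba))     # FALSE: the order of multiplication matters
```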
  • 25. Vector: multiplication of a vertical vector by a horizontal vector • (3, 6) × [2 4] = the 2×2 matrix with rows [6 12] and [12 24] • R: a=matrix(c(3,6),2,1); b=c(2,4); a%*%b
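A quick R check of slide 25. The names `col_vec` and `row_vals` are illustrative; %*% promotes the plain vector to a 1×2 row so the shapes fit, and base R's outer() should produce the same 2×2 matrix.

```r
col_vec  <- matrix(c(3, 6), nrow = 2, ncol = 1)  # 2x1 vertical vector
row_vals <- c(2, 4)                              # treated by %*% as a 1x2 row

m <- col_vec %*% row_vals   # (2x1) times (1x2) gives a 2x2 matrix
print(m)                    # rows (6, 12) and (12, 24)

# outer() builds the same matrix directly from two plain vectors
print(all(m == outer(c(3, 6), c(2, 4))))
```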
  • 26. Identity matrix • Diagonal elements are 1 • All other elements are 0 • Multiplying by the identity matrix (on either side) changes nothing • R: diag(2) creates the 2×2 identity matrix
  • 27. Inverse matrix • If A is an m×m matrix that has an inverse matrix A⁻¹, then AA⁻¹ = A⁻¹A = I (I: identity matrix) • R: a=matrix(c(1,3,2,4),2,2) has rows [1 2] and [3 4]; inv=solve(a) gives rows [-2.0 1.0] and [1.5 -0.5]
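An R sketch verifying slides 26 and 27 with the same a: multiplying by diag(2) leaves a unchanged, and solve(a) returns the inverse. all.equal() is used because floating-point products may not come out as exactly 0 and 1. The name `I2` is illustrative.

```r
a  <- matrix(c(1, 3, 2, 4), 2, 2)  # rows (1, 2) and (3, 4)
I2 <- diag(2)                      # 2x2 identity matrix

# Multiplying by the identity on either side changes nothing
print(all(a %*% I2 == a) && all(I2 %*% a == a))

inv <- solve(a)   # rows (-2, 1) and (1.5, -0.5)
print(inv)

# A %*% A^-1 and A^-1 %*% A both give I, up to floating-point rounding
print(all.equal(a %*% inv, I2))
print(all.equal(inv %*% a, I2))
```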
  • 28. Transpose matrix • exchanges rows and columns: the element in row i, column j moves to row j, column i • R: a=matrix(c(1,3,2,4),2,2); t(a) gives aᵀ, with rows [1 3] and [2 4]
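A short R check of the transpose on slide 28, plus one hypothetical non-square case (a 3×2 matrix) showing that t() also swaps the dimensions. The name `at` is illustrative.

```r
a  <- matrix(c(1, 3, 2, 4), 2, 2)  # rows (1, 2) and (3, 4)
at <- t(a)                         # rows (1, 3) and (2, 4)
print(at)

print(at[1, 2] == a[2, 1])   # element (i, j) moves to (j, i)
print(identical(t(at), a))   # transposing twice gives back a

# A non-square example: a 3x2 matrix becomes 2x3
print(dim(t(matrix(1:6, nrow = 3))))
```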
  • 29. Least squares estimation • Vectors and matrices are used in programming least squares estimation • J = 1/(2*m) * T(X*θ - Y) * (X*θ - Y): cost function (squared error function) • m: number of data samples • X is a matrix with one row per sample (so the predictions are written X*θ), Y is a vector and θ is the parameter vector • T( ) means transpose • θ is chosen so that J is minimized (the difference between predictions and real values is minimized) → least squares estimation
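A minimal R sketch of the cost function J on this slide, assuming hypothetical toy data that lies exactly on y = 1 + 2x. It minimizes J through the normal equations T(X) X θ = T(X) Y, which come from setting the gradient of J to zero. This is one standard way to do it; R's lm() itself uses a QR decomposition internally rather than this formula. The names `cost` and `theta_hat` are illustrative.

```r
# Hypothetical toy data: y = 1 + 2 * x exactly, so theta should come out as (1, 2)
x <- c(1, 2, 3, 4)
X <- cbind(1, x)     # m x 2 matrix: intercept column and input
Y <- 1 + 2 * x       # vector of observations
m <- nrow(X)

# J(theta) = 1/(2m) * t(X theta - Y) (X theta - Y)
cost <- function(theta) {
  r <- X %*% theta - Y
  as.numeric(t(r) %*% r) / (2 * m)
}

# Normal equations: solve t(X) X theta = t(X) Y without forming an inverse
theta_hat <- solve(t(X) %*% X, t(X) %*% Y)
print(theta_hat)         # approximately 1 and 2
print(cost(theta_hat))   # approximately 0: a perfect fit
print(cost(c(0, 2)))     # larger: the intercept moved away from 1
```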
  • 30. 3. Linear regression with R. Analysis with the linear regression function “lm”: > wineprice=lm(LPRICE2~WRAIN+DEGREES+HRAIN+TIME_SV, data=wine) > wineprice • General form: > ▼▼▼=lm(◯◯◯~△△△+■■■, data=◎◎◎) > ▼▼▼ • Inside lm(), write the variable to be predicted (◯◯◯), then “~”, then the input variables joined by “+” (△△△, ■■■), then data= the name of the data set (◎◎◎) • Data is available here: http://www.liquidasset.com/winedata.html
  • 31. Parameters can be obtained! • Call: lm(formula = LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = wine) • Coefficients: (Intercept) -12.145007, WRAIN 0.001167, DEGREES 0.616365, HRAIN -0.003861, TIME_SV 0.023850 • Let us compare them with the formula for predicting wine price
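To try the slide-30 call without the original data file, this R sketch builds a hypothetical data frame `toy`. Its input columns are copied from the rows on slide 6 that show all four inputs, but LPRICE2 is computed from the slide-31 coefficients, so these are not real prices. Because the synthetic data has no noise, lm() should return those coefficients.

```r
# Slide-31 coefficients: intercept, WRAIN, DEGREES, HRAIN, TIME_SV
theta <- c(-12.145007, 0.001167, 0.616365, -0.003861, 0.023850)

# Hypothetical data frame: inputs from slide 6, LPRICE2 computed from theta
toy <- data.frame(
  WRAIN   = c(600, 690, 430, 502, 440, 563, 452, 808),
  DEGREES = c(17.1167, 16.7333, 15.3833, 17.15, 15.65, 16.2833, 16.9833, 17.1),
  HRAIN   = c(160, 80, 180, 130, 140, 171, 115, 59),
  TIME_SV = c(31, 30, 29, 28, 27, -3, -4, -5)
)
X <- cbind(1, as.matrix(toy))          # built before LPRICE2 is added
toy$LPRICE2 <- as.numeric(X %*% theta) # synthetic, NOT real prices

# Same call pattern as slide 30: variable to predict ~ input variables
fit <- lm(LPRICE2 ~ WRAIN + DEGREES + HRAIN + TIME_SV, data = toy)
print(coef(fit))   # recovers theta, since the data has no noise
```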
  • 32. RStudio (screenshot; see p. 38)
  • 33. Predictions (□) plotted against real prices (◯), computed with predict(wineprice,data.frame(wine))
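predict() produces the predictions on slide 33; its second argument is the new data. This R sketch uses hypothetical one-variable toy data (`train`, `new_rows` are illustrative) and confirms that predict() returns the same numbers as computing X %*% θ by hand with a column of 1s for the intercept.

```r
# Hypothetical toy data with a single input
train <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
fit <- lm(y ~ x, data = train)

# New inputs whose y is unknown (like vintages without a price)
new_rows <- data.frame(x = c(6, 7))
pred <- predict(fit, newdata = new_rows)

# The same numbers by hand: X theta with an intercept column of 1s
X_new  <- cbind(1, new_rows$x)
manual <- as.numeric(X_new %*% coef(fit))
print(pred)
print(all.equal(unname(pred), manual))
```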
  • 34. Analyze data automatically with functions • With the function ‘lm’, the parameters can be obtained with a one-line command • R has a great many functions, so we can analyze data with them without writing functions ourselves • However, we should broadly understand how the calculations inside the functions are done; a black-box approach is not recommended • The better we understand the functions, the better we can choose the right function for a particular problem • Let us get familiar with ‘lm’; then other functions will be easy to understand
  • 35. Recommender systems • amazon.com and Netflix are famous for their recommendations • There are various kinds of recommendations: • recommend the most popular product → the same recommendation for everyone • recommend the best products for each individual customer → a personalization method is needed!
  • 36. Personalization • An example of a method for personalized recommendations • θ: the customer’s preferences (does the customer click the products or not? provide a rating or not?) • X: item features (for movies: horror? romance? SF? who are the director, actors and actresses? when and where was it made?) • Probabilities are obtained from θX with a logistic regression model • If the probability is high, the item is recommended to the customer
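A minimal R sketch of the personalization idea, with a hypothetical preference vector `theta` and hypothetical 0/1 movie features; neither comes from the slides. The logistic function turns θX into a probability, and an assumed threshold of 0.5 decides what to recommend. A real system would estimate theta from each customer's clicks or ratings, which is not shown here.

```r
# Hypothetical preferences of one customer: intercept, horror, romance, SF
theta <- c(-1.0, -2.0, 0.5, 2.5)

# Hypothetical item features (1 = yes, 0 = no), with a leading 1 for the intercept
movies <- rbind(
  horror_film = c(1, 1, 0, 0),
  sf_film     = c(1, 0, 0, 1)
)

# Logistic regression maps theta X to a probability between 0 and 1
score <- as.numeric(movies %*% theta)
prob  <- 1 / (1 + exp(-score))   # the same as plogis(score)

# Recommend items whose probability is above the assumed threshold of 0.5
recommend <- rownames(movies)[prob > 0.5]
print(round(prob, 3))
print(recommend)
```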
  • 37. Quandl: data source • Over 10M datasets are available for free • Data can be downloaded directly into R, MATLAB and Python https://www.quandl.com
  • 38. Websites of R and RStudio • R is a language and environment for statistical computing (R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org) • I have prepared a short video on how to use R: http://www.toshistats.net/introduction-to-r-language/ • RStudio is one of the best IDEs for R: http://www.rstudio.com/products/rstudio/download/
  • 39. Thank you for your attention • TOSHI STATS SDN. BHD.: digital-learning center for statistical computing in Asia • CEO: Toshifumi Kuga, Certified Financial Services Auditor • Company website: www.toshistats.net • Company FB page: www.facebook.com/toshistatsco • Company blog: http://toshistats.wordpress.com/aboutme/ • The company blog is updated at 10:00 AM every Thursday and reports the latest information about data analysis! Please visit the blog or the company website.
  • 40. Disclaimer • TOSHI STATS SDN. BHD. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. TOSHI STATS SDN. BHD. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on TOSHI STATS SDN. BHD. or me to correct any errors or defects in the code and the software. © 2014 TOSHI STATS SDN. BHD. All rights reserved