Accounting serx

PhD Talk:
Regression analysis using Stata: A hands-on approach
By:
Dr. Redhwan Al-Dhamari
Bakr Ali Al-Gamrh

Presentation Outline
• Introduction
• Data Structure
• Cross-Sectional Data
• Regression Diagnostics
• Other Regression Commands
• Presenting Your Results
• Suggested Readings

Introduction
What is Stata?
Stata is a general-purpose statistical software package created in 1985
by StataCorp. Most of its users work in research, especially in the
fields of economics, sociology, political science, biomedicine and
epidemiology.
Why Stata?
• Stata has been less popular than its market competitors, such as
SPSS and SAS, but it is gaining in popularity every year.
• It is particularly user-friendly when it comes to analyzing
complicated data sets.

Introduction (Cont’d)
How is Stata different?
• The commands in Stata are more intuitive and less fussy
about punctuation.
• In Stata, it is possible to download new programs that were
written by users to perform specific tasks, and use them as
commands.
• Dealing with longitudinal data sets with different types of file
structures is quicker and easier in Stata.

Introduction (Cont’d)
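Windows in Stata and what they do
• The Command window
• The Review window
• The Variables window
• The Results window
• Do-files
• Log files
• Data Editor and Data Browser
Useful commands (Stata is case-sensitive; commands are typed in lowercase)
• set mem 50m (only needed in Stata 11 and earlier; later versions manage memory automatically)
• log using name.log, replace (opens a log file)
• log close (closes the log file)
• log off (pauses logging)
• log on (resumes logging)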
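A minimal sketch of a logged session, assuming the example dataset auto.dta that ships with Stata and a hypothetical log file name session1.log (neither appears in the slides):

```stata
* Open a plain-text log, overwriting any earlier file with the same name
log using session1.log, text replace

* Load an example dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear
describe

* Pause logging, run something that should not be recorded, then resume
log off
summarize price
log on
summarize mpg

* Close the log file when the session is finished
log close
```

Saving these lines in a do-file lets the whole session be rerun and logged again with a single command.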

Data structure
• Cross-sectional data
• Panel data
• Time series data

Cross-sectional data
Summary statistics
• sum var
• sum var, sep(0)
• sum var, detail
• tabstat var, s(n mean sd min max skewness kurtosis) c(s)
• tabstat var, s(n mean sd min max) by(groupvar)
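The summary commands above can be tried on the auto.dta example dataset that ships with Stata. The dataset and the variable names price, mpg, weight and foreign are assumptions chosen for illustration only:

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Basic summary statistics, with and without separator lines
summarize price mpg weight
summarize price mpg weight, sep(0)

* Detailed statistics (percentiles, skewness, kurtosis) for one variable
summarize price, detail

* Selected statistics, with the statistics shown as columns
tabstat price mpg weight, s(n mean sd min max skewness kurtosis) c(s)

* The same statistics broken down by a grouping variable
tabstat price mpg, s(n mean sd min max) by(foreign)
```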

Correlations
Pearson’s product-moment correlation (r)
• It is based on deviations from the variables' mean values
• It is used for interval variables
• Absolute values of r below 0.30 suggest there is little association between the variables
(Hinkle et al. 1988).
• pwcorr varlist, obs sig star(0.05)
Spearman’s correlation (rho)
• It is calculated from ranks
• It is used for ordinal variables
• spearman varlist, stats(rho obs p) star(0.05)
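As a sketch, both correlation commands can be run on the auto.dta example dataset shipped with Stata (the dataset and variable names are assumptions for illustration):

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Pearson correlations with number of observations, p-values,
* and a star on coefficients significant at the 5% level
pwcorr price mpg weight, obs sig star(0.05)

* Spearman rank correlations with the same information
spearman price mpg weight, stats(rho obs p) star(0.05)
```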

Differences in means and medians
Independent two-sample t-test
• It shows whether there are mean differences in the data that might be worth pursuing
with multivariate analysis.
• The mean can be compared across no more than two groups: the
grouping variable must be dichotomous.
• ttest var, by(groupvar)
• sdtest var, by(groupvar) (tests whether the two groups have equal variances)
Mann-Whitney U test
• It is used to examine rank differences between two groups.
• ranksum var, by(groupvar)
Paired t-test and Wilcoxon signed-rank test
• ttest ind07 == ind08
• signrank ind07 = ind08
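A hedged sketch of these tests, assuming two example datasets that ship with Stata: auto.dta for the two-group comparisons and bpwide.dta (blood pressure before and after treatment) for the paired tests. The slides' own variables ind07 and ind08 are not available here.

```stata
* Two-group comparisons: domestic versus foreign cars
sysuse auto, clear
sdtest mpg, by(foreign)
ttest mpg, by(foreign)

* If sdtest rejects equal variances, the unequal option may be preferable
ttest mpg, by(foreign) unequal

* Rank-based alternative to the two-sample t-test
ranksum mpg, by(foreign)

* Paired comparisons: the same patients measured twice
sysuse bpwide, clear
ttest bp_before == bp_after
signrank bp_before = bp_after
```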

Theory of regression analysis
What is linear regression analysis?
• Finding the relationship between a dependent variable (Y) and
an independent variable (X).
Y = α + βX + e
where α is the intercept, β is the slope coefficient and e is the error term.
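A minimal sketch of estimating this equation in Stata, assuming the auto.dta example dataset with price as the dependent variable. The variable choice and the new variable names yhat and res are illustrative assumptions:

```stata
* Load an example dataset shipped with Stata
sysuse auto, clear

* Simple regression: one dependent and one independent variable
regress price mpg

* Multiple regression: the same model with further regressors
regress price mpg weight foreign

* Fitted values and residuals from the last model
predict yhat, xb
predict res, residuals
```

The coefficient on each regressor estimates its slope, and _cons is the estimated intercept.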

Regression diagnostics
• Normality
• Heteroskedasticity
• Multicollinearity
• Model specification

Regression diagnostics (Cont’d)
Normality refers to the normal distribution of the error terms.
Testing the residuals for normality (res is created with predict res, residuals)
Shapiro-Wilk W test
• swilk res
Skewness-kurtosis test
• sktest res
Testing a variable for normality
• sktest var
• tabstat var, s(skewness kurtosis)
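The normality checks can be sketched as follows, assuming the auto.dta example dataset and an illustrative model (neither comes from the slides). The residuals must be saved with predict before they can be tested:

```stata
* Estimate a model and save its residuals
sysuse auto, clear
regress price mpg weight foreign
predict res, residuals

* Shapiro-Wilk W test on the residuals
swilk res

* Skewness-kurtosis test on the residuals
sktest res

* Normality of a single variable, plus its skewness and kurtosis
sktest price
tabstat price, s(skewness kurtosis)
```

In both tests the null hypothesis is normality, so a small p-value indicates a departure from it.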

Outlier detection
Outlier detection determines whether a residual
(error = actual - predicted) takes an extreme negative or positive value.
Standardized residuals
• predict residstd, rstandard
• list residstd
• Observations with standardized residuals above 3.5 or below -3.5 are
outliers.
Cook’s D
• predict cook, cooksd
• list cook if cook > 4/n (n = number of observations)
Winsorization
• winsor2 var, replace cuts(1 99) (user-written command; the cuts are percentiles)
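A hedged sketch of the outlier checks, assuming the auto.dta example dataset and an illustrative model. After regress, e(N) holds the number of observations, so it can stand in for n in the 4/n rule. The winsor2 line is left commented out because winsor2 is a community-contributed command that has to be installed first.

```stata
* Estimate a model
sysuse auto, clear
regress price mpg weight foreign

* Standardized residuals: list observations beyond +/- 3.5
predict residstd, rstandard
list make residstd if abs(residstd) > 3.5 & !missing(residstd)

* Cook's D: list observations above the 4/n rule of thumb
predict cook, cooksd
list make cook if cook > 4/e(N) & !missing(cook)

* Winsorize at the 1st and 99th percentiles (requires: ssc install winsor2)
* winsor2 price, replace cuts(1 99)
```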

Heteroskedasticity
Refers to a situation in which the error terms of the model do not have
constant variance. This problem should be addressed because it distorts the standard errors and can
make significant variables appear statistically insignificant, or vice versa.
Testing the residuals for heteroskedasticity
• estat hettest (hettest in older versions of Stata)
Solving the heteroskedasticity problem
• reg depvar indepvars, robust
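As a sketch, the test and the robust re-estimation might look like this on the auto.dta example dataset (the dataset and model are assumptions for illustration):

```stata
* Estimate the model by OLS
sysuse auto, clear
regress price mpg weight foreign

* Breusch-Pagan / Cook-Weisberg test; the null hypothesis is constant variance
estat hettest

* Re-estimate with heteroskedasticity-robust standard errors
regress price mpg weight foreign, vce(robust)
```

The robust option changes only the standard errors; the coefficient estimates stay the same.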

Multicollinearity
Refers to a high correlation between two or more independent variables in a
regression model. This problem inflates the standard errors of the affected coefficients and makes the estimates unstable.
Testing for multicollinearity
• estat vif (vif in older versions of Stata)
Solving the multicollinearity problem
• Centering or standardizing the variables
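A minimal sketch of the VIF check and of centering, assuming the auto.dta example dataset. The interaction model and the variable names weight_c, mpg_c and wt_mpg are hypothetical. Centering mainly helps when the collinearity comes from interaction or polynomial terms, which is the case illustrated here.

```stata
* Variance inflation factors after an OLS regression
sysuse auto, clear
regress price mpg weight foreign
estat vif

* Center two regressors at their means before building an interaction
quietly summarize weight
generate weight_c = weight - r(mean)
quietly summarize mpg
generate mpg_c = mpg - r(mean)
generate wt_mpg = weight_c * mpg_c

* Re-estimate with the centered terms and check the VIFs again
regress price weight_c mpg_c wt_mpg
estat vif
```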

Model specification
Refers to including all relevant variables and excluding all irrelevant ones.
Testing for model specification
• estat ovtest (ovtest in older versions of Stata)
• linktest
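Both specification tests run directly after the regression. A sketch, assuming the auto.dta example dataset and an illustrative model:

```stata
* Estimate the model
sysuse auto, clear
regress price mpg weight foreign

* Ramsey RESET test; the null hypothesis is no omitted variables
estat ovtest

* Link test: a significant _hatsq suggests misspecification
linktest
```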

Other regression commands
• Logistic regression is for a binary dependent variable.
• logistic depvar indepvars
• Probit regression is the other main method for analysing binary
dependent variables. Whereas logit (or logistic) regression is based on log
odds, probit uses the cumulative normal probability distribution.
• probit depvar indepvars
• Poisson regression is for a count (non-negative integer)
dependent variable.
• poisson depvar indepvars
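A hedged sketch of the three commands on the auto.dta example dataset. The binary variable foreign suits logistic and probit. The variable rep78 is treated as a count purely for illustration, which is an assumption and not a modelling recommendation.

```stata
sysuse auto, clear

* Logistic regression: logistic reports odds ratios, logit reports coefficients
logistic foreign mpg weight
logit foreign mpg weight

* Probit regression on the same binary outcome
probit foreign mpg weight

* Poisson regression for a non-negative integer outcome
poisson rep78 mpg weight
```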

Presenting your results
For descriptive and correlation results
• Edit > Copy table
• Open a blank Word document and paste
• Table > Convert > Text to Table
For regression results
• esttab (user-written; part of the estout package)
• esttab, se ar2
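A sketch of building a regression table, assuming the auto.dta example dataset and the hypothetical model names m1 and m2. The built-in estimates table command is shown as a stand-in that needs no installation. The esttab line is commented out because esttab has to be installed first.

```stata
* Estimate and store two models
sysuse auto, clear
regress price mpg
estimates store m1
regress price mpg weight foreign
estimates store m2

* Built-in side-by-side table with standard errors, N and adjusted R-squared
estimates table m1 m2, se stats(N r2_a)

* The same table with esttab (requires: ssc install estout)
* esttab m1 m2, se ar2
```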

Panel data
• The difference between cross-sectional, time
series and panel data
• Why panel?
• More observations mean more information
• The panel structure allows better use
of the data

• Data need to be set as panel in Stata with xtset (individual
and time dimensions)
• Summary statistics for panel data: xtsum, xtdes …
• Fixed effects models
• Random effects models
• Pooled OLS
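A minimal sketch of these steps, assuming the Grunfeld investment data from the Stata website (loaded with webuse, which needs an internet connection). The dataset and the model are illustrative assumptions, not taken from the slides.

```stata
* Load an example panel: 10 companies observed over 20 years
webuse grunfeld, clear

* Declare the individual and time dimensions
xtset company year

* Panel description and summary statistics
xtdescribe
xtsum invest mvalue kstock

* Pooled OLS, fixed effects and random effects
regress invest mvalue kstock
xtreg invest mvalue kstock, fe
xtreg invest mvalue kstock, re
```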

• Hausman test
• Breusch and Pagan Lagrangian Multiplier (LM)
test
• Modified Wald test for groupwise heteroskedasticity
• Wooldridge test for autocorrelation in panel data
• Pesaran's test of cross sectional dependence
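A hedged sketch of how these tests are commonly run, again assuming the Grunfeld data from the Stata website. hausman and xttest0 are built in. The last three commands are community-contributed, so they are left commented out and have to be installed first (for example with search or ssc install).

```stata
webuse grunfeld, clear
xtset company year

* Fixed and random effects estimates, stored for the Hausman test
xtreg invest mvalue kstock, fe
estimates store fixed
xtreg invest mvalue kstock, re
estimates store random

* Breusch and Pagan LM test (run directly after xtreg, re): random effects vs pooled OLS
xttest0

* Hausman test: fixed effects vs random effects
hausman fixed random

* Community-contributed commands (installation required):
* xttest3 after xtreg, fe      modified Wald test for groupwise heteroskedasticity
* xtserial invest mvalue kstock   Wooldridge test for autocorrelation
* xtcsd, pesaran abs after xtreg  Pesaran's test of cross-sectional dependence
```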

Suggested readings
• Gujarati & Porter (2010) “Essentials of Econometrics”, McGraw-Hill,
New York.
• Cameron & Trivedi (2009) “Microeconometrics Using Stata”, Stata
Press, StataCorp LP, College Station, Texas, USA.
• Pevalin & Robson (2009) “The Stata Survival Manual”, Two Penn
Plaza, New York, USA.
• Wooldridge (2003) “Introductory Econometrics: A Modern Approach”
(2nd ed.), Thomson South-Western, USA.
