This document provides an introduction and overview of key steps and methods for quantitative analysis in EViews, including:
1) Data screening and cleaning to manage missing values and outliers, which is crucial for avoiding dubious results.
2) Three types of missing values (Missing Completely at Random, Missing at Random, and Missing Not at Random) and their characteristics.
3) Identification and treatment of outliers, including detection tests and removal methods such as winsorization and trimming.
4) Diagnostic tests for multicollinearity, heteroscedasticity, autocorrelation, and violations of the normality assumption in linear regression models.
5) Unit root tests and cointegration tests for time-series analysis.
Workshop on Introduction to EViews
1. Introduction to EViews
Vignes Gopal Krishna
Fast track PhD student & SLAI fellow
Faculty of Economics and Administration
University of Malaya
Email: vignesgopal@yahoo.com / vignesgopal@um.edu.my
2. Steps for Quantitative Analysis
(Flowchart in the original slide; recoverable content:)
• Data Screening/Cleaning: missing values, outliers, standard errors/standard deviation
• Conditions: data reliability & validity, logical sequence of numerical presentations
• Data Analysis: sources & measurement of data; linear regression, Granger causality, cointegration
3. Data Cleaning/Screening
• Deals with the management of missing values and outliers
• A crucial element that helps avoid dubious results
• Useful for monitoring trends in numerical presentations
4. Missing Values
• A common occurrence in research
• Can significantly affect results, except in some methods such as survival analysis, impact analysis, etc.
• Types of missing values:
a) Missing Completely at Random (MCAR)
b) Missing at Random (MAR)
c) Missing Not at Random (MNAR)
5. Types of Missing Values
Missing Completely at Random (MCAR)
• Missingness of Y does not depend on X or Y
• Example: random selection of survey questions
Missing at Random (MAR)
• Missingness of Y depends on X, but not on Y
• Example: income reporting is quite weak among respondents in the service industry
6. Missing Not at Random (MNAR)
• Pr(Y is missing) = f(Y, …): the probability of missingness depends on the unobserved value itself
• Example: respondents with high income are less likely to report their income
• Methods: a) Heckman selection model
b) Modeling patterns of missing values
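As a sketch of the screening step, the snippet below (plain NumPy; the income values are invented for illustration) counts missing observations and shows two simple treatments, listwise deletion and mean imputation:

```python
import numpy as np

# Hypothetical income series with two missing observations
income = np.array([3.2, np.nan, 4.1, np.nan, 5.0, 2.8])

mask = np.isnan(income)
n_missing = int(mask.sum())            # how many values are missing
complete = income[~mask]               # listwise deletion: keep complete cases only
imputed = np.where(mask, np.nanmean(income), income)  # simple mean imputation
```

Listwise deletion is only safe under MCAR; under MAR/MNAR the model-based methods above (e.g., Heckman selection) are preferable.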
7. Outliers
• Inconsistent with the existing range of data points
• Concerns both the lower and upper extremes of the data
• Positively related to error terms/residuals
• Inclusion and exclusion of outliers
• Main methods for identifying outliers:
a) Chauvenet's criterion
b) Grubbs' test for outliers
c) Peirce's criterion
d) Box plot
e) Extreme values
• Ways to remove outliers:
(a) Reducing the effects of autocorrelation
(b) Winsorizing
(c) Robust standard errors (minimization of standard errors)
(d) Normalization
(e) Trimming
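Two of the removal methods above, winsorizing and trimming, can be sketched as follows (SciPy/NumPy; the data values are invented for illustration):

```python
import numpy as np
from scipy.stats.mstats import winsorize

# Ten observations; 50.0 is an obvious outlier
data = np.array([2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 50.0])

# Winsorizing: clip the bottom and top 10% to the nearest retained value
w = winsorize(data, limits=[0.1, 0.1])

# Trimming: drop the extreme observations entirely
trimmed = np.sort(data)[1:-1]
```

Winsorizing preserves the sample size (50.0 becomes 3.7); trimming shrinks it.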
8. Normality
• A requirement for parametric analysis
• Most tests require the variables to be normally distributed in order to ensure a normal distribution of the error terms
• Skewness should be approximately 0 (positive = skewed to the right; negative = skewed to the left)
• Kurtosis should be approximately 3 (Jarque-Bera test)
• Available normality tests:
a) Jarque-Bera test (skewness & kurtosis)
b) Shapiro-Wilk test
c) Shapiro-Francia test
d) Zero-skewness log transform
e) Box-Cox transform
• Data transformations
• Graphical checks: histogram with normal curve, quantile-quantile (Q-Q) plot/Qnorm, etc.
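A minimal sketch of the Jarque-Bera check, here with SciPy on a simulated sample (EViews reports the same statistic in its descriptive-statistics/histogram view):

```python
import numpy as np
from scipy.stats import jarque_bera, kurtosis, skew

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # simulated normal sample

stat, p = jarque_bera(x)           # H0: data are normally distributed
sk = skew(x)                       # near 0 for normal data
kt = kurtosis(x, fisher=False)     # "raw" kurtosis, near 3 for normal data
```

A large JB statistic (small p-value) signals departure from normality via skewness and/or excess kurtosis.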
9. Linear Regression
• Associations between the DV and IVs
• DV: continuous/interval/scale/ratio variable
• IVs: continuous/interval/scale/ratio/categorical variables
• Assumptions:
a) Linearity in parameters
b) No endogeneity problem
c) No multicollinearity
d) Homoscedasticity (no heteroscedasticity)
e) Number of observations > number of variables
f) Error terms must be normally distributed
10. Diagnostic Testing
Multicollinearity (arises in multivariate analysis)
• Correlations between the independent variables
• Symptoms: high R-squared, large covariances and correlations, yet insignificant t-ratios
• Detection: Variance Inflation Factor (VIF), Tolerance Value (TL), auxiliary regressions, graphical methods
• Ways to reduce the effects:
a) Drop variables that have high correlations
b) Data transformation, etc.
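The VIF comes from exactly such an auxiliary regression: VIF_j = 1/(1 − R²_j), where R²_j is from regressing predictor j on the others. A plain-NumPy sketch on simulated data (x2 is deliberately built to be near-collinear with x1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)               # unrelated regressor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1/(1 - R^2_j) from regressing column j on the other columns."""
    y = X[:, j]
    A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)
```

Here vif(X, 0) is very large while vif(X, 2) stays near 1; a common rule of thumb flags VIF > 10.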
11. Heteroscedasticity
• Absence of homoscedasticity (unequal spread of the error variances)
• Common sources:
a) Error-learning models
b) Outliers
c) Data-collection techniques
• Detection: Park test, Glejser test, Goldfeld-Quandt test, White test, graphical methods, etc.
• Common remedies: remove outliers (winsorizing/trimming), GLS
12. Autocorrelation (correlation between residuals)
• The estimated error variance underestimates the population error variance
• R-squared is overestimated
• Misleading results: F and t tests are not valid
• Methods:
a) Graphical method
b) Runs test
c) Durbin-Watson d test
d) Breusch-Godfrey test
e) Corrected version of GLS
f) Newey-West method
13. Unit Root Test
• Stationary & non-stationary series
• Deterministic specifications: intercept, trend, intercept + trend
• General hypotheses:
Null hypothesis: the variable has a unit root
Alternative hypothesis: the variable has no unit root
(Applicable to the Augmented Dickey-Fuller (ADF), Dickey-Fuller GLS (DF-GLS), Phillips-Perron (PP), ERS Point Optimal, and Ng-Perron tests)
• The hypotheses are reversed for the KPSS test.
• A prerequisite for cointegration tests (e.g., the Johansen-Juselius cointegration test)
• A series that is non-stationary in levels but stationary after first differencing is I(1); its first difference is I(0)
14. Logit/Probit Regression
• A type of probabilistic model
• Used when the DV is a binary/categorical variable
• Logit assumes logistically distributed latent errors; Probit assumes normally distributed latent errors
• Pr(Y = 1) = f(X1, X2, X3, …)
• Methods to assess goodness of fit:
(a) Likelihood-ratio tests
(b) Pseudo R-squared
(c) Hosmer-Lemeshow test
(d) Binary classification
15. Cointegration
• Two or more variables are cointegrated if they share the same stochastic trend/drift
• In general, the variables have to be I(1) in levels before testing for cointegration
• Types of cointegration tests:
(a) Engle-Granger two-step method
(b) Johansen-Juselius test
(c) Phillips-Ouliaris cointegration test
(d) Autoregressive Distributed Lag (ARDL)