2. FINAL GROUP/INDIVIDUAL PROJECT
• Think about a relationship you’ve learned about in economics…
• CO2 emissions and GDP growth
• International trade and growth
• Consumption and disposable income
• Economic growth and investment
• Government debt and economic growth
• Look at Our World in Data for ideas…
• Try to focus on macro/international topics; data for these are much easier to find.
• There are also plenty of good macro/regional data at the state level (check the U.S. Census).
3. LITERATURE REVIEW
• Google Scholar is your friend.
• Find at least 3 papers related to your topic. Summarize their sample, data, methodology, and results.
5. IN THIS TOPIC
• Types of data
• Cross-sectional
• Pooled
• Panel
• Data collection
• Data sources
6. SELECTED OBSERVATIONS ON TEST SCORES AND OTHER VARIABLES FOR CALIFORNIA SCHOOL DISTRICTS IN 1999
District Number | District Average Test Score (5th grade) | Student-Teacher Ratio | Expenditure per Pupil ($) | Percentage of Students Learning English
1 | 690.8 | 17.89 | 6385 | 0.0
2 | 661.2 | 21.52 | 5099 | 4.6
3 | 643.6 | 18.70 | 5502 | 30.0
4 | 647.7 | 17.36 | 7102 | 0.0
5 | 640.8 | 18.67 | 5236 | 13.9
… | … | … | … | …
418 | 645.0 | 21.89 | 4403 | 24.3
419 | 672.2 | 20.20 | 4776 | 3.0
420 | 655.8 | 19.04 | 5993 | 5.0
7. SELECTED OBSERVATIONS ON THE GROWTH RATE OF GDP IN THE U.S.: QUARTERLY DATA, 1960Q1–2013Q1
Observation Number | Date (year:quarter) | GDP Growth Rate (% at an annual rate)
1 | 1960:Q1 | 8.8
2 | 1960:Q2 | -1.5
3 | 1960:Q3 | 1.0
4 | 1960:Q4 | -4.9
5 | 1961:Q1 | 2.7
… | … | …
211 | 2012:Q3 | 2.7
212 | 2012:Q4 | 0.1
213 | 2013:Q1 | 1.1
8. SELECTED OBSERVATIONS ON CIGARETTE SALES, PRICES, AND TAXES, BY STATE AND YEAR FOR U.S. STATES, 1985–1995
State | Year | Cigarette Sales (packs per capita) | Average Price per Pack, including taxes ($) | Total Taxes, cigarette excise tax + sales tax ($)
Alabama | 1985 | 116.5 | 1.022 | 0.333
Arkansas | 1985 | 128.5 | 1.015 | 0.370
Arizona | 1985 | 104.5 | 1.086 | 0.362
… | … | … | … | …
West Virginia | 1985 | 112.8 | 1.089 | 0.382
Wyoming | 1985 | 129.4 | 0.935 | 0.240
Alabama | 1986 | 117.2 | 1.080 | 0.334
… | … | … | … | …
Wyoming | 1995 | 112.2 | 1.585 | 0.360
9. DATA COLLECTION
• Here, data collection means building a dataset by combining data from different sources into one file
• As a reminder, each row represents an observation and each column represents a different variable
• We are going to create a small cross-sectional dataset by collecting data from two separate sources:
• World Development Indicators (World Bank)
• Economic Freedom Project (Economic Freedom of the World, EFW)
10. WORLD DEVELOPMENT INDICATORS
• Let’s collect real GDP per capita data produced by the World Bank
• Simply google “World Bank real GDP per capita”, find the indicator you are looking for in the results, and download the .csv file
• The folder you have just downloaded contains three files; the largest one contains the data. Please open it in Excel
• You only need the following three variables: country name, country code, and the data for 2016. Keep them and delete all the columns and rows you won’t need when you import the data into Stata
• Look at the observations: are all of them countries?
• Rename the “2016” column RGDPPC and save the file as a .csv
• Barney… GDPPC_WB.xlsx
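The trimming steps above can be sketched in Python (standard library only) as a way to check what the cleaned file should look like. It is not the course workflow, which uses Excel. The header names "Country Name", "Country Code" and "2016" are assumptions about the World Bank download, and the sample rows are made-up toy numbers. The "World" row shows the kind of aggregate that is not a country.

```python
import csv
import io

# Assumed header names in the World Bank CSV; check them against your download.
NAME, CODE, YEAR = "Country Name", "Country Code", "2016"


def trim_wdi(csv_text):
    """Keep country name, country code and 2016; rename 2016 to RGDPPC.

    Assumes any metadata lines above the header row have already been deleted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{NAME: row[NAME], CODE: row[CODE], "RGDPPC": row[YEAR]} for row in reader]


def to_csv(rows):
    """Write the trimmed rows back out as CSV text, one row per observation."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[NAME, CODE, "RGDPPC"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Toy input (made-up numbers); "World" is an aggregate, not a country.
sample = (
    "Country Name,Country Code,Indicator Name,2015,2016\n"
    "Country A,AAA,GDP per capita,100,110\n"
    "World,WLD,GDP per capita,200,210\n"
)
print(to_csv(trim_wdi(sample)))
```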
11. ECONOMIC FREEDOM PROJECT
• Please go to freetheworld.com, then “dataset”, then “download entire dataset” or “download filtered dataset”
• You want the summary index for every country for 2016. Keep the following columns: ISO code (rename it “country code”), countries, and summary index (you can rename it). Make sure your data are ready to be imported into Stata (i.e., remove unnecessary rows)
• Look at the list of countries: are any of them not countries?
• Please save this file as a .csv file
• We are going to use the EFW country list as the master country list
• Barney… EFW.xlsx
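The EFW data have one row per country and year, so the task is to filter one year instead of picking one column. Below is a minimal Python sketch of that step. The column names "Year", "ISO_Code", "Countries" and "Summary Index" are assumptions, and so are the output names "countrycode" and "efw": replace them with the headers in your own file. Skipping rows that have no score is also an assumption about what counts as an "unnecessary row".

```python
def efw_for_year(rows, year=2016):
    """Keep ISO code, country name and summary index for a single year.

    rows: one dict per country-year, as in the EFW panel layout.
    All column names used here are assumptions; adjust them to your download.
    """
    kept = []
    for row in rows:
        if int(row["Year"]) != year:
            continue  # drop every other year
        if not row["Summary Index"].strip():
            continue  # assumed: skip countries with no score that year
        kept.append({
            "countrycode": row["ISO_Code"],  # renamed so it matches the merge key
            "country": row["Countries"],
            "efw": float(row["Summary Index"]),  # hypothetical short name
        })
    return kept


# Toy rows (made-up values)
rows = [
    {"Year": "2015", "ISO_Code": "AAA", "Countries": "Alpha", "Summary Index": "7.1"},
    {"Year": "2016", "ISO_Code": "AAA", "Countries": "Alpha", "Summary Index": "7.2"},
    {"Year": "2016", "ISO_Code": "BBB", "Countries": "Beta", "Summary Index": ""},
]
print(efw_for_year(rows))
```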
12. COMBINING THE TWO
• Please import the World Bank data into Stata
• Save it as a Stata dataset: save GDPPC, replace
• Clear memory (clear) and load your master data (EFW)
• Use the following command to merge the two datasets into one:
• merge 1:1 countrycode using GDPPC
• Stata’s output tells you how many observations it was able to match and how many it was not
• Country codes sometimes differ slightly between datasets. Let’s see whether we can find a match “by hand” for the unmatched master observations
• Sort your data by merge status (the _merge variable):
• sort _merge
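To make the `_merge` codes concrete, here is a Python sketch of what `merge 1:1 countrycode using GDPPC` followed by `sort _merge` produces. It follows Stata's coding: 1 = in master only, 2 = in using only, 3 = matched. This is an illustration on lists of dicts, not Stata itself. The key name "countrycode" and the toy rows are assumptions; "BBX" and "BBB" play the role of a code that differs slightly between the two files.

```python
def merge_1to1(master, using, key):
    """Sketch of Stata's `merge 1:1 key using ...` on lists of dicts.

    Adds a '_merge' code to every row: 1 = master only, 2 = using only,
    3 = matched. A 1:1 merge needs unique keys in both files.
    """
    def index(rows, label):
        out = {}
        for row in rows:
            k = row[key]
            if k in out:
                raise ValueError(f"{key}={k!r} is not unique in the {label} data")
            out[k] = row
        return out

    m = index(master, "master")
    u = index(using, "using")
    merged = []
    for k, row in m.items():
        if k in u:
            # for variables in both files, the master's values are kept
            merged.append({**u[k], **row, "_merge": 3})
        else:
            merged.append({**row, "_merge": 1})
    for k, row in u.items():
        if k not in m:
            merged.append({**row, "_merge": 2})
    # equivalent of `sort _merge`; Python's sort keeps ties in their original order
    merged.sort(key=lambda r: r["_merge"])
    return merged


# Toy data (made-up values): master = EFW, using = World Bank GDP per capita
efw = [{"countrycode": "AAA", "efw": 7.0}, {"countrycode": "BBX", "efw": 6.5}]
gdp = [
    {"countrycode": "AAA", "RGDPPC": 110},
    {"countrycode": "BBB", "RGDPPC": 120},
    {"countrycode": "WLD", "RGDPPC": 200},
]
for row in merge_1to1(efw, gdp, "countrycode"):
    print(row)
```

Rows with code 1 are the unmatched master observations you would try to fix by hand.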
13. TWO VARIABLES ARE NOT ENOUGH
• If you want to study the relationship between income and institutions across countries, a regression with only these two variables would not be enough, because other factors also affect income
• You can find more variables in the World Development Indicators: https://databank.worldbank.org/data/reports.aspx?source=world-development-indicators
• Please pick another variable (possibly based on your knowledge from
ECON 202) to add to your dataset and regression.
14. COMBINING THE TWO CONT’D
• See if you can find matches for the unmatched observations in the master data by searching with Ctrl+F
• If you can, please change the country code in one of the datasets (either
GDPPC or EFW) and try merging again
• Now drop the unmatched observations that come only from the World Bank data (drop if _merge == 2), then drop the _merge variable (drop _merge)
• Done!
• We can now try to regress real income per capita on economic freedom
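The clean-up step and a first bivariate regression can be sketched in Python as shown below. The code drops rows with `_merge == 2`, removes the `_merge` variable, and then computes the OLS intercept and slope from the usual formulas. Stata's `regress` does the same listwise deletion, so rows missing either variable are skipped here too. The variable names "efw" and "RGDPPC" and the toy numbers are assumptions chosen for illustration. The toy numbers lie exactly on y = 1 + 2x.

```python
def finish_merge(merged):
    """Drop using-only rows (_merge == 2), then drop the _merge variable."""
    return [
        {k: v for k, v in row.items() if k != "_merge"}
        for row in merged
        if row["_merge"] != 2
    ]


def regress(rows, y, x):
    """OLS of y on x with an intercept; returns (intercept, slope).

    slope = sum((x - xbar) * (y - ybar)) / sum((x - xbar) ** 2)
    Rows missing either variable are skipped, as in Stata's regress.
    """
    pairs = [(r[x], r[y]) for r in rows
             if r.get(x) is not None and r.get(y) is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete observations")
    xbar = sum(px for px, _ in pairs) / n
    ybar = sum(py for _, py in pairs) / n
    sxy = sum((px - xbar) * (py - ybar) for px, py in pairs)
    sxx = sum((px - xbar) ** 2 for px, _ in pairs)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Toy merged data (made-up values)
merged = [
    {"countrycode": "A", "efw": 1.0, "RGDPPC": 3.0, "_merge": 3},
    {"countrycode": "B", "efw": 2.0, "RGDPPC": 5.0, "_merge": 3},
    {"countrycode": "C", "efw": 3.0, "RGDPPC": 7.0, "_merge": 3},
    {"countrycode": "D", "efw": 4.0, "_merge": 1},       # master only: no GDP
    {"countrycode": "E", "RGDPPC": 9.0, "_merge": 2},    # using only: dropped
]
clean = finish_merge(merged)
print(regress(clean, "RGDPPC", "efw"))  # (1.0, 2.0)
```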
15. MERGING IN EXCEL USING VLOOKUP
• VLOOKUP is used for data organized vertically (in columns); if your data are organized horizontally (in rows), use HLOOKUP
• Open both datasets and pick whichever one is going to be your master data. In the “using” data, the values you are matching on must be in the first column of the lookup range
• In this example, the master data are the EFW index and the using data are GDP per capita
• VLOOKUP(lookup value, table array, column index, range lookup) takes: the value you want to match on; the range containing the data to be matched into the current dataset; the column number (within that range) of the value to return; and FALSE for an exact match or TRUE for an approximate match. TRUE requires the first column to be sorted in ascending order, so use FALSE when matching country codes
• Make sure your new dataset looks nice and clean.
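The sketch below is a rough Python model of how VLOOKUP resolves a lookup, written to show the exact versus approximate behaviour. Exact mode returns the first row whose first column equals the value. Approximate mode returns the row with the largest first-column value that is less than or equal to the lookup value, which is why the column must be sorted. Unlike Excel, this sketch compares text case-sensitively. `None` stands in for Excel's #N/A. The toy tables are made up.

```python
from bisect import bisect_right


def vlookup(value, table, col_index, range_lookup=False):
    """Python model of Excel's VLOOKUP on a list of rows.

    value: what to look for in the FIRST column of `table`
    col_index: 1-based column number to return, as in Excel
    range_lookup: False = exact match; True = approximate match
        (largest first-column value <= value; first column must be sorted)
    Returns None where Excel would show #N/A.
    """
    if range_lookup:
        keys = [row[0] for row in table]
        pos = bisect_right(keys, value)
        if pos == 0:
            return None  # value is smaller than every key
        return table[pos - 1][col_index - 1]
    for row in table:
        if row[0] == value:  # case-sensitive here, unlike Excel
            return row[col_index - 1]
    return None


# Exact match on country codes (toy data)
gdp = [["AAA", 110], ["BBB", 120]]
print(vlookup("BBB", gdp, 2))   # 120
print(vlookup("ZZZ", gdp, 2))   # None (#N/A in Excel)

# Approximate match on a sorted first column (toy brackets)
brackets = [[0, "low"], [10, "mid"], [20, "high"]]
print(vlookup(15, brackets, 2, range_lookup=True))  # mid
```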
16. RESHAPE
• Data can be organized in long or wide format
• The data you download from the World Bank are in wide format (one column per year)
• To run regressions you usually need the data in long format; you will also need to use reshape for DEP 1
• To convert the data to long format, use the Stata command reshape
• Let’s work on this part of DEP 1 together
17. RESHAPE THE TEMPERATURE
ANOMALIES DATA
• Download the CO2 data and the temperature anomalies data from the Doing Economics project website
• Currently the temperature anomalies data is in a wide format
• We are going to use the January and February temperatures; please pick two different months for your own submission
• In Stata: keep year jan feb
• save jannfebtemps, replace
• reshape
• Typing reshape on its own (the last command) displays a short guide to using the reshape command
18. RESHAPE CONT-D
• To use the reshape command, the variables being reshaped need to be renamed so that they share a common stub followed by a number (e.g., a1, a2)
• Stata: rename jan anomaly1
• rename feb anomaly2
• reshape long anomaly, i(year) j(month)
• save jannfebtemps, replace
• clear
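This Python sketch reproduces what `reshape long anomaly, i(year) j(month)` does to the renamed data. It splits each wide row into one row per `anomaly#` variable. The number becomes the new `month` variable and the value goes into a single `anomaly` column. It does not reproduce Stata's implementation; the anomaly values are made up. It also shows why the renaming step matters: only names of the form stub + number are picked up.

```python
def _suffix(name, stub):
    """Return the number in a name like stub1, stub2, ...; None otherwise."""
    rest = name[len(stub):]
    if name.startswith(stub) and rest.isdecimal():
        return int(rest)
    return None


def reshape_long(rows, stub, i, j):
    """Sketch of Stata's `reshape long stub, i(i) j(j)` on a list of dicts.

    Each wide row becomes one row per stub# variable: the number goes into
    the new j variable and the value into a single `stub` column. All other
    variables (including i) are copied onto every new row.
    """
    long_rows = []
    for row in rows:
        others, wide = {}, {}
        for name, value in row.items():
            s = _suffix(name, stub)
            if s is None:
                others[name] = value
            else:
                wide[s] = value
        for s in sorted(wide):
            long_rows.append({**others, j: s, stub: wide[s]})
    long_rows.sort(key=lambda r: (r[i], r[j]))  # order by i, then j
    return long_rows


# Toy wide data (made-up values) after: rename jan anomaly1 / rename feb anomaly2
wide = [
    {"year": 2001, "anomaly1": 0.5, "anomaly2": 0.4},
    {"year": 2000, "anomaly1": 0.2, "anomaly2": 0.3},
]
for r in reshape_long(wide, "anomaly", i="year", j="month"):
    print(r)
```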
19. MERGE THE TWO DATASETS
• Load the master data (CO2)
• Make sure the variables by which you are merging have the same names
across the two datasets
• Merge as before, but list all of the variables you are merging on (both year and month), e.g., merge 1:1 year month using jannfebtemps
• Keep the matched observations (keep if _merge == 3)
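Merging on two variables means an observation is identified by the pair (year, month), not by either variable alone. The Python sketch below shows this composite-key match followed by keeping only the matched rows. It is the same idea as `merge 1:1 year month` plus `keep if _merge == 3`. The variable names and toy values are assumptions for illustration.

```python
def merge_keep_matched(master, using, keys):
    """Sketch of `merge 1:1 year month using ...` followed by keeping matches.

    keys: names of the variables that jointly identify an observation.
    Only key combinations found in BOTH files are kept (Stata's _merge == 3).
    For variables present in both files, the master's values are kept.
    """
    def key_of(row):
        return tuple(row[k] for k in keys)

    lookup = {}
    for row in using:
        k = key_of(row)
        if k in lookup:
            raise ValueError(f"{k} is not unique in the using data")
        lookup[k] = row

    matched, seen = [], set()
    for row in master:
        k = key_of(row)
        if k in seen:
            raise ValueError(f"{k} is not unique in the master data")
        seen.add(k)
        if k in lookup:
            matched.append({**lookup[k], **row})
    return matched


# Toy data (made-up values): master = CO2, using = reshaped temperature anomalies
co2 = [{"year": 2000, "month": 1, "co2": 1.0}, {"year": 2000, "month": 3, "co2": 3.0}]
temps = [{"year": 2000, "month": 1, "anomaly": 0.2}, {"year": 2000, "month": 2, "anomaly": 0.3}]
print(merge_keep_matched(co2, temps, keys=("year", "month")))
```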
20. OTHER DATA SOURCES
• Some datasets: Polity IV http://www.systemicpeace.org/inscrdata.html
• World Bank data
• IMF data
• Federal Reserve data
• World Values Survey http://www.worldvaluessurvey.org/wvs.jsp
• Census
• Book of the States http://knowledgecenter.csg.org/kc/category/content-type/content-type/book-states
• National Center for Education Statistics https://nces.ed.gov
• Wealth of education data by state
21. OTHER DATA SOURCES CONT’D
• Political ideology data https://voteview.com/data
• Economic Freedom of the World index https://www.fraserinstitute.org/economic-freedom/dataset?geozone=world&year=2015&page=dataset&min-year=2&max-year=0&filter=0
• Economic Freedom of North America https://www.fraserinstitute.org/economic-freedom/dataset?geozone=na&year=2015&page=dataset&min-year=2&max-year=0&filter=0&selectedCountry=USA
• Penn World Table https://www.rug.nl/ggdc/productivity/pwt/
• KOF Globalisation Index https://www.kof.ethz.ch/en/forecasts-and-indicators/indicators/kof-globalisation-index.html
• CIRI Human Rights Data http://www.humanrightsdata.com/p/data-documentation.html
• Some economics journals, e.g., the American Economic Review
• https://www.aeaweb.org/journals/aer
• Some good economists
• e.g., Daron Acemoglu https://economics.mit.edu/faculty/acemoglu
• Centers for Disease Control and Prevention (CDC)
• https://data.cdc.gov/
Speaker notes
The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, all the variables listed vary considerably.
The data in each row correspond to a different time period (year and quarter). In the first quarter of 1960, for example, GDP grew 8.8% at an annual rate. In other words, if GDP had continued growing for four quarters at its rate during the first quarter of 1960, the level of GDP would have increased by 8.8%.
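"At an annual rate" means the quarterly rate is compounded over four quarters. In formula form, g_annual = (1 + g_quarter)^4 - 1. The Python arithmetic below applies this to the 1960:Q1 figure. It shows that 8.8% at an annual rate corresponds to roughly 2.13% growth within that quarter. Multiplying 2.13% by 4 would give only about 8.5%, because that ignores compounding.

```python
def annualize(g_quarter):
    """Annual rate implied by growing at g_quarter for four quarters: (1+g)^4 - 1."""
    return (1 + g_quarter) ** 4 - 1


def quarterly_rate(g_annual):
    """Quarterly rate that compounds to g_annual over four quarters."""
    return (1 + g_annual) ** 0.25 - 1


# 1960:Q1 in the table: 8.8% at an annual rate
g_q = quarterly_rate(0.088)
print(f"quarterly growth: {g_q:.2%}")              # 2.13%
print(f"annualized again: {annualize(g_q):.1%}")   # 8.8%
```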
The number of entities in a panel data set is denoted n, and the number of time periods is denoted T. In the cigarette data set, we have observations on n = 48 continental U.S. states (entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total of n × T = 48 × 11 = 528 observations.
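The n × T count applies to a balanced panel, where every entity is observed in every period, so the observations are all (state, year) pairs. The short Python sketch below builds that index. The state names are hypothetical placeholders standing in for the 48 continental states.

```python
from itertools import product

# Hypothetical placeholder names; the real data use the 48 continental U.S. states.
states = [f"state_{k:02d}" for k in range(1, 49)]   # n = 48 entities
years = list(range(1985, 1996))                     # T = 11 years, 1985-1995

# One (state, year) pair per observation; each state's years are listed together
panel_index = list(product(states, years))
print(len(states), len(years), len(panel_index))    # 48 11 528
```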