CHAPTER 1: PREPARING DATA


1. Types of Data


Three kinds of datasets are distinguished:


    Cross-Sections (Unit of observation varies, time of observation fixed)


       Observations         Year        Growth   Savings   Middle East
       Afghanistan           2005          -5       0.3          0
               :             2005           2       0.2          0
       Lebanon               2005           1       0.1          1
               :             2005           2       0.4          0
       Zimbabwe              2005          -1       0.2          0


    Time series (Unit of observation fixed, time of observation varies)


       Observations           Year          Growth          Savings
       Lebanon                 2001           3.1              0.3
       Lebanon                 2002           2.8              0.2
       Lebanon                 2003           1.4              0.2
       Lebanon                 2004           2.1              0.3
       Lebanon                 2005           2.5              0.3




    Panels (Unit of observation varies, time of observation varies)


       Observations         Year         Growth      Savings   Middle East
       Afghanistan           2000           -5          0.3          0
               :               :             2          0.2          0
       Afghanistan           2005            1          0.1          0
               :               :             2          0.4          0
       Lebanon               2000           -1          0.2          1
               :               :             1          0.4          1
       Lebanon               2005            1          0.3          1
               :               :             3          0.2          0
       Zimbabwe              2000            2          0.1          0
               :               :            -1         0.05          0
       Zimbabwe              2005           -5         0.01          0
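
The relation between the three kinds of datasets can be sketched in a few lines of Python. This is only an illustration, not part of the course software: the field names and the helper functions `cross_section` and `time_series` are assumptions made for the example, and the values are the illustrative ones from the panel table above. Fixing the year in a panel gives a cross-section; fixing the country gives a time series.

```python
# A small panel: each row is one (country, year) observation.
# Values are the illustrative ones from the panel table above.
panel = [
    {"country": "Afghanistan", "year": 2000, "growth": -5, "savings": 0.3, "middle_east": 0},
    {"country": "Afghanistan", "year": 2005, "growth": 1, "savings": 0.1, "middle_east": 0},
    {"country": "Lebanon", "year": 2000, "growth": -1, "savings": 0.2, "middle_east": 1},
    {"country": "Lebanon", "year": 2005, "growth": 1, "savings": 0.3, "middle_east": 1},
    {"country": "Zimbabwe", "year": 2000, "growth": 2, "savings": 0.1, "middle_east": 0},
    {"country": "Zimbabwe", "year": 2005, "growth": -5, "savings": 0.01, "middle_east": 0},
]


def cross_section(rows, year):
    """Fix the time of observation; the unit of observation varies."""
    return [r for r in rows if r["year"] == year]


def time_series(rows, country):
    """Fix the unit of observation; the time of observation varies."""
    selected = [r for r in rows if r["country"] == country]
    return sorted(selected, key=lambda r: r["year"])


cs_2005 = cross_section(panel, 2005)        # one row per country
ts_lebanon = time_series(panel, "Lebanon")  # one row per year
```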


2. Preparing Datasets – Practical Tips


    In practice, keep a master copy of each dataset as an Excel file and
      copy it from there into a statistical analysis program.
    When preparing your datasets, you often retrieve data in table form
      and then need to stack your observations.
    For example, what you download looks like this:


                     2006     2007       2008
       Afghanistan    a        b           c
            :          :        :          :
         Lebanon      d        e           f
            :          :        :          :
        Zimbabwe      g        h           i




but you need it like this


       Country        Year    Observation
       Afghanistan    2006         a
       Afghanistan    2007         b
       Afghanistan    2008         c
            :           :          :
       Lebanon        2006         d
       Lebanon        2007         e
       Lebanon        2008         f
            :           :          :
       Zimbabwe       2006         g
       Zimbabwe       2007         h
       Zimbabwe       2008         i


    which means that you need to transpose your data first (so that each
      country becomes a column) and then stack the columns.
    Dataset building needs a little bit of practice.
    The following macro does the stacking for you in Excel.




 “Stacking Macro” for Excel (Example: 45 rows, 220 columns)


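  Sub SORTY()
  '
  ' Macro1 Macro
  ' Macro recorded 3/19/2004 by mm53
  '
  ' Keyboard Shortcut: Ctrl+d
  '
  ' Stacks column B below column A, deletes column B, and repeats,
  ' so that all columns end up in one long column A.
  Dim i As Long
  Dim x As Object
  For i = 1 To 220                      ' number of columns
  Range("B1:B45").Select                ' next column to stack (45 rows)
  Selection.Copy
  Set x = ActiveCell                    ' B1, the top of the copied range
  x.Offset(45 * i, -1).Select           ' first free cell in column A
  ActiveSheet.Paste
  Application.CutCopyMode = False
  Columns("B:B").Select
  Selection.Delete Shift:=xlToLeft      ' move the remaining columns left
  Range("B1").Select
  Next i
  End Sub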
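
For readers who prefer to stack outside Excel, the same reshaping can be sketched in plain Python. This is a minimal sketch and not a translation of the macro: it reads the downloaded table row by row, so no separate transpose step is needed. The function name `stack` and the list-based layout are assumptions made for the example; the cell values `a` to `i` are the placeholders from the tables above.

```python
def stack(header, rows):
    """Turn a wide table (one row per country, one column per year)
    into long form: one (country, year, observation) row per cell."""
    years = header[1:]  # the first header cell labels the country column
    long_rows = []
    for row in rows:
        country, values = row[0], row[1:]
        for year, value in zip(years, values):
            long_rows.append((country, year, value))
    return long_rows


# The downloaded table, with placeholder observations a..i.
header = ["Country", 2006, 2007, 2008]
wide = [
    ["Afghanistan", "a", "b", "c"],
    ["Lebanon", "d", "e", "f"],
    ["Zimbabwe", "g", "h", "i"],
]

stacked = stack(header, wide)
for country, year, observation in stacked:
    print(country, year, observation)
```

Because the outer loop runs over countries and the inner loop over years, the result is ordered by country first and year second, as in the stacked table above.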




 A note on different software programs.


   When preparing datasets for use in “gretl”, make sure that all columns
   except the first contain numbers. Other programs, such as NCSS, are
   more tolerant in this regard and will also read columns with strings.
   This means that “gretl” requires a separate column for each regional
   dummy, while NCSS can read the different regions from a single column.
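
Turning one string column of regions into separate numeric dummy columns can be sketched as follows. This is only an illustration in plain Python: the helper `region_dummies` is an assumption made for the example, and the region labels other than "Middle East" are hypothetical, since the tables above only record a Middle East dummy.

```python
def region_dummies(regions):
    """Build one 0/1 column per distinct region label."""
    names = sorted(set(regions))
    return {name: [1 if r == name else 0 for r in regions] for name in names}


countries = ["Afghanistan", "Lebanon", "Zimbabwe"]
# Hypothetical string column, one region label per country.
regions = ["Asia", "Middle East", "Africa"]

dummies = region_dummies(regions)
for name, column in dummies.items():
    print(name, column)
```

Each resulting column contains only numbers, so it can sit next to the other numeric columns of the dataset.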




