SlideShare une entreprise Scribd logo
1  sur  45
Télécharger pour lire hors ligne
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Introduction to the Stata Language
Mark Lunt
Centre for Epidemiology Versus Arthritis
University of Manchester
04/10/2022
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Topics Covered Today
Getting help
Stata Windows
Basic Concepts
Manipulation of variables
Manipulation of datasets
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Introduction to R for Stata Users
I prefer Stata for simple basic data analysis
Learning 2 languages at once would be confusing for most people
I suggest using Stata until you need to change
I’m writing “R for Stata users” which directly converts this course to R
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command-line vs. Point-and-Click
Command-line requires more initial learning than point-and-click
Commands must be entered exactly correctly
Only option for any serious work
1 Reproducible
2 Editable
3 More efficient
Some commands can be written more efficiently via point-and-click
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Getting Help
Help
Manuals
Search
Stata website
Statalist
Stata Journal
Me
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command Window
Variables Window
Review Window
Results Window
Stata Windows
2 must exist:
Results
Command
2 others usually exist
Review
Variables
Others can exist (data editor, graph, do-file editor, help/log viewer)
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command Window
Variables Window
Review Window
Results Window
Command Window: Syntax
command [varlist] [,options]
Roman letters: entered exactly
Italic letters: replaced by some text you enter
Square brackets: that item is optional
Example above means means:
Command is called “command”
Command name may be followed by a list of variables
Options may follow a comma
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command Window
Variables Window
Review Window
Results Window
Command Window
Can navigate through previous commands with PageUp and PageDown.
Pressing tab key will complete a variable name as far as possible
Case-sensitive: height and HEIGHT are different variables
Syntax must be exact (although abbreviations are possible)
Only one comma, before all options
Space before opening parenthesis was most common error, now accepted
(since Stata 12). (e.g. level(5), not level (5)).
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command Window
Variables Window
Review Window
Results Window
Variables window
List of all variables in current dataset
Clicking adds variable name to command window
May contain label if one has been defined
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command Window
Variables Window
Review Window
Results Window
Review Window
List of commands entered this session
Clicking on a command puts it in command window
Double-clicking runs the command
Can be saved as a script, called a “do-file”
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Command Window
Variables Window
Review Window
Results Window
Results Window
Limited size: use a log file to preserve results
Blue = clickable link
Scrolling controlled by Return, Space and q keys.
set more [on | off]
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Basic Concepts
Do-files
Log files
Interaction with Operating System
Macros
Variable and number lists
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Do-Files
List of commands
Can be run from stata with the command do "do-file.do"
All data manipulation and analysis should be done using a do-file.
Perfectly reproducible
Can see exactly what was done
Easy to modify
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Projects
A way to keep all files used in analysis easily accessible
Can contain do-files and datasets
Example
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Profile.do
Stata looks for a file called profile.do every time it starts.
If it finds it, it runs it
Useful for
Setting memory
User-defined menus
Logging commands
See help profilew for details
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Log Files
Results window of limited size: must log results
Can use plain text or SMCL (stata markup and control language)
Top of do file should be:
capture log close
log using myfile.log, [append]|[replace] ([text]|[smcl])
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Interaction with Operating System
cd Change directory
pwd Display current directory
mkdir Create directory
dir List files in current directory
shell Run another program
Can use either "/" or "" in directory names.
Safer to use "/"
Path names containing spaces must be surrounded by inverted commas.
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Macros
Macro name is replaced by definition text when command is run.
Very useful for making do-files portable
Directories used are defined first using macros
Change in location of data or do-files only means changing macro definitions
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Macro Example
Definition: global mymac C:/Project/Data
Use:
use "$mymac/data"
Loads the file C:/Project/Data/data
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Local vs. Global
Global macro retains definition until end of session
Local macro loses definition at end of do-file
Definition Use
Global global mymac defn $mymac
Local local mymac defn ‘mymac’
Local vs Global macros
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Variable Lists
Shorthand for referring to a lot of variables
prefix* means all variables beginning with prefix
firstvar-lastvar means all variables in the dataset from firstvar to
lastvar inclusive.
Type help varlist for more details
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Do-Files
Log Files
Interaction with Operating System
Macros
Lists
Number Lists
Symbol Meaning Example Expansion
list of numbers 1 2 3 1 2 3
x/y whole numbers from x to y inclusive 1/5 1 2 3 4 5
x y to z numbers from x to z, increasing by y − x 5 10 to 20 5 10 15 20
x y : z same as x y to z 5 10:20 5 10 15 20
x(y)z numbers from x to z, increasing by y 10(10)50 10 20 30 40 50
x[y]z same as x(y)z 10[10]50 10 20 30 40 50
Number Lists
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Manipulating Variables
generate & replace
egen
Labelling
Selecting variables
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
generate
Used to create a new variable
Syntax: generate [type] newvar = expression
newvar must not already exist
type, if present, defines the type of the data
expression defines the values: e.g.
generate ltitre = log(titre)
generate str6 head = substr(name, 1, 6)
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Variable Types
type size (bytes) min max precision missing
byte 1 -127 126 integers .
int 2 -32,767 32,766 integers .
long 4 -2,147,483,647 2,147,483,646 integers .
∗float 4 −1036 1036 7 digits .
double 8 −10308 10308 15 digits .
strn n ""
strL varies ""
Available data types
∗float is the default type.
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Missing Values
Numerical variables can have several different missing values:
., .a, .b, etc
May be useful if you know why a variable is missing
if variable != . may not catch all missing values
All missing values are greater than any number representable by that
datatype.
Can exclude all missing values with
if variable < .
gen old = age > 65 if age < .
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
replace
Similar to generate
Cannot change type
newvar must already exist
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
egen
Extended GENerate
Has more functions available
User can write their own egen functions
No ereplace: must drop the existing variable and create a new one
Examples of its use in the practical
See help egen for details
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Labelling
Need to label variables themselves
show exactly what the variable measures
Need to label values of a variable
Only for categorical variables
First define a label
Then assign it to a variable
Easier to assign same label to a number of variables
Can label different missing values
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Labelling a variable
Syntax: label variable varname "Description"
Example: label variable height "Height in m."
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Labelling values
Syntax: label define labelname 1 "string1" . . .
label values varname labelname
Example: label define yesno 0 "No" 1 "Yes"
label values question1 yesno
label values question2 yesno
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Selecting variables
drop varlist
keep varlist
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Creation & Modification
Labelling
Selecting variables
Formatting Variables
Adding a format to a variable changes how it is presented, not how it is stored
Most useful for dates:
Stored as days since 1/1/1960
Can be formatted in human readable form
Date format: “%d” followed by string
E.g. “%dD/N/CY” gives 01/01/1960
Type “help format” for details
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Manipulating Datasets
use & save
append
merge
browse and edit
preserve and restore
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
use
use "filename" reads a file into stata
If there is already a file in stata, need use "filename", clear
Always use inverted commas
Easier to use the menu or button-bar
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
save
save "filename" saves the current dataset as "filename"
If "filename" already exists, need save "filename", replace
Option saveold allows saving in format of a previous version of stata
If you do not include a directory in filename, stata will try to save it in the
current directory
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Combining Datasets
append
more subjects, same variables
append using filename
merge
same subjects, more variables
merge 1:1 identifier using filename
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Appending Data: Example
ID common_1 common_2 file1_1 file1_2
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
Appending Data: File 1
ID common_1 common_2 file2_1 file2_2
4 a4 b4 e4 f4
5 a5 b5 e5 f5
6 a6 b6 e6 f6
Appending Data: File 2
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Appending Data: Example
ID common_1 common_2 file1_1 file1_2 file2_1 file2_2
1 a1 b1 c1 d1 . .
2 a2 b2 c2 d2 . .
3 a3 b3 c3 d3 . .
4 a4 b4 . . e4 f4
5 a5 b5 . . e5 f5
6 a6 b6 . . e6 f6
Appending Data: Combined Files
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Merging Data
Need an identifier (one or more variables on which to match observations)
Both files must be sorted by this identifier
All observations from both files are used
Variable _merge says whether observation was in first file, second file or both.
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Merging Files: example
idno var1 var2
1 a1 b1
2 a2 b2
3 a3 b3
Merging Data: File 1
idno var3 var4
1 c1 d1
3 c3 d3
4 c4 d4
Merging Data: File 2
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Merging Files: example
idno var1 var2 var3 var4 _merge
1 a1 b1 c1 d1 3
2 a2 b2 . . 1
3 a3 b3 c3 d3 3
4 . . c4 d4 2
Merging Data: Combined Files
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
Ensuring Uniqueness
Usually, should only be one observation per unique identifier
May not be the case (e.g. adding family-level data to individual-level data)
If there should be one observation per identifier in both datasets, use the
command merge 1:1
If each record in current dataset corresponds to several in the merged
dataset, use merge 1:m
Equally, there are merge m:1 and merge 1:m commands
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
browse & edit
Can open a data editor window with browse
Can choose variables to browse with browse varlist
Cannot modify data while browsing
edit allows data to be changed: don’t use it
Introduction
Getting Help
Stata Windows
Basic Concepts
Manipulating Variables
Manipulating Datasets
Basics
Appending Datasets
Merging Datasets
Other dataset commands
preserve & restore
You may wish to change your data temporarily
E.g. collapse to means by group
Type preserve before changing data, restore after

Contenu connexe

Similaire à slides.pdf

SQL Server - Introduction to TSQL
SQL Server - Introduction to TSQLSQL Server - Introduction to TSQL
SQL Server - Introduction to TSQL
Peter Gfader
 
Data Access Tech Ed India
Data Access   Tech Ed IndiaData Access   Tech Ed India
Data Access Tech Ed India
rsnarayanan
 
Sas Enterprise Guide A Revolutionary Tool
Sas Enterprise Guide A Revolutionary ToolSas Enterprise Guide A Revolutionary Tool
Sas Enterprise Guide A Revolutionary Tool
sysseminar
 
Configuration Management
Configuration ManagementConfiguration Management
Configuration Management
elliando dias
 

Similaire à slides.pdf (20)

Stata tutorial
Stata tutorialStata tutorial
Stata tutorial
 
INTRODUCTION TO STATA.pptx
INTRODUCTION TO STATA.pptxINTRODUCTION TO STATA.pptx
INTRODUCTION TO STATA.pptx
 
SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...
SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...
SAS Online Training by Real Time Working Professionals in USA,UK,India,Middle...
 
Computing assignment 02 ms access (bilal maqbool 10) se-i
Computing assignment 02   ms access (bilal maqbool 10)          se-iComputing assignment 02   ms access (bilal maqbool 10)          se-i
Computing assignment 02 ms access (bilal maqbool 10) se-i
 
Bank mangement system
Bank mangement systemBank mangement system
Bank mangement system
 
QTP Automation Testing Tutorial 7
QTP Automation Testing Tutorial 7QTP Automation Testing Tutorial 7
QTP Automation Testing Tutorial 7
 
SQL Saturday 28 - .NET Fundamentals
SQL Saturday 28 - .NET FundamentalsSQL Saturday 28 - .NET Fundamentals
SQL Saturday 28 - .NET Fundamentals
 
MIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome MeasuresMIS5101 WK10 Outcome Measures
MIS5101 WK10 Outcome Measures
 
A Lap Around Visual Studio 2010
A Lap Around Visual Studio 2010A Lap Around Visual Studio 2010
A Lap Around Visual Studio 2010
 
Vb essentials
Vb essentialsVb essentials
Vb essentials
 
Experience SQL Server 2017: The Modern Data Platform
Experience SQL Server 2017: The Modern Data PlatformExperience SQL Server 2017: The Modern Data Platform
Experience SQL Server 2017: The Modern Data Platform
 
SQL Server - Introduction to TSQL
SQL Server - Introduction to TSQLSQL Server - Introduction to TSQL
SQL Server - Introduction to TSQL
 
Data Access Tech Ed India
Data Access   Tech Ed IndiaData Access   Tech Ed India
Data Access Tech Ed India
 
Sas Enterprise Guide A Revolutionary Tool
Sas Enterprise Guide A Revolutionary ToolSas Enterprise Guide A Revolutionary Tool
Sas Enterprise Guide A Revolutionary Tool
 
Saying goodbye to SQL Server 2000
Saying goodbye to SQL Server 2000Saying goodbye to SQL Server 2000
Saying goodbye to SQL Server 2000
 
Basics Of SAS Programming Language
Basics Of SAS Programming LanguageBasics Of SAS Programming Language
Basics Of SAS Programming Language
 
7. SQL.pptx
7. SQL.pptx7. SQL.pptx
7. SQL.pptx
 
About "Apache Cassandra"
About "Apache Cassandra"About "Apache Cassandra"
About "Apache Cassandra"
 
Configuration Management
Configuration ManagementConfiguration Management
Configuration Management
 
Project seminar
Project seminarProject seminar
Project seminar
 

Dernier

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
SanaAli374401
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
MateoGardella
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Dernier (20)

Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
An Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdfAn Overview of Mutual Funds Bcom Project.pdf
An Overview of Mutual Funds Bcom Project.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 

slides.pdf

  • 1. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Introduction to the Stata Language Mark Lunt Centre for Epidemiology Versus Arthritis University of Manchester 04/10/2022
  • 2. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Topics Covered Today Getting help Stata Windows Basic Concepts Manipulation of variables Manipulation of datasets
  • 3. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Introduction to R for Stata Users I prefer Stata for simple basic data analysis Learning 2 languages at once would be confusing for most people I suggest using Stata until you need to change I’m writing “R for Stata users” which directly converts this course to R
  • 4. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command-line vs. Point-and-Click Command-line requires more initial learning than point-and-click Commands must be entered exactly correctly Only option for any serious work 1 Reproducible 2 Editable 3 More efficient Some commands can be written more efficiently via point-and-click
  • 5. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Getting Help Help Manuals Search Stata website Statalist Stata Journal Me
  • 6. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command Window Variables Window Review Window Results Window Stata Windows 2 must exist: Results Command 2 others usually exist Review Variables Others can exist (data editor, graph, do-file editor, help/log viewer)
  • 7. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command Window Variables Window Review Window Results Window Command Window: Syntax command [varlist] [,options] Roman letters: entered exactly Italic letters: replaced by some text you enter Square brackets: that item is optional Example above means means: Command is called “command” Command name may be followed by a list of variables Options may follow a comma
  • 8. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command Window Variables Window Review Window Results Window Command Window Can navigate through previous commands with PageUp and PageDown. Pressing tab key will complete a variable name as far as possible Case-sensitive: height and HEIGHT are different variables Syntax must be exact (although abbreviations are possible) Only one comma, before all options Space before opening parenthesis was most common error, now accepted (since Stata 12). (e.g. level(5), not level (5)).
  • 9. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command Window Variables Window Review Window Results Window Variables window List of all variables in current dataset Clicking adds variable name to command window May contain label if one has been defined
  • 10. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command Window Variables Window Review Window Results Window Review Window List of commands entered this session Clicking on a command puts it in command window Double-clicking runs the command Can be saved as a script, called a “do-file”
  • 11. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Command Window Variables Window Review Window Results Window Results Window Limited size: use a log file to preserve results Blue = clickable link Scrolling controlled by Return, Space and q keys. set more [on | off]
  • 12. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Basic Concepts Do-files Log files Interaction with Operating System Macros Variable and number lists
  • 13. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Do-Files List of commands Can be run from stata with the command do "do-file.do" All data manipulation and analysis should be done using a do-file. Perfectly reproducible Can see exactly what was done Easy to modify
  • 14. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Projects A way to keep all files used in analysis easily accessible Can contain do-files and datasets Example
  • 15. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Profile.do Stata looks for a file called profile.do every time it starts. If it finds it, it runs it Useful for Setting memory User-defined menus Logging commands See help profilew for details
  • 16. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Log Files Results window of limited size: must log results Can use plain text or SMCL (stata markup and control language) Top of do file should be: capture log close log using myfile.log, [append]|[replace] ([text]|[smcl])
  • 17. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Interaction with Operating System cd Change directory pwd Display current directory mkdir Create directory dir List files in current directory shell Run another program Can use either "/" or "" in directory names. Safer to use "/" Path names containing spaces must be surrounded by inverted commas.
  • 18. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Macros Macro name is replaced by definition text when command is run. Very useful for making do-files portable Directories used are defined first using macros Change in location of data or do-files only means changing macro definitions
  • 19. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Macro Example Definition: global mymac C:/Project/Data Use: use "$mymac/data" Loads the file C:/Project/Data/data
  • 20. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Local vs. Global Global macro retains definition until end of session Local macro loses definition at end of do-file Definition Use Global global mymac defn $mymac Local local mymac defn ‘mymac’ Local vs Global macros
  • 21. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Variable Lists Shorthand for referring to a lot of variables prefix* means all variables beginning with prefix firstvar-lastvar means all variables in the dataset from firstvar to lastvar inclusive. Type help varlist for more details
  • 22. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Do-Files Log Files Interaction with Operating System Macros Lists Number Lists Symbol Meaning Example Expansion list of numbers 1 2 3 1 2 3 x/y whole numbers from x to y inclusive 1/5 1 2 3 4 5 x y to z numbers from x to z, increasing by y − x 5 10 to 20 5 10 15 20 x y : z same as x y to z 5 10:20 5 10 15 20 x(y)z numbers from x to z, increasing by y 10(10)50 10 20 30 40 50 x[y]z same as x(y)z 10[10]50 10 20 30 40 50 Number Lists
  • 23. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Manipulating Variables generate & replace egen Labelling Selecting variables
  • 24. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables generate Used to create a new variable Syntax: generate [type] newvar = expression newvar must not already exist type, if present, defines the type of the data expression defines the values: e.g. generate ltitre = log(titre) generate str6 head = substr(name, 1, 6)
  • 25. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Variable Types type size (bytes) min max precision missing byte 1 -127 126 integers . int 2 -32,767 32,766 integers . long 4 -2,147,483,647 2,147,483,646 integers . ∗float 4 −1036 1036 7 digits . double 8 −10308 10308 15 digits . strn n "" strL varies "" Available data types ∗float is the default type.
  • 26. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Missing Values Numerical variables can have several different missing values: ., .a, .b, etc May be useful if you know why a variable is missing if variable != . may not catch all missing values All missing values are greater than any number representable by that datatype. Can exclude all missing values with if variable < . gen old = age > 65 if age < .
  • 27. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables replace Similar to generate Cannot change type newvar must already exist
  • 28. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables egen Extended GENerate Has more functions available User can write their own egen functions No ereplace: must drop the existing variable and create a new one Examples of its use in the practical See help egen for details
  • 29. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Labelling Need to label variables themselves show exactly what the variable measures Need to label values of a variable Only for categorical variables First define a label Then assign it to a variable Easier to assign same label to a number of variables Can label different missing values
  • 30. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Labelling a variable Syntax: label variable varname "Description" Example: label variable height "Height in m."
  • 31. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Labelling values Syntax: label define labelname 1 "string1" . . . label values varname labelname Example: label define yesno 0 "No" 1 "Yes" label values question1 yesno label values question2 yesno
  • 32. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Selecting variables drop varlist keep varlist
  • 33. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Creation & Modification Labelling Selecting variables Formatting Variables Adding a format to a variable changes how it is presented, not how it is stored Most useful for dates: Stored as days since 1/1/1960 Can be formatted in human readable form Date format: “%d” followed by string E.g. “%dD/N/CY” gives 01/01/1960 Type “help format” for details
  • 34. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Manipulating Datasets use & save append merge browse and edit preserve and restore
  • 35. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands use use "filename" reads a file into stata If there is already a file in stata, need use "filename", clear Always use inverted commas Easier to use the menu or button-bar
  • 36. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands save save "filename" saves the current dataset as "filename" If "filename" already exists, need save "filename", replace Option saveold allows saving in format of a previous version of stata If you do not include a directory in filename, stata will try to save it in the current directory
  • 37. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Combining Datasets append more subjects, same variables append using filename merge same subjects, more variables merge 1:1 identifier using filename
  • 38. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Appending Data: Example ID common_1 common_2 file1_1 file1_2 1 a1 b1 c1 d1 2 a2 b2 c2 d2 3 a3 b3 c3 d3 Appending Data: File 1 ID common_1 common_2 file2_1 file2_2 4 a4 b4 e4 f4 5 a5 b5 e5 f5 6 a6 b6 e6 f6 Appending Data: File 2
  • 39. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Appending Data: Example ID common_1 common_2 file1_1 file1_2 file2_1 file2_2 1 a1 b1 c1 d1 . . 2 a2 b2 c2 d2 . . 3 a3 b3 c3 d3 . . 4 a4 b4 . . e4 f4 5 a5 b5 . . e5 f5 6 a6 b6 . . e6 f6 Appending Data: Combined Files
  • 40. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Merging Data Need an identifier (one or more variables on which to match observations) Both files must be sorted by this identifier All observations from both files are used Variable _merge says whether observation was in first file, second file or both.
  • 41. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Merging Files: example idno var1 var2 1 a1 b1 2 a2 b2 3 a3 b3 Merging Data: File 1 idno var3 var4 1 c1 d1 3 c3 d3 4 c4 d4 Merging Data: File 2
  • 42. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Merging Files: example idno var1 var2 var3 var4 _merge 1 a1 b1 c1 d1 3 2 a2 b2 . . 1 3 a3 b3 c3 d3 3 4 . . c4 d4 2 Merging Data: Combined Files
  • 43. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands Ensuring Uniqueness Usually, should only be one observation per unique identifier May not be the case (e.g. adding family-level data to individual-level data) If there should be one observation per identifier in both datasets, use the command merge 1:1 If each record in current dataset corresponds to several in the merged dataset, use merge 1:m Equally, there are merge m:1 and merge 1:m commands
  • 44. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands browse & edit Can open a data editor window with browse Can choose variables to browse with browse varlist Cannot modify data while browsing edit allows data to be changed: don’t use it
  • 45. Introduction Getting Help Stata Windows Basic Concepts Manipulating Variables Manipulating Datasets Basics Appending Datasets Merging Datasets Other dataset commands preserve & restore You may wish to change your data temporarily E.g. collapse to means by group Type preserve before changing data, restore after