Quadratic assignment procedure (QAP) is a permutation test for network data, where dyads are not independent. It controls for this non-independence by permuting the response variable to build a sampling distribution under the null hypothesis. QAP supports correlations and regressions on network data and is easy to interpret. It works by fitting a regression to the original data, then permuting the response variable many times to create random datasets for comparison. The p-value is the proportion of permutations in which the null coefficient is greater than or equal to the observed estimate.
Social network analysis: Quadratic assignment procedure
1. Social Network Analysis
Regressions with quadratic assignment procedure
Matthew Gwynfryn Thomas
Human Evolutionary Ecology Group, 26 March 2015
2. What is quadratic assignment procedure (QAP)?
Network data is structured: dyads are not independent, so observations are correlated
QAP is a permutation test that controls for this non-independence
Can do correlations and multivariate regressions (linear and logistic)
Easy to interpret
A B C D
A - 0 0 0
B 1 - 1 1
C 0 1 - 0
D 0 1 0 -
3. What is QAP good for?
189 male hunters
30 gift givers, 75 potential recipients
35 households
50 nuclear families
4. How does QAP work?
Gift game example | Question: What predicts gift giving?
Model: Gift network ~ Relatedness + Age differences
Gift network:
   A  B  C  D
A  -  0  0  0
B  1  -  1  1
C  0  1  -  0
D  0  1  0  -
Relatedness:
   A    B    C    D
A  -    0    0    0
B  0    -    0.5  0
C  0    0.5  -    0
D  0    0    0    -
Age differences:
     A    B    C    D
A    -   12    0   -2
B  -12    -   -5    7
C    0    5    -    1
D    2   -7   -1    -
1. Regression on response and predictors
2. Permute response variable lots of times to create random datasets
– These give the sampling distribution under the null hypothesis
– Preserves dependence between dyads (person A’s values stay together during permutation)
– but removes the relationship between response and predictors
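A minimal sketch of the permutation step, assuming it is done by relabelling nodes, that is, applying one shuffle to both the rows and the columns of the response matrix. The names `permute_nodes` and `gifts_given` and the seed are illustrative choices, not taken from the slides or from any package.

```python
import random

# Gift network from the example; None marks the undefined diagonal.
gifts = [
    [None, 0, 0, 0],
    [1, None, 1, 1],
    [0, 1, None, 0],
    [0, 1, 0, None],
]

def permute_nodes(matrix, order):
    """Relabel nodes by applying one permutation to both rows and columns."""
    n = len(matrix)
    return [[matrix[order[i]][order[j]] for j in range(n)] for i in range(n)]

def gifts_given(matrix):
    """Number of gifts each person gives (row sums, skipping the diagonal)."""
    return [sum(v for v in row if v is not None) for row in matrix]

rng = random.Random(42)  # fixed seed (an assumption) so the sketch is repeatable
order = list(range(len(gifts)))
rng.shuffle(order)
shuffled = permute_nodes(gifts, order)

# Each person's row moves as a block, so the set of row totals is unchanged,
# even though the link between gift dyads and predictor dyads is broken.
print(sorted(gifts_given(gifts)), sorted(gifts_given(shuffled)))
```

Because whole rows and columns move together, the structure within the gift network survives the shuffle; only its alignment with the predictor matrices is randomised.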
5. How does QAP work?
Gift game example | Question: What predicts gift giving?
Model: Gift network ~ Relatedness + Age differences
Gift network:
   A  B  C  D
A  -  0  0  0
B  1  -  1  1
C  0  1  -  0
D  0  1  0  -
Relatedness:
   A    B    C    D
A  -    0    0    0
B  0    -    0.5  0.5
C  0    0.5  -    0
D  0    0.5  0    -
Age differences:
     A    B    C    D
A    -   12    0   -2
B  -12    -   -5    7
C    0    5    -    1
D    2   -7   -1    -
1. Regression on response and predictors
2. Permute response variable lots of times to create random datasets
3. p-value is the proportion of permutations in which the null coefficient is ≥ the observed estimate
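The three steps can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the 'sna' implementation: the helper names (`dyads`, `permute`, `ols`, `qap_regression`), the seed, the number of repetitions and the small floating-point tolerance are all choices made for this sketch. The matrices are the ones shown on this slide, and the p-value is the one-sided proportion described in step 3.

```python
import random

# Matrices from the gift game example; None marks the undefined diagonal.
gifts = [[None, 0, 0, 0], [1, None, 1, 1], [0, 1, None, 0], [0, 1, 0, None]]
related = [[None, 0, 0, 0], [0, None, 0.5, 0.5], [0, 0.5, None, 0], [0, 0.5, 0, None]]
age_diff = [[None, 12, 0, -2], [-12, None, -5, 7], [0, 5, None, 1], [2, -7, -1, None]]

def dyads(m):
    """Flatten the off-diagonal cells of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]

def permute(m, perm):
    """Relabel nodes: apply the same permutation to rows and columns."""
    n = len(m)
    return [[m[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def ols(y, xs):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    cols = [[1.0] * len(y)] + xs
    k = len(cols)
    # Augmented matrix [X'X | X'y]
    a = [[sum(p * q for p, q in zip(cols[r], cols[c])) for c in range(k)]
         + [sum(p * q for p, q in zip(cols[r], y))] for r in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(i, k), key=lambda r: abs(a[r][i]))
        a[i], a[pivot] = a[pivot], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [v - f * w for v, w in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

def qap_regression(y, predictors, reps=1000, seed=1):
    """Return observed coefficients and one-sided permutation p-values."""
    rng = random.Random(seed)
    xs = [dyads(x) for x in predictors]
    observed = ols(dyads(y), xs)           # step 1: fit on the real data
    counts = [0] * len(observed)
    nodes = list(range(len(y)))
    for _ in range(reps):                  # step 2: permute the response only
        rng.shuffle(nodes)
        null = ols(dyads(permute(y, nodes)), xs)
        for c in range(len(observed)):
            if null[c] >= observed[c] - 1e-12:  # tolerance for float rounding
                counts[c] += 1
    return observed, [c / reps for c in counts]  # step 3: proportion >= observed

coefs, p_values = qap_regression(gifts, [related, age_diff])
print("coefficients (intercept, relatedness, age difference):", coefs)
print("one-sided p-values:", p_values)
```

With only four nodes there are just 24 possible relabellings, so the random draws repeat often; on real networks the permutations sample a far larger space.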
6. How to run them?
R
– ‘sna’ package
• netlm for linear regressions
• netlogit for logistic regressions
– Matrices can be square or rectangular
• Use this code if you want to analyse rectangular matrices:
https://gist.github.com/matthewgthomas/728c53b7c7b99c12f1af
UCINET
– Ugly and painful to use but fast
– Can only analyse square matrices
Can also use Stata (qap command) and probably other packages…
Square matrix:
   A  B  C  D
A  -  0  0  0
B  1  -  1  1
C  0  1  -  0
D  0  1  0  -
Rectangular matrix:
   A  B  C  D  E  F  G
A  -  0  0  0  1  0  0
B  1  -  1  1  0  0  1
C  0  1  -  0  0  1  0
7. Downsides
Not very scalable – gets very slow and needs much more memory with increasing network size and number of repetitions (“curse of dimensionality”)
– On my computer, 1,000 repetitions of a QAP logistic regression takes:
• ~20 seconds with 30 nodes
• half an hour with 300 nodes
• With 3,000 nodes got bored and switched it off after 18 hours
Can’t analyse interactions between predictors
Doesn’t report standard errors of estimated coefficients
– Empirical confidence interval is around the null, rather than around the sample estimate
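To illustrate the last point, here is a rough sketch that builds an empirical 95% interval from permuted slopes for a single predictor (gift network on relatedness, using the relatedness matrix from slide 5). The interval is made from null coefficients, so it is centred on no effect rather than on the observed estimate. The function names, the seed and the simple percentile convention are assumptions made for this sketch.

```python
import random

# None marks the undefined diagonal.
gifts = [[None, 0, 0, 0], [1, None, 1, 1], [0, 1, None, 0], [0, 1, 0, None]]
related = [[None, 0, 0, 0], [0, None, 0.5, 0.5], [0, 0.5, None, 0], [0, 0.5, 0, None]]

def dyads(m):
    """Flatten the off-diagonal cells of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]

def slope(y, x):
    """Simple-regression slope of y on x: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def null_slopes(y, x, reps=1000, seed=1):
    """Slopes obtained after relabelling the nodes of the response matrix."""
    rng = random.Random(seed)
    n = len(y)
    nodes = list(range(n))
    xv = dyads(x)
    out = []
    for _ in range(reps):
        rng.shuffle(nodes)
        permuted = [[y[nodes[i]][nodes[j]] for j in range(n)] for i in range(n)]
        out.append(slope(dyads(permuted), xv))
    return out

observed = slope(dyads(gifts), dyads(related))
null = sorted(null_slopes(gifts, related))
lower = null[int(0.025 * len(null))]        # approximate 2.5th percentile
upper = null[int(0.975 * len(null)) - 1]    # approximate 97.5th percentile
print(f"observed slope {observed:.2f}; 95% null interval [{lower:.2f}, {upper:.2f}]")
```

The interval describes which coefficients are plausible when there is no relationship; it says nothing directly about the uncertainty of the observed slope.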
8. Some references
‘sna’ package: http://cran.r-project.org/web/packages/sna/index.html
Nerdy stats papers:
- Hubert & Schultz (1976): Quadratic assignment as a general data analysis strategy
- Krackhardt (1988): Predicting with networks: Nonparametric multiple regression analysis of dyadic data
- Krackhardt (1992): A caveat on the use of the quadratic assignment procedure
- Dekker et al. (2007): Sensitivity of MRQAP tests to collinearity and autocorrelation conditions
Editor's notes
After scrambling the response variable, you’d expect no relationship to predictors – i.e. null hypothesis