Power and Sample Size Simulations in R, presented by Robin Baudier, MSPH on 09/21/2023 as part of the OHSU OCTRI Research Forum

- 1. Power and Sample Size Simulations in R. OCTRI Research Forum. Date: September 21, 2023. Presented by: Robin Baudier, MSPH, MFA, Staff Biostatistician with the BDP.
- 2. TOPICS COVERED: Brief overviews of Power and Sample Size (PSS) concepts, Monte Carlo simulations, and using Monte Carlo simulations to calculate PSS. How to perform PSS simulations in R: simulating variables, using loops, and looping tests on simulated datasets to calculate power. Examples of complex PSS simulations. Packages that can make things a little easier.
- 3. POWER & SAMPLE SIZE CALCULATIONS A VERY BRIEF OVERVIEW
- 4. FOR MORE INFORMATION ON THE BASICS OF PSS CALCULATIONS Check out the OCTRI PSS 101 Seminar, co-developed by Meike Niederhausen, PhD and Alicia Johnson, MPH, and presented most recently by Alicia Johnson in July: RECORDING: https://echo360.org/media/bd0064c8-327c-4b56-8279-9558cd307125/public SLIDES: https://drive.google.com/file/d/1691RWaDzzRkjbdhbfjfxO7yKESRUjuAn/view
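- 5. What is power? Power (1 - β) is the probability of correctly rejecting the null hypothesis, i.e., rejecting it when it is false. α is the probability of incorrectly rejecting the null hypothesis, i.e., rejecting it when it is true. [Figure: null and alternative distributions split at the critical value, with regions labeled 1 - α and 1 - β.] Image adapted from: https://towardsdatascience.com/5-ways-to-increase-statistical-power-377c00dd0214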
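As a rough illustration of what α means, the sketch below simulates many datasets in which the null hypothesis is true and counts how often a two-sample t-test rejects it anyway. The sample size, number of simulations, and seed are illustrative assumptions, not values from the seminar.

```r
# Estimate the type I error rate of a two-sample t-test by simulating under H0.
set.seed(2023)          # arbitrary seed, only for reproducibility
n_sims <- 2000          # number of simulated datasets (illustrative)
n_per_group <- 30       # hypothetical sample size per group
alpha <- 0.05

p_values <- replicate(n_sims, {
  g1 <- rnorm(n_per_group, mean = 0, sd = 1)
  g2 <- rnorm(n_per_group, mean = 0, sd = 1)  # same mean as g1, so H0 is true
  t.test(g1, g2)$p.value
})

# Proportion of false rejections; this should land close to alpha
type1_rate <- mean(p_values < alpha)
type1_rate
```

Because every rejection here is a false positive, the estimated rate should sit near 0.05, with some Monte Carlo noise.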
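- 6. Statistical test example: one-tailed t-test with α = 0.05. Reject the null hypothesis if t > 1.645 (d.f.: n1 + n2 - 2). Note that 1.645 is the large-sample critical value; the critical t is somewhat larger when the d.f. are small. [Figure: distribution of t with 0 and the critical value 1.645 marked, and regions labeled 1 - α and 1 - β.] Image adapted from: https://towardsdatascience.com/5-ways-to-increase-statistical-power-377c00dd0214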
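The critical value on this slide can be checked with base R quantile functions. This is a small sketch; the degrees of freedom listed are hypothetical choices to show how the one-tailed critical t approaches the normal value as the sample grows.

```r
# One-tailed critical values at alpha = 0.05
alpha <- 0.05

# Normal (large-sample) critical value
z_crit <- qnorm(1 - alpha)
round(z_crit, 3)   # 1.645

# Critical t for several hypothetical d.f. = n1 + n2 - 2
df <- c(8, 18, 58, 198)
t_crit <- qt(1 - alpha, df = df)
data.frame(df = df, t_crit = round(t_crit, 3))
```

With small samples, using 1.645 as the cutoff would reject slightly too often, so `qt()` with the actual d.f. is the safer choice.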
- 7. Prediction example: bootstrapping ML model. How do you determine if a machine learning (ML) model performs better than chance? You can create a null distribution by randomly permuting the outcome variable in your dataset a large number of times (e.g., 999) and running the ML model on each permuted dataset (strictly speaking, this is a permutation test). H0: R² = 0; HA: R² > 0; α = 0.05. p-value = 1 - (proportion of random R² < real-data model R²). Reject H0 if the proportion of random R² values below the model R² is ≥ 95%. Image adapted from: https://towardsdatascience.com/5-ways-to-increase-statistical-power-377c00dd0214
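The permutation procedure on this slide can be sketched as follows. A simple linear regression stands in for the ML model, and the simulated data, effect size, and helper function `r2()` are assumptions made for illustration only.

```r
# Permutation-based null distribution for a model's R-squared.
set.seed(2023)                      # arbitrary seed for reproducibility
n <- 100                            # hypothetical sample size
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)             # hypothetical true relationship

# Hypothetical helper: R-squared from a linear model (stand-in for an ML model)
r2 <- function(outcome, predictor) {
  summary(lm(outcome ~ predictor))$r.squared
}

obs_r2 <- r2(y, x)                  # R-squared on the real (unpermuted) data

# Permute the outcome 999 times, refitting the model each time
n_perm <- 999
null_r2 <- replicate(n_perm, r2(sample(y), x))

# p-value = 1 - (proportion of random R-squared < real-data R-squared)
p_value <- 1 - mean(null_r2 < obs_r2)
p_value
```

Shuffling `y` breaks any link between predictor and outcome, so the permuted R² values describe what the model achieves by chance alone.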
- 8. PSS simulation example: In PSS simulations, rather than modeling the null distribution as you do in bootstrapping, you simulate the HA distribution (with a specified N, effect size, SD, etc.) a large number of times and assess what proportion of statistical tests performed on the simulated data are able to detect the effect of interest. The question: what is the probability of correctly rejecting the null hypothesis (power) with the specified effect size, N, and α? Power = proportion of models (fit to data simulated under HA) with p-values < α. [Figure: HA models with p < α versus HA models with p ≥ α.] Image adapted from: https://towardsdatascience.com/5-ways-to-increase-statistical-power-377c00dd0214
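A minimal sketch of this idea for a two-group comparison is shown below. The group size, effect size, SD, and number of simulations are hypothetical values chosen for illustration, and a two-sample t-test is assumed as the planned analysis.

```r
# Simulate data under HA many times and count how often the test detects the effect.
set.seed(2023)        # arbitrary seed for reproducibility
n_sims <- 2000        # number of simulated datasets (illustrative)
n_per_group <- 50     # hypothetical N per group
effect <- 0.5         # hypothetical true mean difference
sd_y <- 1             # hypothetical outcome SD
alpha <- 0.05

p_values <- replicate(n_sims, {
  control <- rnorm(n_per_group, mean = 0, sd = sd_y)
  treated <- rnorm(n_per_group, mean = effect, sd = sd_y)  # HA is true
  t.test(treated, control)$p.value
})

# Power = proportion of simulated tests with p < alpha
power_est <- mean(p_values < alpha)
power_est
```

The estimate carries Monte Carlo error, so increasing `n_sims` tightens it at the cost of run time.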
- 9. What if you don’t have good power?
- 11. Also need to know: • What’s your data distribution and structure? • What statistical test/model are you planning to use?
- 12. POWER & SAMPLE SIZE SIMULATIONS WHEN, WHAT & HOW
- 13. WHEN SHOULD YOU USE A PSS SIMULATION? When there isn’t an easier way. Before you start down the road of power simulations, check whether any R packages exist for the power simulation you want to perform!
- 14. PSS R PACKAGES: https://med.und.edu/research/daccota/_files/pdfs/berdc_resource_pdfs/sample_size_r_module.pdf
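As an example of "an easier way", a simple two-group comparison needs no simulation at all: the `power.t.test()` function in R's built-in stats package solves it analytically. The effect size and SD below are hypothetical inputs.

```r
# Analytic power for a two-sample, two-sided t-test (no simulation needed)
res <- power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05)
res$power

# Leave n out and supply power instead to solve for the required N per group
n_needed <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n
ceiling(n_needed)   # 64
```

Simulation becomes worthwhile when the design or analysis model is more complex than closed-form tools like this can handle.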
- 15. WHAT IS A MONTE CARLO SIMULATION? Monte Carlo simulation (aka the Monte Carlo method, or a multiple probability simulation) is a mathematical technique, developed in the 1940s by John von Neumann and Stanislaw Ulam, that is used to estimate the possible outcomes of an uncertain event (e.g., modeling nuclear fission to develop atomic bombs). Because the element of chance (similar to a game of roulette) is core to the modeling approach, it is named after the well-known casino district in Monaco. Monte Carlo simulations are used in a wide range of fields and applications, including financial risk assessment and long-term forecasting, estimating the duration or cost of projects, analyzing weather patterns and traffic flow, and a wide swath of applications in biomedical research.
- 18. HOW DOES A MONTE CARLO FORECASTING SIMULATION WORK? Three basic steps: (1) Specify the probability distributions of the independent variables (distribution, range, correlations, etc.). (2) Set up your hypothesized predictive model, including the dependent (outcome) variable and the independent variable(s) (predictors and covariates). (3) Run simulations of your model repeatedly, using newly generated random values of the independent variables every time to create a predicted value of the outcome. Do this until enough results are gathered to form a probability distribution of your forecasted outcome.
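The three steps above can be sketched with a toy project-cost forecast. Everything here is a made-up illustration: the variables, their distributions, and the cost model in the hypothetical `forecast_cost()` function are assumptions, not content from the seminar.

```r
# Toy Monte Carlo forecast of a project's cost.
set.seed(2023)        # arbitrary seed for reproducibility
n_sims <- 10000       # number of simulated scenarios (illustrative)

# Step 1: specify distributions of the independent variables (all hypothetical)
hours     <- rnorm(n_sims, mean = 400, sd = 50)                   # staff hours
rate      <- runif(n_sims, min = 80, max = 120)                   # hourly rate
materials <- rlnorm(n_sims, meanlog = log(5000), sdlog = 0.3)     # material cost

# Step 2: set up the hypothesized predictive model for the outcome
forecast_cost <- function(hours, rate, materials) {
  hours * rate + materials
}

# Step 3: evaluate the model on every random draw to build the outcome distribution
cost <- forecast_cost(hours, rate, materials)

# Summarize the forecast: median and a 90% interval
cost_quantiles <- quantile(cost, probs = c(0.05, 0.5, 0.95))
cost_quantiles
```

R's vectorized arithmetic runs all scenarios at once here, so no explicit loop is needed for a model this simple.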
- 19. HOW DOES A MONTE CARLO PSS SIMULATION WORK? Three basic steps: (1) Specify the probability distributions of the independent and dependent variables (distribution, range, correlations, etc.) based on your hypothesized predictive model of the true relationship between the variables. (2) Run simulations repeatedly, generating random values of the independent and dependent variables every time, and test your model’s ability to detect their true relationship at the specified N, effect size, and alpha. Do this repeatedly to assess what proportion of your simulated results detect the true relationship between predictor and outcome (i.e., p-value < alpha). This is your power. (3) If you’re happy with your power from the previous step, skip this step. If your power is too low (e.g., < 0.80), increase your N or effect size. If your power is higher than you need, you can lower the N or effect size to find the minimum detectable at the desired power.
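One way to sketch these steps, including the final adjustment of N, is to wrap the simulation in a function and loop over candidate sample sizes. The function name `sim_power()`, the effect size, the candidate N values, and the two-sample t-test are all illustrative assumptions.

```r
# Loop a power simulation over candidate sample sizes to find the smallest adequate N.
set.seed(2023)   # arbitrary seed for reproducibility

# Hypothetical helper: simulated power of a two-sample t-test for one design
sim_power <- function(n_per_group, effect, sd_y = 1, alpha = 0.05, n_sims = 1000) {
  p_values <- replicate(n_sims, {
    control <- rnorm(n_per_group, mean = 0, sd = sd_y)
    treated <- rnorm(n_per_group, mean = effect, sd = sd_y)
    t.test(treated, control)$p.value
  })
  mean(p_values < alpha)   # proportion of simulated tests detecting the effect
}

candidate_n <- seq(20, 100, by = 10)          # hypothetical N per group to try
power_by_n <- numeric(length(candidate_n))
for (i in seq_along(candidate_n)) {
  power_by_n[i] <- sim_power(candidate_n[i], effect = 0.5)
}

results <- data.frame(n_per_group = candidate_n, power = power_by_n)
results

# Smallest candidate N whose estimated power reaches 0.80
min_n <- min(results$n_per_group[results$power >= 0.80])
min_n
```

Because each power estimate is noisy, candidate sizes near the 0.80 threshold are worth re-running with a larger `n_sims` before settling on a final N.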
- 20. YES, BUT HOW DO YOU ACTUALLY DO THIS IN R? The rest of this seminar will take place in the RStudio environment. The code and data used are available in Posit Cloud by following this link: https://posit.cloud/spaces/409887/join?access_code=VJ50mLqZ55QM8kAGEAGZfd5ZPnOfCw08LvW8i7Qv The next few slides briefly walk you through the process of accessing the seminar code, which can be done now or at a later date using the above link.
- 21. JOINING POSIT SPACE: Sign up for a free Posit Cloud account: https://posit.cloud/plans/free After you are logged in, click the link to the shared workspace: https://posit.cloud/spaces/409887/join?access_code=VJ50mLqZ55QM8kAGEAGZfd5ZPnOfCw08LvW8i7Qv When prompted, say “Yes” to Join Space.
- 23. OPEN PROJECT: Click the project to open a temporary copy. Click + to make yourself a permanent copy.
- 24. OPEN THE “PSS SIMULATIONS IN R.RMD” FILE: Open the code.
- 25. Questions?