Power and sample size calculation are major components of statistical analyses. The upstrap resampling method was proposed as a general solution to this problem.
We evaluate power estimation properties of the upstrap and provide a series of “read, adapt and use” R code examples for power estimation in simple and complex settings (bioRxiv preprint).
Table of Contents
Scientific problem
We consider the following problem: given an observed data sample $\mathbf{x}$ of sample size $N$, given specific null and alternative hypothesis, and a test statistic, assuming significance level $\alpha$, estimate sample size $M$ required to achieve power $1 \beta$ (i.e., to achieve probability $1 \beta$ of rejecting the null hypothesis when the null is true).
Here, we consider the tasks to estimate the power to detect:
 an effect size observed in the data;
 an effect size chosen by a researcher.
We aim to ddress complex settings, including testing significance of model coefficients in: LM, GLM, LMM, GLMM, GEE, and others.
Challenges
For multilevel data settings, the existing approaches tend to fall into two categories.

First are theoretical results for estimating power in specific multilevel data setups; these often use assumptions about the intraclass correlation coefficient, and/or assume a particular study design (e.g., that the data are balanced).

Another class is based on simulations. Such may require specifying a population model for the data, simulating data from the assumed model, and estimating the power via Monte Carlo simulations.
The former arguably lacks flexibility; in addition, finding and then determining applicability of a theoretical result may be difficult. The latter is flexible but involves potentially complex programming task.
Proposed solution
Upstrap
The upstrap resampling method (Crainiceanu, C.M., Crainiceanu, A. (2020)) was proposed as a general solution to this problem.
Upstrap starts with a sample (observed data) and resamples with replacement either fewer or more samples than in the original data set. The below example shows difference between bootstrap and upstrap resample.
# simulate data
x < rnorm(n = 10)
# bootstrap resample
x_b < sample(x, size = 10, replace = TRUE)
# upstrap resample (case: more samples than in original x)
x_u < sample(x, size = 20, replace = TRUE)
Upstrap for estimating power
Given observed data sample $\mathbf{x}$, to estimate power for a target sample size, we propose:

Generate $B$ resamples of target sample size by sampling with replacement from $\mathbf{x}$.

Perform hypothesis test on each resample.

Estimate power as the proportion of $B$ resamples where the null hypothesis was rejected.
The above procedure estimates power corresponding to the effect size observed in the sample $\mathbf{x}$. For example, for onesample ttest, the above procedure estimates:
power.t.test(delta = mean(x), sd = sd(x), ...)$power
Upstrap for estimating power: specific effect size
In practice, one is often is interested in estimating power for a specific effect size. For example, for onesample ttest, a specific effect size could be set to 0.3:
power.t.test(delta = 0.3, sd = sd(x), ...)$power
To address such cases, we propose to update response variable values in the observed sample $\mathbf{x}$ to ensure the effect size in this updated data is our target effect size. Details are provided in the preprint.
Code example: testing for significance of LM coefficient
Below, we demonstrate the upstrap power estimation method for testing for significance of LM coefficient. In the preprint, we provide more R code examples, including testing for significance of coefficient in LM, GLM, LMM, GLMM.
We define simulation parameters and simulate a sample of size $N = 50$.
(Click to see setup definition and R code.)
Setup
Consider a random sample with $N = 50$ independent observations (e.g., 50 subjects, 1 observation per subject). Assume a continuous response variable $Y$ and two covariates: dichotomous $X_1$, continuous $X_2$. We are interested in estimating power of test for significance of the coefficient $\beta_{1}$ in linear model $Y_{i}=\beta_{0}+\beta_{1} X_{1 i}+\beta_{2} X_{2 i}+\varepsilon_{i}$, where $i=1, \ldots, N$ and $\varepsilon_{i} \sim_{\text {iid }} N\left(0, \sigma^{2}\right)$.
# simulation parameters
N < 50
coef_x0 < 0
coef_x1 < 0.2
coef_x2 < 0.1
sigma2 < 1
# simulate sample
set.seed(1)
subjid_i < 1 : N # subject ID unique in data set
x1_i < rbinom(n = N, size = 1, prob = 0.5)
x2_i < rbinom(n = N, size = 1, prob = 0.5)
eps_i < rnorm(N, sd = sqrt(sigma2))
y_i < coef_x0 + (coef_x1 * x1_i) + (coef_x2 * x2_i) + eps_i
dat < data.frame(y = y_i, x1 = x1_i, x2 = x2_i, subjid = subjid_i)
str(dat)
# 'data.frame': 50 obs. of 4 variables:
# $ y : num 0.398 0.512 0.541 0.929 1.433 ...
# $ x1 : int 0 0 1 1 0 1 1 1 1 0 ...
# $ x2 : int 0 1 0 0 0 0 0 1 1 0 ...
# $ subjid: int 1 2 3 4 5 6 7 8 9 10 ...
We get the observed effect size.
fit < lm(y ~ x1 + x2, data = dat)
coef(fit)["x1"]
# 0.5585879
We estimate the test power via the upstrap.
Case: target sample size M = N = 50, target effect size as observed in the sample
# number of upstrap resamples
B_boot < 1000
out < rep(NA, B_boot)
for (rr in 1 : B_boot){
dat_rr_idx < sample(1 : nrow(dat), replace = TRUE)
dat_rr < dat[dat_rr_idx, ]
fit_rr < lm(y ~ x1 + x2, data = dat_rr)
pval_rr < summary(fit_rr)$coef["x1", 4]
out[rr] < (pval_rr < 0.05)
}
mean(out)
# [1] 0.493
Case: target sample size M = 100, target effect size as observed in the sample
out < rep(NA, B_boot)
for (rr in 1 : B_boot){
dat_rr_idx < sample(1 : nrow(dat), size = 100, replace = TRUE)
dat_rr < dat[dat_rr_idx, ]
fit_rr < lm(y ~ x1 + x2, data = dat_rr)
pval_rr < summary(fit_rr)$coef["x1", 4]
out[rr] < (pval_rr < 0.05)
}
mean(out)
# [1] 0.773
Case: target sample size M = 100, target effect size set to 0.8
# update the outcome in the sample to represent the target effect size
dat_upd < dat
dat_upd$y < dat_upd$y + (0.8  coef(fit)["x1"]) * dat_upd$x1
out < rep(NA, B_boot)
for (rr in 1 : B_boot){
dat_rr_idx < sample(1 : nrow(dat_upd), size = 100, replace = TRUE)
dat_rr < dat_upd[dat_rr_idx, ]
fit_rr < lm(y ~ x1 + x2, data = dat_rr)
pval_rr < summary(fit_rr)$coef["x1", 4]
out[rr] < (pval_rr < 0.05)
}
mean(out)
# [1] 0.92
Contributions
 Our preprint: Karas, M., Crainiceanu, C.M. (2021) Upstrap for estimating power and sample size in complex models is available on bioRxiv.