Foreword

  • Output options: the ‘tango’ syntax and the ‘readable’ theme.
  • Snippets and results.
  • Source: Several public sources (Internet).


Bootstrapping a Single Statistic (k = 1)

Using the boot package and function, we can bootstrap a single statistic (for example, a median) or a vector of several statistics (for example, regression weights).

  • 95% confidence interval.
  • Statistic = \(R^2\), in a linear regression of miles per gallon (mpg) on car weight (wt) and displacement (disp).
  • 1000 replications.

The deterministic result

# The regression
summary(lm(mpg ~ wt + disp, data = mtcars))[1]
## $call
## lm(formula = mpg ~ wt + disp, data = mtcars)
summary(lm(mpg ~ wt + disp, data = mtcars))[8]
## $r.squared
## [1] 0.7809306

The bootstrap

library(boot)

# Function to obtain R-Squared from the data
rsq <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample
  fit <- lm(formula, data = d)
  return(summary(fit)$r.square)
}

# Bootstrapping with 1000 replications
results <- boot(statistic = rsq,
   formula = mpg ~ wt + disp, data = mtcars, R = 1000)

# Results
results
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = mtcars, statistic = rsq, R = 1000, formula = mpg ~ 
##     wt + disp)
## 
## 
## Bootstrap Statistics :
##      original     bias    std. error
## t1* 0.7809306 0.01278098  0.04760536
# 95% confidence interval
boot.ci(results, type = 'bca')
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results, type = "bca")
## 
## Intervals : 
## Level       BCa          
## 95%   ( 0.6252,  0.8486 )  
## Calculations and Intervals on Original Scale
## Some BCa intervals may be unstable

Plot the results.

plot(results)



Bootstrapping Several Statistics (k > 1)

  • 95% CI.
  • Statistics: intercept, car weight (wt), displacement (disp), in the same regression as above
  • 1000 replications.

The deterministic result

summary(lm(mpg ~ wt + disp, data = mtcars))
## 
## Call:
## lm(formula = mpg ~ wt + disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4087 -2.3243 -0.7683  1.7721  6.3484 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.96055    2.16454  16.151 4.91e-16 ***
## wt          -3.35082    1.16413  -2.878  0.00743 ** 
## disp        -0.01773    0.00919  -1.929  0.06362 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.917 on 29 degrees of freedom
## Multiple R-squared:  0.7809, Adjusted R-squared:  0.7658 
## F-statistic: 51.69 on 2 and 29 DF,  p-value: 2.744e-10

The bootstrap

library(boot)

# Function to obtain regression weights
bs <- function(formula, data, indices) {
  d <- data[indices,] # allows boot to select sample
  fit <- lm(formula, data = d)
  return(coef(fit))
}
# Bootstrapping with 1000 replications
results <- boot(statistic = bs,
   formula = mpg ~ wt + disp, data = mtcars, R = 1000)

# Results
results
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = mtcars, statistic = bs, R = 1000, formula = mpg ~ 
##     wt + disp)
## 
## 
## Bootstrap Statistics :
##        original        bias    std. error
## t1* 34.96055404  0.2264555291  2.57365563
## t2* -3.35082533 -0.1110873061  1.22643344
## t3* -0.01772474  0.0003609026  0.00899893
# 95% confidence intervals
boot.ci(results, type = "bca", index = 1) # intercept
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results, type = "bca", index = 1)
## 
## Intervals : 
## Level       BCa          
## 95%   (29.56, 39.57 )  
## Calculations and Intervals on Original Scale
boot.ci(results, type = "bca", index = 2) # wt
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results, type = "bca", index = 2)
## 
## Intervals : 
## Level       BCa          
## 95%   (-5.648, -0.680 )  
## Calculations and Intervals on Original Scale
boot.ci(results, type = "bca", index = 3) # disp 
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = results, type = "bca", index = 3)
## 
## Intervals : 
## Level       BCa          
## 95%   (-0.0358, -0.0004 )  
## Calculations and Intervals on Original Scale
plot(results, index = 1) # intercept

plot(results, index = 2) # wt

plot(results, index = 3) # disp

Conclusion

We can generate both nonparametric and parametric resampling.

For the nonparametric bootstrap, resampling methods include ordinary, balanced, antithetic and permutation.

For the nonparametric bootstrap, stratified resampling is supported. Importance resampling weights can also be specified.

The boot.ci function can generate 5 different types of two-sided nonparametric confidence intervals: the first order normal approximation, the basic bootstrap interval, the studentized bootstrap interval, the bootstrap percentile interval, and the adjusted bootstrap percentile (BCa) interval.


Münchhausen pulling himself up by his own bootstraps to get out of a difficult situation.