bootstrap

astropy.stats.bootstrap(data, bootnum=100, samples=None, bootfunc=None)

Performs bootstrap resampling on numpy arrays.

Bootstrap resampling is used to estimate confidence intervals for sample statistics. This function returns versions of the dataset resampled with replacement (“case bootstrapping”). Each resample can be passed through a function or statistic to produce a distribution of values, from which the confidence intervals can then be derived.
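The step from a distribution of resampled statistics to a confidence interval can be sketched as follows. This is a minimal pure-Python illustration of the percentile method, not astropy's implementation; the helper name percentile_interval and its level and seed parameters are hypothetical and chosen only for this example.

```python
import random
import statistics


def percentile_interval(data, stat=statistics.mean, bootnum=1000,
                        level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration; not part of astropy.
    """
    rng = random.Random(seed)
    n = len(data)
    # Case bootstrapping: draw n points with replacement, reduce each
    # resample to a single value, and sort the resulting distribution.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(bootnum))
    # Drop an equal share of the distribution from each tail.
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * bootnum)]
    upper = estimates[min(bootnum - 1, int((1.0 - alpha) * bootnum))]
    return lower, upper


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
low, high = percentile_interval(data)
print(low, high)
```

The same interval could be obtained from the array returned by bootstrap with bootfunc set to the statistic of interest, by taking percentiles of that array.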

Parameters:
data : numpy.ndarray

N-D array. Resampling is performed along the first axis, so the first axis should index the data points to be bootstrapped.

bootnum : int, optional

Number of bootstrap resamples. The default is 100.

samples : int, optional

Number of samples in each resample. The default, None, sets samples to the number of data points.

bootfunc : function, optional

Function to reduce the resampled data. Each bootstrap resample is passed through this function and the results are returned. If None, the resampled data are returned unreduced.

Returns:
boot : numpy.ndarray

If bootfunc is None, each row is a bootstrap resample of the data. If bootfunc is specified, each row holds the result of bootfunc for one resample, and the columns correspond to the outputs of bootfunc.
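For N-D input, resampling along the first axis means that whole observations are drawn together. The sketch below illustrates that behaviour, and the resulting one-resample-per-row shape, using plain numpy indexing. It is an illustrative stand-in rather than the library's source; the function name resample_first_axis and its seed parameter are hypothetical.

```python
import numpy as np


def resample_first_axis(data, bootnum, samples=None, seed=0):
    """Illustrative stand-in for resampling with bootfunc=None.

    Hypothetical helper; not part of astropy.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    if samples is None:
        samples = n
    # One row of random indices per resample, drawn with replacement
    # from the range 0 .. n-1.
    idx = rng.integers(0, n, size=(bootnum, samples))
    # Indexing on the first axis keeps each observation intact.
    return data[idx]


points = np.arange(12).reshape(4, 3)  # 4 observations, 3 values each
boot = resample_first_axis(points, bootnum=5)
print(boot.shape)  # (5, 4, 3)
```

Each entry along the leading axis is one resample, and every observation inside it is an unmodified row of the original array.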

Examples

Obtain two resamples of an array:

>>> from astropy.stats import bootstrap
>>> import numpy as np
>>> from astropy.utils import NumpyRNGContext
>>> bootarr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 0])
>>> with NumpyRNGContext(1):
...     bootresult = bootstrap(bootarr, 2)
...
>>> bootresult  
array([[6., 9., 0., 6., 1., 1., 2., 8., 7., 0.],
       [3., 5., 6., 3., 5., 3., 5., 8., 8., 0.]])
>>> bootresult.shape
(2, 10)

Obtain a statistic on the array:

>>> with NumpyRNGContext(1):
...     bootresult = bootstrap(bootarr, 2, bootfunc=np.mean)
...
>>> bootresult  
array([4. , 4.6])

Obtain a statistic with two outputs on the array:

>>> test_statistic = lambda x: (np.sum(x), np.mean(x))
>>> with NumpyRNGContext(1):
...     bootresult = bootstrap(bootarr, 3, bootfunc=test_statistic)
...
>>> bootresult
array([[40. ,  4. ],
       [46. ,  4.6],
       [35. ,  3.5]])
>>> bootresult.shape
(3, 2)

Obtain a statistic with two outputs on the array, keeping only the first output:

>>> bootfunc = lambda x: test_statistic(x)[0]
>>> with NumpyRNGContext(1):
...     bootresult = bootstrap(bootarr, 3, bootfunc=bootfunc)
...
>>> bootresult  
array([40., 46., 35.])
>>> bootresult.shape
(3,)