# Astrostatistics Tools (`astropy.stats`)¶

## Introduction¶

The `astropy.stats` package holds statistical functions or algorithms used in astronomy. While the `scipy.stats` and statsmodels packages contains a wide range of statistical tools, they are general-purpose packages and are missing some tools that are particularly useful or specific to astronomy. This package is intended to provide such functionality, but not to replace `scipy.stats` if its implementation satisfies astronomers’ needs.

## Getting Started¶

A number of different tools are contained in the stats package, and they can be accessed by importing them:

```>>> from astropy import stats
```

A full list of the different tools are provided below. Please see the documentation for their different usages. For example, sigma clipping, which is a common way to estimate the background of an image, can be performed with the `sigma_clip()` function. By default, the function returns a masked array, a type of Numpy array used for handling missing or invalid entries. Masked arrays retain the original data but also store another boolean array of the same shape where `True` indicates that the value is masked. Most Numpy ufuncs will understand masked arrays and treat them appropriately. For example, consider the following dataset with a clear outlier:

```>>> import numpy as np
>>> from astropy.stats import sigma_clip
>>> x = np.array([1, 0, 0, 1, 99, 0, 0, 1, 0])
```

The mean is skewed by the outlier:

```>>> x.mean()
11.333333333333334
```

Sigma-clipping (3 sigma by default) returns a masked array, and so functions like `mean` will ignore the outlier:

```>>> clipped = sigma_clip(x)
>>> clipped
masked_array(data=[1, 0, 0, 1, --, 0, 0, 1, 0],
mask=[False, False, False, False,  True, False, False, False,
False],
fill_value=999999)
>>> clipped.mean()
0.375
```

If you need to access the original data directly, you can use the `.data` property. Combined with the `.mask` property, you can get the original outliers, or the values that were not clipped:

```>>> outliers = clipped.data[clipped.mask]
>>> outliers
array()
>>> valid
array([1, 0, 0, 1, 0, 0, 1, 0])
```

### Examples¶

To estimate the background of an image:

```>>> data = [1, 5, 6, 8, 100, 5, 3, 2]
>>> stats.sigma_clip(data, sigma=2, maxiters=5)
masked_array(data=[1, 5, 6, 8, --, 5, 3, 2],
mask=[False, False, False, False,  True, False, False, False],
fill_value=999999)
```

Alternatively, the `SigmaClip` class provides an object-oriented interface to sigma clipping, which also returns a masked array by default:

```>>> sigclip = stats.SigmaClip(sigma=2, maxiters=5)
>>> sigclip(data)
masked_array(data=[1, 5, 6, 8, --, 5, 3, 2],
mask=[False, False, False, False,  True, False, False, False],
fill_value=999999)
```

In addition, there are also several convenience functions for making the calculation of statistics even more convenient. For example, `sigma_clipped_stats()` will return the mean, median, and standard deviation of a sigma-clipped array:

```>>> stats.sigma_clipped_stats(data, sigma=2, maxiters=5)
(4.2857142857142856, 5.0, 2.2497165354319457)
```

There are also tools for calculating robust statistics, sampling the data, circular statistics, confidence limits, spatial statistics, and adaptive histograms.

Most tools are fairly self-contained, and include relevant examples in their docstrings.

## Using `astropy.stats`¶

More detailed information on using the package is provided on separate pages, listed below.

## Constants¶

The `astropy.stats` package defines two constants useful for converting between Gaussian sigma and full width at half maximum (FWHM):

gaussian_sigma_to_fwhm

Factor with which to multiply Gaussian 1-sigma standard deviation to convert it to full width at half maximum (FWHM).

```>>> from astropy.stats import gaussian_sigma_to_fwhm
>>> gaussian_sigma_to_fwhm
2.3548200450309493
```
gaussian_fwhm_to_sigma

Factor with which to multiply Gaussian full width at half maximum (FWHM) to convert it to 1-sigma standard deviation.

```>>> from astropy.stats import gaussian_fwhm_to_sigma
>>> gaussian_fwhm_to_sigma
0.42466090014400953
```

## Performance Tips¶

If you are finding sigma clipping to be slow, and if you have not already done so, consider installing the bottleneck package, which will speed up some of the internal computations. In addition, if you are using standard functions for `cenfunc` and/or `stdfunc`, make sure you specify these as strings rather than passing a NumPy function — that is, use:

```>>> sigma_clip(array, cenfunc='median')
```

```>>> sigma_clip(array, cenfunc=np.nanmedian)
```

Using strings will allow the sigma-clipping algorithm to pick the fastest implementation available for finding the median.

## Reference/API¶

### astropy.stats Package¶

This subpackage contains statistical tools provided for or used by Astropy.

While the `scipy.stats` package contains a wide range of statistical tools, it is a general-purpose package, and is missing some that are particularly useful to astronomy or are used in an atypical way in astronomy. This package is intended to provide such functionality, but not to replace `scipy.stats` if its implementation satisfies astronomers’ needs.

#### Functions¶

 `binom_conf_interval`(k, n[, ...]) Binomial proportion confidence interval given k successes, n trials. `binned_binom_proportion`(x, success[, bins, ...]) Binomial proportion and confidence interval in bins of a continuous variable `x`. `poisson_conf_interval`(n[, interval, sigma, ...]) Poisson parameter confidence interval given observed counts. `median_absolute_deviation`(data[, axis, ...]) Calculate the median absolute deviation (MAD). `mad_std`(data[, axis, func, ignore_nan]) Calculate a robust standard deviation using the median absolute deviation (MAD). `signal_to_noise_oir_ccd`(t, source_eps, ...) Computes the signal to noise ratio for source being observed in the optical/IR using a CCD. `bootstrap`(data[, bootnum, samples, bootfunc]) Performs bootstrap resampling on numpy arrays. `kuiper`(data[, cdf, args]) Compute the Kuiper statistic. `kuiper_two`(data1, data2) Compute the Kuiper statistic to compare two samples. Compute the false positive probability for the Kuiper statistic. `cdf_from_intervals`(breaks, totals) Construct a callable piecewise-linear CDF from a pair of arrays. Compute the length of overlap of two intervals. `histogram_intervals`(n, breaks, totals) Histogram of a piecewise-constant weight function. `fold_intervals`(intervals) Fold the weighted intervals to the interval (0,1). `biweight_location`(data[, c, M, axis, ignore_nan]) Compute the biweight location. `biweight_scale`(data[, c, M, axis, ...]) Compute the biweight scale. `biweight_midvariance`(data[, c, M, axis, ...]) Compute the biweight midvariance. `biweight_midcovariance`(data[, c, M, ...]) Compute the biweight midcovariance between pairs of multiple variables. `biweight_midcorrelation`(x, y[, c, M, ...]) Compute the biweight midcorrelation between two variables. `sigma_clip`(data[, sigma, sigma_lower, ...]) Perform sigma-clipping on the provided data. `sigma_clipped_stats`(data[, mask, ...]) Calculate sigma-clipped statistics on the provided data. Performs jackknife resampling on numpy arrays. `jackknife_stats`(data, statistic[, ...]) Performs jackknife estimation on the basis of jackknife resamples. `circmean`(data[, axis, weights]) Computes the circular mean angle of an array of circular data. `circstd`(data[, axis, weights, method]) Computes the circular standard deviation of an array of circular data. `circvar`(data[, axis, weights]) Computes the circular variance of an array of circular data. `circmoment`(data[, p, centered, axis, weights]) Computes the `p`-th trigonometric circular moment for an array of circular data. `circcorrcoef`(alpha, beta[, axis, ...]) Computes the circular correlation coefficient between two array of circular data. `rayleightest`(data[, axis, weights]) Performs the Rayleigh test of uniformity. `vtest`(data[, mu, axis, weights]) Performs the Rayleigh test of uniformity where the alternative hypothesis H1 is assumed to have a known mean angle `mu`. `vonmisesmle`(data[, axis, weights]) Computes the Maximum Likelihood Estimator (MLE) for the parameters of the von Mises distribution. `bayesian_blocks`(t[, x, sigma, fitness]) Compute optimal segmentation of data with Scargle's Bayesian Blocks. `histogram`(a[, bins, range, weights]) Enhanced histogram function, providing adaptive binnings. `scott_bin_width`(data[, return_bins]) Return the optimal histogram bin width using Scott's rule. `freedman_bin_width`(data[, return_bins]) Return the optimal histogram bin width using the Freedman-Diaconis rule. `knuth_bin_width`(data[, return_bins, quiet]) Return the optimal histogram bin width using Knuth's rule. `calculate_bin_edges`(a[, bins, range, weights]) Calculate histogram bin edges like `numpy.histogram_bin_edges`. `bayesian_info_criterion`(log_likelihood, ...) Computes the Bayesian Information Criterion (BIC) given the log of the likelihood function evaluated at the estimated (or analytically derived) parameters, the number of parameters, and the number of samples. `bayesian_info_criterion_lsq`(ssr, n_params, ...) Computes the Bayesian Information Criterion (BIC) assuming that the observations come from a Gaussian distribution. `akaike_info_criterion`(log_likelihood, ...) Computes the Akaike Information Criterion (AIC). `akaike_info_criterion_lsq`(ssr, n_params, ...) Computes the Akaike Information Criterion assuming that the observations are Gaussian distributed.

#### Classes¶

 `SigmaClip`([sigma, sigma_lower, sigma_upper, ...]) Class to perform sigma clipping. `FitnessFunc`([p0, gamma, ncp_prior]) Base class for bayesian blocks fitness functions. `Events`([p0, gamma, ncp_prior]) Bayesian blocks fitness for binned or unbinned events. `RegularEvents`(dt[, p0, gamma, ncp_prior]) Bayesian blocks fitness for regular events. `PointMeasures`([p0, gamma, ncp_prior]) Bayesian blocks fitness for point measures. `RipleysKEstimator`(area[, x_max, y_max, ...]) Estimators for Ripley's K function for two-dimensional spatial data.