biweight_midcovariance¶

astropy.stats.
biweight_midcovariance
(data, c=9.0, M=None, modify_sample_size=False)[source]¶ Compute the biweight midcovariance between pairs of multiple variables.
The biweight midcovariance is a robust and resistant estimator of the covariance between two variables.
This function computes the biweight midcovariance between all pairs of the input variables (rows) in the input data. The output array will have a shape of (N_variables, N_variables). The diagonal elements will be the biweight midvariances of each input variable (see
biweight_midvariance()
). The offdiagonal elements will be the biweight midcovariances between each pair of input variables.For example, if the input array
data
contains three variables (rows)x
,y
, andz
, the outputndarray
midcovariance matrix will be:\[\begin{split}\begin{pmatrix} \zeta_{xx} & \zeta_{xy} & \zeta_{xz} \\ \zeta_{yx} & \zeta_{yy} & \zeta_{yz} \\ \zeta_{zx} & \zeta_{zy} & \zeta_{zz} \end{pmatrix}\end{split}\]where \(\zeta_{xx}\), \(\zeta_{yy}\), and \(\zeta_{zz}\) are the biweight midvariances of each variable. The biweight midcovariance between \(x\) and \(y\) is \(\zeta_{xy}\) (\(= \zeta_{yx}\)). The biweight midcovariance between \(x\) and \(z\) is \(\zeta_{xz}\) (\(= \zeta_{zx}\)). The biweight midcovariance between \(y\) and \(z\) is \(\zeta_{yz}\) (\(= \zeta_{zy}\)).
The biweight midcovariance between two variables \(x\) and \(y\) is given by:
\[\zeta_{xy} = n \ \frac{\Sigma_{u_i < 1, \ v_i < 1} \ (x_i  M_x) (1  u_i^2)^2 (y_i  M_y) (1  v_i^2)^2} {(\Sigma_{u_i < 1} \ (1  u_i^2) (1  5u_i^2)) (\Sigma_{v_i < 1} \ (1  v_i^2) (1  5v_i^2))}\]where \(M_x\) and \(M_y\) are the medians (or the input locations) of the two variables and \(u_i\) and \(v_i\) are given by:
\[ \begin{align}\begin{aligned}u_{i} = \frac{(x_i  M_x)}{c * MAD_x}\\v_{i} = \frac{(y_i  M_y)}{c * MAD_y}\end{aligned}\end{align} \]where \(c\) is the biweight tuning constant and \(MAD_x\) and \(MAD_y\) are the median absolute deviation of the \(x\) and \(y\) variables. The biweight midvariance tuning constant
c
is typically 9.0 (the default).For the standard definition of biweight midcovariance \(n\) is the total number of observations of each variable. That definition is used if
modify_sample_size
isFalse
, which is the default.However, if
modify_sample_size = True
, then \(n\) is the number of observations for which \(u_i < 1\) and \(v_i < 1\), i.e.\[n = \Sigma_{u_i < 1, \ v_i < 1} \ 1\]which results in a value closer to the true variance for small sample sizes or for a large number of rejected values.
Parameters:  data : 2D or 1D arraylike
Input data either as a 2D or 1D array. For a 2D array, it should have a shape (N_variables, N_observations). A 1D array may be input for observations of a single variable, in which case the biweight midvariance will be calculated (no covariance). Each row of
data
represents a variable, and each column a single observation of all those variables (same as thenumpy.cov
convention). c : float, optional
Tuning constant for the biweight estimator (default = 9.0).
 M : float or 1D arraylike, optional
The location estimate of each variable, either as a scalar or array. If
M
is an array, then its must be a 1D array containing the location estimate of each row (i.e.a.ndim
elements). IfM
is a scalar value, then its value will be used for each variable (row). IfNone
(default), then the median of each variable (row) will be used. modify_sample_size : bool, optional
If
False
(default), then the sample size used is the total number of observations of each variable, which follows the standard definition of biweight midcovariance. IfTrue
, then the sample size is reduced to correct for any rejected values (see formula above), which results in a value closer to the true covariance for small sample sizes or for a large number of rejected values.
Returns:  biweight_midcovariance :
ndarray
A 2D array representing the biweight midcovariances between each pair of the variables (rows) in the input array. The output array will have a shape of (N_variables, N_variables). The diagonal elements will be the biweight midvariances of each input variable. The offdiagonal elements will be the biweight midcovariances between each pair of input variables.
References
[1] http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/biwmidc.htm Examples
Compute the biweight midcovariance between two random variables:
>>> import numpy as np >>> from astropy.stats import biweight_midcovariance >>> # Generate two random variables x and y >>> rng = np.random.RandomState(1) >>> x = rng.normal(0, 1, 200) >>> y = rng.normal(0, 3, 200) >>> # Introduce an obvious outlier >>> x[0] = 30.0 >>> # Calculate the biweight midcovariances between x and y >>> bicov = biweight_midcovariance([x, y]) >>> print(bicov) # doctest: +FLOAT_CMP [[ 0.82483155 0.18961219] [0.18961219 9.80265764]] >>> # Print standard deviation estimates >>> print(np.sqrt(bicov.diagonal())) # doctest: +FLOAT_CMP [ 0.90820237 3.13091961]