Masked Values (
Often, data sets are incomplete or corrupted and it would be handy to be able
to mask certain values. Astropy provides a
Masked class to help represent
such data sets.
Masked is experimental! While we hope basic usage will remain
similar, we are not yet sure whether it will not be necessary to change it
to make things work throughout Astropy. This also means that comments and
suggestions for improvements are especially welcome!
Masked is similar to Numpy’s
but it supports subclasses much better and also has some important
differences in behaviour.
As a result, the behaviour of functions inside
numpy.ma is poorly
defined, and one should instead use regular
numpy functions, which
are overridden to work properly with masks (with non-obvious
choices documented in
report numpy functions that do not work properly with
>>> import numpy as np >>> from astropy import units as u >>> from astropy.utils.masked import Masked >>> ma = Masked([1., 2., 3.], mask=[False, False, True]) >>> ma MaskedNDArray([1., 2., ——]) >>> mq = ma * u.m >>> mq + 25 * u.cm <MaskedQuantity [1.25, 2.25, ———] m>
>>> mq.unmasked <Quantity [1., 2., 3.] m> >>> mq.filled(fill_value=-75*u.cm) <Quantity [ 1. , 2. , -0.75] m>
For reductions such as sums, the mask propagates as if the sum was done directly:
>>> ma = Masked([[0., 1.], [2., 3.]], mask=[[False, True], [False, False]]) >>> ma.sum(axis=-1) MaskedNDArray([——, 5.]) >>> ma.sum() MaskedNDArray(——)
You might wonder why masked elements are propagated, instead of just being
skipped (as is done in
MaskedArray; see below). The rationale is that this leaves a
sum which is generally not useful unless one knows the number of masked
elements. In contrast, for sample properties such as the mean, for which the
number of elements are counted, it seems natural to simply omit the masked
elements from the calculation:
>> ma.mean(-1) MaskedNDArray([0.0, 2.5])
>>> np_ma = np.ma.MaskedArray([1., 2., 3.], mask=[False, True, False]) >>> (np_ma + 1).data array([2., 2., 4.]) >>> (Masked(np_ma) + 1).unmasked array([2., 3., 4.])
The main reason for this decision is that for some masked subclasses, like
Quantity, keeping the original value makes no sense (e.g., consider
dividing a length by a time: if the unit of a masked quantity is changing, why
should its value not change?). But it also helps to keep the implementation
considerably simpler, as the
Masked class now primarily has to deal with
propagating the mask rather than deciding what to do with values.
A second difference is that for reductions, the mask propagates as it would have if the operations were done on the individual elements:
>>> np_ma.prod() 3.0 >>> np_ma * np_ma * np_ma masked >>> Masked(np_ma).prod() MaskedNDArray(——)
The rationale for this becomes clear again by thinking about subclasses like a
Quantity. For instance, consider an array
s of lengths with
(N, 3), in which the last axis represents width, height, and depth.
With this, you could compute corresponding volumes by taking the product of
the values in the last axis,
s.prod(axis=-1). But if masked elements were
skipped, the physical dimension of entries in the result would depend how many
elements were masked, which is something
Quantity could not represent (and
would be rather surprising!). As noted above, however, masked elements are
skipped for operations for which this is well defined, such as for getting the
mean and other sample properties such as the variance and standard deviation.
A third difference is more conceptual. For
instance that is created is a masked version of the unmasked instance, i.e.,
MaskedArray remembers that is has wrapped a subclass like
Quantity, but does not share any of its methods. Hence, even though the
resulting class looks reasonable at first glance, it does not work as expected:
>>> q = [1., 2.] * u.m >>> np_mq = np.ma.MaskedArray(q, mask=[False, True]) >>> np_mq masked_Quantity(data=[1.0, --], mask=[False, True], fill_value=1e+20) >>> np_mq.unit Traceback (most recent call last): ... AttributeError: 'MaskedArray' object has no attribute 'unit' >>> np_mq / u.s <Quantity [1., 2.] 1 / s>
Masked is always wrapped around the data properly, i.e., a
MaskedQuantity is a quantity which has masked values, but with a unit that
is never masked. Indeed, one can see this from the class hierarchy:
>>> mq.__class__.__mro__ (<class 'astropy.utils.masked.core.MaskedQuantity'>, <class 'astropy.units.quantity.Quantity'>, <class 'astropy.utils.masked.core.MaskedNDArray'>, <class 'astropy.utils.masked.core.Masked'>, <class 'astropy.utils.shapes.NDArrayShapeMethods'>, <class 'numpy.ndarray'>, <class 'object'>)
This choice has made the implementation much simpler:
Masked only has to
worry about how to deal with masked values, while
Quantity can worry just
about unit propagation, etc. Indeed, an experiment showed that applying
Column (which is a subclass of
the result is a new
MaskedColumn that “just works”, with no need for the
overrides and special-casing that were needed to make
Column. (Because the behaviour does change
somewhat, however, we chose not to replace the existing implementation.)
In some respects, rather than think of
Masked as similar to
MaskedArray, it may be more useful to think of
Masked as similar
to marking bad elements in arrays with NaN (not-a-number). Like those NaN,
the mask just propagates, except that for some operations like taking the mean
the equivalence of
nanmean is used.
Built-in mask mixin class.
The design uses
Masked as a factory class which automatically
generates new subclasses for any data class that is itself a
subclass of a predefined masked class, with
providing such a predefined class for
A scalar value or array of values with associated mask.
Masked version of ndarray.
Class Inheritance Diagram¶
Helpers for letting numpy functions interact with Masked arrays.
The module supplies helper routines for numpy functions that propagate
masks appropriately., for use in the
MaskedNDArray. They are not
very useful on their own, but the ones with docstrings are included in
the documentation so that there is a place to find out how the mask is
Count number of occurrences of each value in array of non-negative ints.
Broadcast arrays to a common shape.
Broadcast array to the given shape.
Construct an array from an index array and a set of arrays to choose from.
Copies values from one array to another, broadcasting as necessary.
Counts the number of non-zero values in the array
Return a new array with the same shape and type as a given array.
Return a full array with the same shape and type as a given array.
Insert values along the given axis before the given indices.
One-dimensional linear interpolation.
Perform an indirect stable sort using a sequence of keys.
Return an array of ones with the same shape and type as a given array.
Evaluate a piecewise-defined function.
Change elements of an array based on conditional and input values.
Replaces specified elements of an array with given values.
Changes elements of an array based on conditional and input values.
Return an array drawn from elements in choicelist, depending on conditions.
Return an array of zeros with the same shape and type as a given array.