Astropy Table and DataFrames#

Pandas is a popular data manipulation library for Python that provides a DataFrame object which is similar to astropy.table. A common question is why Astropy does not use DataFrame as the base table object. The answer stems from a number of domain-specific requirements related to astronomical data and analysis.

Units and Quantities#

Astronomy is a physical science, and the data often have units associated with them. The astropy.table package natively supports Quantity columns, which are a powerful way to attach units to array data and perform unit-aware operations. In addition, the base Column class holds a unit attribute as metadata to allow tracking of the units of the data for applications not using Quantity.

Pandas does not provide support for units.

Multi-dimensional and Structured Columns#

Astronomers deal with images, spectra, and other multi-dimensional data that are commonly stored in a table. An example is a source catalog with an image thumbnail and a spectrum for each source. Structured columns are less common, but are useful for storing vectorized data like an EarthLocation in a table.

Pandas is not able to natively store multi-dimensional or structured columns.

Lossless representation of FITS and VOTable data via metadata#

The astropy.table package strives to provide lossless representation of FITS and VOTable data. This means that when you read a FITS or VOTable file into a table and then write it back out, the data will be effectively identical. This is made possible by robust support for table and column metadata which allows storing and propagating common column information such as the unit, description, and format. For VOTable data, more information like the UCD is maintained.

Pandas provides limited support for metadata, but as of late-2024 it is highlighted as “experimental” in the documentation.

Time and Coordinates#

Time and coordinates are fundamental to astronomy, and astropy provides robust support for them with the Time and SkyCoord classes. Arrays of times and coordinates can be natively stored in astropy.table, meaning that the full power of these objects is available when working with them as columns within a table.

Pandas supports timeseries data, but with key limitations:

  • Leap seconds are not supported. In many circumstances (for instance planning an observation) this limitation is not acceptable.

  • Pandas times are stored with 64-bit precision, which is not sufficient for some astronomical applications. Astropy uses 128-bit precision for time to allow sub-nanosecond precision over the age of the universe.

  • Different time scales common in astronomy (e.g., TAI, UT1) are not supported.

  • Time formats used in astronomy such as the FITS time format are not supported.

Pandas does not support sky coordinate columns.

Responsiveness to Community Needs#

The astropy.table package is developed by the Astropy community, which is focused on the needs of astronomers and astrophysicists. This means that the development of the package can be responsive to the needs of this community and we can develop features without being constrained by the potential impact to the far broader user base of Pandas.

Interoperability#

We recognize that Pandas is a popular library and that there are many users who are familiar with it. For this reason, we have made it easy to convert between astropy.table and DataFrame, as documented in Interfacing with the Pandas Package. This allows users to take advantage of the features of both packages as needed, within the limitations stated above.

We are also committed to supporting interoperability with a more generalized concept of the DataFrame, with packages like polars gaining popularity.