VOTable Handling (astropy.io.votable)#
Introduction#
The astropy.io.votable sub-package converts VOTable XML files to and from numpy record arrays.
Note
If you want to read or write a single table in VOTable format, the recommended method is via the High-level Unified File I/O interface. In particular, see the Unified I/O VO Tables section.
Getting Started#
Reading a VOTable File#
To read a VOTable file, pass a file path to parse:
from astropy.io.votable import parse
votable = parse("votable.xml")
votable is a VOTableFile object, which can be used to retrieve and manipulate the data and save it back out to disk.
Writing a VOTable File#
This section describes writing table data in the VOTable format using the votable package directly. In many cases, however, the High-level Unified File I/O interface will suffice and is somewhat more convenient to use. See the Unified I/O VOTable section for details.
To save a VOTable file, call the to_xml method. It accepts either a path string or a Python file-like object:
votable.to_xml('output.xml')
astropy.io.votable supports several data storage formats. The TABLEDATA format is XML-based and stores values as strings representing numbers. The BINARY format is more compact and stores numbers in base64-encoded binary. VOTable version 1.3 adds the BINARY2 format, which allows masking of any data type, including integers and bit fields, which cannot be masked in the older BINARY format. The storage format can be set on a per-table basis using the format attribute, or globally using the set_all_tables_format method:
votable.get_first_table().format = 'binary'
votable.set_all_tables_format('binary')
votable.to_xml('binary.xml')
The VOTable elements#
VOTables are built from nested elements. For example, the following builds a VOTable containing an INFO element:
>>> from astropy.io.votable.tree import VOTableFile, Info
>>> vot = VOTableFile()
>>> vot.infos.append(Info(name="date_obs", value="2025-01-01"))
These elements can be:
Here are some detailed explanations on some of these elements:
Using astropy.io.votable#
Standard Compliance#
astropy.io.votable.tree.TableElement supports the VOTable Format Definition Version 1.1, Version 1.2, Version 1.3, Version 1.4, and Version 1.5. Some flexibility is provided to support the 1.0 draft version and other nonstandard usage in the wild; see Verifying VOTables for more details.
Note
Each warning and VOTable-specific exception emitted has a number and is documented in more detail in Warnings and Exceptions.
Output always conforms to the 1.1, 1.2, 1.3, 1.4, or 1.5 spec, depending on the input.
Verifying VOTables#
Many VOTable files in the wild do not conform to the VOTable specification. You can set what should happen when a violation is encountered with the verify keyword, which can take three values:
'ignore' - Attempt to parse the VOTable silently. This is the default setting.
'warn' - Attempt to parse the VOTable, but emit the appropriate warnings. The number of warnings of the same type can be limited to a maximum value using the astropy.io.votable.exceptions.conf.max_warnings item in the Configuration System (astropy.config).
'exception' - Do not parse the VOTable and raise an exception.
The verify keyword can be used with the parse() or parse_single_table() functions:
from astropy.io.votable import parse
votable = parse("votable.xml", verify='warn')
It is possible to change the default verify value through the astropy.io.votable.conf.verify item in the Configuration System (astropy.config).
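As a small sketch, the default can be changed for a limited scope with the set_temp context manager of the configuration namespace, so that the previous value is restored afterwards. No file is parsed here; the snippet only shows the configuration item changing and being restored.

```python
from astropy.io.votable import conf

previous = conf.verify
with conf.set_temp("verify", "warn"):
    # Calls to parse() made here without an explicit verify argument
    # would use 'warn'.
    inside = conf.verify
restored = conf.verify

print(previous, inside, restored)
```

A temporary override like this avoids changing behavior for unrelated code that relies on the default.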
Note that 'ignore' or 'warn' mean that astropy will attempt to parse the VOTable, but if the specification has been violated then success cannot be guaranteed.
It is good practice to report any errors to the author of the application that generated the VOTable file to bring the file into compliance with the specification.
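The sketch below compares the three verify settings on a deliberately non-compliant document. The XML is hand-written for this example: the VOTABLE element has no version attribute and the RESOURCE contains an unknown EXTRA element. Which warning numbers are reported is left to the parser, so the code only counts VOTable warnings and checks whether an exception is raised.

```python
import io
import warnings

from astropy.io.votable import parse
from astropy.io.votable.exceptions import VOWarning

# Deliberately non-compliant: no "version" attribute, unknown EXTRA element.
BAD_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <EXTRA/>
    <TABLE>
      <FIELD ID="x" name="x" datatype="int"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def vo_warnings(verify):
    """Parse BAD_XML and return the VOTable warnings that were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        parse(io.BytesIO(BAD_XML), verify=verify)
    return [w for w in caught if issubclass(w.category, VOWarning)]


n_ignore = len(vo_warnings("ignore"))
n_warn = len(vo_warnings("warn"))

try:
    parse(io.BytesIO(BAD_XML), verify="exception")
    raised = False
except Exception as exc:  # a VOTable-specific exception is expected here
    raised = True
    print(type(exc).__name__, exc)

print(n_ignore, n_warn, raised)
```

Filtering on VOWarning keeps unrelated warnings (for example deprecation notices from other libraries) out of the counts.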
Data Serialization Formats#
VOTable supports a number of different serialization formats:
TABLEDATA stores the data in pure XML, where the numerical values are written as human-readable strings.
BINARY is a binary representation of the data, stored in the XML as an opaque base64-encoded blob.
BINARY2 was added in VOTable 1.3, and is identical to BINARY, except that it explicitly records the position of missing values rather than identifying them by a special value.
FITS stores the data in an external FITS file. This serialization is not supported by the astropy.io.votable writer, since it requires writing multiple files.
PARQUET stores the data in an external Parquet file, similar to the FITS serialization. Reading and writing are fully supported by the astropy.io.votable writer and the astropy.io.votable.parse reader. The Parquet file can be referenced with either an absolute or a relative path. The Parquet serialization can be used as part of the unified Table I/O (see next section) by setting the format argument to 'votable.parquet'.
The serialization format can be selected in two ways:
1) By setting the format attribute of an astropy.io.votable.tree.TableElement object:
votable.get_first_table().format = "binary"
votable.to_xml("new_votable.xml")
2) By overriding the format of all tables using the tabledata_format keyword argument when writing out a VOTable file:
votable.to_xml("new_votable.xml", tabledata_format="binary")
Converting to/from an astropy.table.Table#
The VOTable standard does not map conceptually to an astropy.table.Table. However, a single table within the VOTable file may be converted to and from an astropy.table.Table:
from astropy.io.votable import parse_single_table
table = parse_single_table("votable.xml").to_table()
As a convenience, there is also a function to create an entire VOTable file with just a single table:
from astropy.io.votable import from_table, writeto
votable = from_table(table)
writeto(votable, "output.xml")
Note
By default, to_table will use the ID attribute of each field in the file to create the column names for the Table object. However, you may want to use the name attributes instead. To do so, set the use_names_over_ids keyword to True. Note that since field names are not guaranteed to be unique in the VOTable specification, but column names are required to be unique in numpy structured arrays (and thus astropy.table.Table objects), the names may be renamed by appending numbers to the end in some cases.
Performance Considerations#
File reads will be moderately faster if the TABLE element includes an nrows attribute. If the number of rows is not specified, the record array must be resized repeatedly during load.
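For illustration, this is what a TABLE element with an nrows attribute looks like, and how the declared value can be inspected after parsing. The XML is hand-written for this sketch, and the example does not attempt to measure the speed difference.

```python
import io

from astropy.io.votable import parse_single_table

# Hand-written VOTable whose TABLE element declares nrows="2".
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE nrows="2">
      <FIELD ID="x" name="x" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1.0</TD></TR>
          <TR><TD>2.0</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

table = parse_single_table(io.BytesIO(XML))
print(table.nrows, len(table.array))
```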
Data Origin#
Introduction#
The astropy.io.votable.dataorigin module extracts basic provenance information from the VOTable header. The information is described in the DataOrigin IVOA note: https://www.ivoa.net/documents/DataOrigin/.
DataOrigin includes both the query information (such as publisher, contact, versions, etc.) and the dataset origin (such as creator, bibliographic links, URL, etc.).
This API retrieves the metadata from the INFO elements in the VOTable.
Getting Started#
To extract DataOrigin from a VOTable, parse the file and pass the result to extract_data_origin.
Example: VizieR catalogue J/AJ/167/18
>>> from astropy.io.votable import parse
>>> from astropy.io.votable.dataorigin import extract_data_origin
>>> votable = parse("https://vizier.cds.unistra.fr/viz-bin/conesearch/J/AJ/167/18/table4?RA=265.51&DEC=-22.71&SR=0.1")
>>> data_origin = extract_data_origin(votable)
>>> print(data_origin)
publisher: CDS
server_software: 7.4.5
service_protocol: ivo://ivoa.net/std/ConeSearch/v1.03
request_date: 2025-03-05T14:18:05
contact: cds-question@unistra.fr
publisher: CDS
ivoid: ivo://cds.vizier/j/aj/167/18
citation: doi:10.26093/cds/vizier.51670018
reference_url: https://cdsarc.cds.unistra.fr/viz-bin/cat/J/AJ/167/18
rights_uri: https://cds.unistra.fr/vizier-org/licences_vizier.html
creator: Hong K.
...
Contents and metadata#
astropy.io.votable.dataorigin.extract_data_origin returns an astropy.io.votable.dataorigin.DataOrigin container, which is made of:
an astropy.io.votable.dataorigin.QueryOrigin container describing the request. QueryOrigin is considered to be unique for the whole VOTable. It includes metadata like the publisher, the contact, date of execution, query, etc.
a list of astropy.io.votable.dataorigin.DatasetOrigin containers, one for each Element having DataOrigin information. DatasetOrigin is a basic provenance of the datasets queried. Each attribute is a list. It includes metadata like authors, ivoid, landing pages, etc.
Examples#
Get the (Data Center) publisher and the Creator of the dataset
>>> print(data_origin.query.publisher)
CDS
>>> print(data_origin.origin[0].creator)
['Hong K.']
Other capabilities#
The DataOrigin container includes VO Elements:
Extract list of astropy.io.votable.tree.Info
>>> # get DataOrigin with the description of each INFO
>>> for dataset_origin in data_origin.origin:
...     for info in dataset_origin.infos:
...         print(f"{info.name}: {info.value} ({info.content})")
ivoid: ivo://cds.vizier/j/aj/167/18 (IVOID of underlying data collection)
creator: Hong K. (First author or institution)
cites: bibcode:2024AJ....167...18H (Article or Data origin sources)
editor: Astronomical Journal (AAS) (Editor name (article))
original_date: 2024 (Year of the article publication)
...
Extract tree node astropy.io.votable.tree.Element
The following example extracts the citation from the header (in APA style).
>>> # get the Title retrieved in Element
>>> origin = data_origin.origin[0]
>>> vo_elt = origin.get_votable_element()
>>> title = vo_elt.description if vo_elt else ""
>>> print(f"APA: {','.join(origin.creator)} ({origin.publication_date[0]}). {title} [Dataset]. {data_origin.query.publisher}. {origin.citation[0]}")
APA: Hong K. (2024-11-06). Period variations of 32 contact binaries (Hong+, 2024) [Dataset]. CDS. doi:10.26093/cds/vizier.51670018
Add Data Origin INFO into VOTable:
>>> from astropy.io.votable import dataorigin, parse
>>> votable = parse("votable.xml")
>>> dataorigin.add_data_origin_info(votable, "query", "Data center name")
>>> dataorigin.add_data_origin_info(votable.resources[0], "creator", "Author name")