VOTable XML Handling (astropy.io.votable)
The astropy.io.votable sub-package converts VOTable XML files to and from numpy record arrays. This subpackage was originally developed as vo.table.
This section provides a quick introduction to using astropy.io.votable. The goal is to demonstrate the package’s basic features without getting into too much detail.
Reading a VOTable File
To read in a VOTable file, pass a file path to parse:

    from astropy.io.votable import parse
    votable = parse("votable.xml")
votable is a VOTableFile object, which can be used to retrieve and manipulate the data and save it back out to disk.
VOTable files are made up of nested RESOURCE elements, each of which may contain one or more TABLE elements. The TABLE elements contain the arrays of data.
To get at the TABLE elements, you can write a loop over the resources in the VOTABLE file:

    for resource in votable.resources:
        for table in resource.tables:
            # ... do something with the table ...
            pass
However, if the nested structure of the resources is not important, you can use iter_tables to return a flat list of all tables:

    for table in votable.iter_tables():
        # ... do something with the table ...
        pass
Finally, if you expect only one table in the file, it might be most convenient to use get_first_table:

    table = votable.get_first_table()
Alternatively, there is a convenience method to parse a VOTable file and return the first table all in one step:

    from astropy.io.votable import parse_single_table
    table = parse_single_table("votable.xml")
From a Table object, you can get the data itself in the array member variable:

    data = table.array
This data is a numpy record array.
The columns get their names from both the ID and name attributes of the FIELD elements in the VOTABLE file.
Suppose we had a FIELD specified as follows:

    <FIELD ID="Dec" name="dec_targ" datatype="char" ucd="POS_EQ_DEC_MAIN" unit="deg">
      <DESCRIPTION>
        representing the ICRS declination of the center of the image.
      </DESCRIPTION>
    </FIELD>
The mapping from VOTable name and ID attributes to numpy dtype names and titles is highly confusing.
In VOTable, ID is guaranteed to be unique, but is not required. name is not guaranteed to be unique, but is required.
In numpy record dtypes, names are required and must be unique. titles are not required, and are not required to be unique.
Therefore, VOTable’s ID most closely maps to numpy’s names, and VOTable’s name most closely maps to numpy’s titles. However, in some cases where a VOTable ID is not provided, a numpy name will be generated based on the VOTable name. Unfortunately, VOTable fields do not have an attribute that is both unique and required, which would be the most convenient mechanism to uniquely identify a column.
When converting from an astropy.io.votable.tree.Table object to an astropy.table.Table object, you can specify whether to give preference to name or ID attributes when naming the columns. By default, ID is given preference. To give name preference, pass the keyword argument use_names_over_ids=True to to_table.
This column of data can be extracted from the record array using:

    >>> table.array['dec_targ']
    array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
           17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
           17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
           17.1553884541, 17.15539736932, 17.15539752176, 17.25736014763,
           # ...
           17.2765703], dtype=object)

or equivalently:

    >>> table.array['Dec']
    array([17.15153360566, 17.15153360566, 17.15153360566, 17.1516686826,
           17.1516686826, 17.1516686826, 17.1536197136, 17.1536197136,
           17.1536197136, 17.15375479055, 17.15375479055, 17.15375479055,
           17.1553884541, 17.15539736932, 17.15539752176, 17.25736014763,
           # ...
           17.2765703], dtype=object)
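Both lookups work because a numpy field can carry a title in addition to its name. The sketch below uses plain numpy only (it is not astropy's internal code) and builds a one-column record array whose name plays the role of the VOTable ID and whose title plays the role of the VOTable name; the two values are invented for illustration:

```python
import numpy as np

# A field spec of the form ((title, name), type) attaches a title to a field.
# Here the name mirrors the VOTable ID and the title mirrors the VOTable name.
dtype = np.dtype([(("dec_targ", "Dec"), "f8")])

# Illustrative values only.
records = np.array([(17.15,), (17.25,)], dtype=dtype)

by_name = records["Dec"]        # lookup by numpy name (VOTable ID)
by_title = records["dec_targ"]  # lookup by numpy title (VOTable name)

# Only the name is listed in dtype.names; the title acts as an alias.
field_names = dtype.names
```

Because only names must be unique in a numpy dtype, this layout explains why the unique VOTable ID is the natural choice for the name slot.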
Building a New Table from Scratch
It is also possible to build a new table, define some field datatypes, and populate it with data.
To build a new VOTable file containing a single table:

    from astropy.io.votable.tree import VOTableFile, Resource, Table, Field

    # Create a new VOTable file...
    votable = VOTableFile()

    # ...with one resource...
    resource = Resource()
    votable.resources.append(resource)

    # ... with one table
    table = Table(votable)
    resource.tables.append(table)

    # Define some fields
    table.fields.extend([
        Field(votable, name="filename", datatype="char", arraysize="*"),
        Field(votable, name="matrix", datatype="double", arraysize="2x2")])

    # Now, use those field definitions to create the numpy record arrays, with
    # the given number of rows
    table.create_arrays(2)

    # Now table.array can be filled with data, one row at a time
    table.array[0] = ('test1.xml', [[1, 0], [0, 1]])
    table.array[1] = ('test2.xml', [[0.5, 0.3], [0.2, 0.1]])

    # Now write the whole thing to a file.
    # Note, we have to use the top-level votable file object
    votable.to_xml("new_votable.xml")
Outputting a VOTable File
This section describes writing table data in the VOTable format using the votable package directly. In many cases, however, the high-level Unified File Read/Write Interface will suffice and is somewhat more convenient to use. See the Unified I/O VOTable section for details.
To save a VOTable file, call the to_xml method. It accepts either a path given as a string or a Python file-like object.
There are a number of data storage formats supported by astropy.io.votable. The TABLEDATA format is XML-based and stores values as strings representing numbers. The BINARY format is more compact, and stores numbers in base64-encoded binary. VOTable version 1.3 adds the BINARY2 format, which allows for masking of any data type, including integers and bit fields which cannot be masked in the older BINARY format. The storage format can be set on a per-table basis using the format attribute, or globally using the set_all_tables_format method:

    votable.get_first_table().format = 'binary'
    votable.set_all_tables_format('binary')
    votable.to_xml('binary.xml')
astropy.io.votable.tree.Table supports the VOTable Format Definition Version 1.1, Version 1.2, Version 1.3, and Version 1.4.
Some flexibility is provided to support the 1.0 draft version and other nonstandard usage in the wild; see Verifying VOTables for more details.
Output always conforms to the 1.1, 1.2, 1.3, or 1.4 spec, depending on the input.
Many VOTable files in the wild do not conform to the VOTable specification. You can set what should happen when a violation is encountered with the verify keyword, which can take three values:
'ignore' - Attempt to parse the VOTable silently. This is the default setting.
'warn' - Attempt to parse the VOTable, but raise appropriate Warnings. It is possible to limit the number of warnings of the same type to a maximum value using the astropy.io.votable.exceptions.conf.max_warnings item in the Configuration System (astropy.config).
'exception' - Do not parse the VOTable and raise an exception.
For example, with parse:

    from astropy.io.votable import parse
    votable = parse("votable.xml", verify='warn')
The 'ignore' and 'warn' settings mean that astropy will attempt to parse the VOTable, but if the specification has been violated then success cannot be guaranteed.
It is good practice to report any errors to the author of the application that generated the VOTable file to bring the file into compliance with the specification.
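The difference between the verify settings can be sketched with a deliberately nonconforming document. The example assumes a hypothetical in-memory VOTable whose FIELD omits the required name attribute; the exact warning or exception class is left unspecified because it depends on the violation and on the astropy version:

```python
import io
import warnings

from astropy.io.votable import parse

# Hypothetical document: the FIELD has an ID but lacks the required name.
BAD_XML = b"""<?xml version="1.0" encoding="utf-8"?>
<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE>
    <TABLE>
      <FIELD ID="flux" datatype="double"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>1.5</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# verify='warn': parsing proceeds and the violation is reported as a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = parse(io.BytesIO(BAD_XML), verify="warn")
n_rows = len(lenient.get_first_table().array)

# verify='exception': the same violation aborts parsing.
try:
    parse(io.BytesIO(BAD_XML), verify="exception")
    strict_failed = False
except Exception:
    strict_failed = True
```

In a pipeline, one reasonable pattern is to parse with 'warn', log what was caught, and forward that log to whoever produced the file.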
Any value in the table may be “missing”. astropy.io.votable stores a numpy masked array in each Table instance. This behaves like an ordinary numpy masked array, except for variable-length fields. For those fields, the datatype of the column is “object” and another numpy masked array is stored there. Therefore, operations on variable-length columns will not work, because variable-length columns are not directly supported by numpy masked arrays.
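The distinction can be sketched in plain numpy, without astropy. The column contents below are invented for illustration: a fixed-width column is an ordinary masked array whose reductions skip missing values, while a variable-length column is an object array whose cells must be processed one at a time:

```python
import numpy as np

# Fixed-width column: an ordinary masked array. The middle value is "missing".
flux = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
mean_flux = float(flux.mean())      # masked entries are skipped
n_missing = int(flux.mask.sum())

# Variable-length column: an object array holding one masked array per cell.
spectra = np.empty(2, dtype=object)
spectra[0] = np.ma.array([1.0, 2.0, 3.0], mask=[False, False, True])
spectra[1] = np.ma.array([4.0, 5.0])

# Whole-column arithmetic is not available here; handle each cell separately.
cell_sums = [float(cell.sum()) for cell in spectra]
cell_lengths = [len(cell) for cell in spectra]
```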
The datatype specified by a FIELD element is mapped to a numpy type according to the following table:
char (variable length)
O - A
char (fixed length)
unicodeChar (variable length)
O - A
unicodeChar (fixed length)
If the field is a fixed-size array, the data is stored as a numpy fixed-size array.
If the field is a variable-size array (that is, arraysize contains a ‘*’), the cell will contain a Python list of numpy values. Each value may be either an array or scalar depending on the arraysize specifier.
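The two storage layouts can be sketched with a plain numpy structured dtype. This is an illustration rather than the dtype astropy actually builds; the field names and values are borrowed from the table-building example earlier in this document, with an object field standing in for the variable-length column and a (2, 2) subarray standing in for a fixed-size 2x2 field:

```python
import numpy as np

# "filename" is variable-length, so it is stored as a Python object per cell.
# "matrix" is fixed-size, so it is stored inline as a (2, 2) block of doubles.
row_dtype = np.dtype([("filename", object), ("matrix", "f8", (2, 2))])

rows = np.zeros(2, dtype=row_dtype)
rows[0] = ("test1.xml", [[1, 0], [0, 1]])
rows[1] = ("test2.xml", [[0.5, 0.3], [0.2, 0.1]])

# Fixed-size fields stack into one regular array: (n_rows, 2, 2).
matrix_shape = rows["matrix"].shape
one_element = float(rows["matrix"][1][0][1])

# Object fields stay a 1-D array of independent Python objects.
filename_kind = rows["filename"].dtype.kind
```

The fixed-size field supports vectorized operations over the whole column, whereas the object field does not.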
Examining Field Types
To look up more information about a field:

    >>> field = table.get_field_by_id('Dec')
    >>> field.datatype
    'char'
    >>> field.unit
    'deg'
Field descriptors should not be mutated. To change the set of
columns, convert the Table to an
astropy.table.Table, make the
changes, and then convert it back.
Data Serialization Formats
VOTable supports a number of different serialization formats.
TABLEDATA stores the data in pure XML, where the numerical values are written as human-readable strings.
BINARY is a binary representation of the data, stored in the XML as an opaque base64-encoded blob.
BINARY2 was added in VOTable 1.3, and is identical to “BINARY”, except that it explicitly records the position of missing values rather than identifying them by a special value.
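The trade-off between text and binary storage can be illustrated with the standard library alone. This sketch is not astropy's writer: it packs two illustrative double values as big-endian IEEE 754 numbers, which is an assumption about the byte layout made for this example, and base64-encodes the result:

```python
import base64
import struct

# Illustrative values only.
values = [17.15153360566, 17.1516686826]

# TABLEDATA-style: each number is written as a human-readable string.
as_text = " ".join(repr(value) for value in values)

# BINARY-style: pack the numbers as raw bytes, then base64-encode them so
# that the result can be embedded in an XML document.
packed = struct.pack(">2d", *values)            # two big-endian doubles
blob = base64.b64encode(packed).decode("ascii")

# Decoding reverses both steps and recovers the values exactly.
restored = list(struct.unpack(">2d", base64.b64decode(blob)))
```

Each double always costs 8 bytes before encoding, regardless of how many digits its decimal representation needs, and the binary round trip is exact.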
The serialization format can be selected in two ways:
1) By setting the format attribute of an astropy.io.votable.tree.Table object:

    votable.get_first_table().format = "binary"
    votable.to_xml("new_votable.xml")

2) By overriding the format of all tables using the tabledata_format keyword argument when writing out a VOTable file:

    votable.to_xml("new_votable.xml", tabledata_format="binary")
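Converting to/from an astropy.table.Table
A single table within a VOTable file can be converted to an astropy.table.Table:

    from astropy.io.votable import parse_single_table
    table = parse_single_table("votable.xml").to_table()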
As a convenience, there is also a function to create an entire VOTable file with just a single table:

    from astropy.io.votable import from_table, writeto
    votable = from_table(table)
    writeto(votable, "output.xml")
By default, to_table will use the ID attribute from the files to create the column names for the Table object. However, it may be that you want to use the name attributes instead. For this, set the use_names_over_ids keyword to True. Note that since field names are not guaranteed to be unique in the VOTable specification, but column names are required to be unique in numpy structured arrays (and astropy.table.Table objects), the names may be renamed by appending numbers to the end in some cases.
File reads will be moderately faster if the
TABLE element includes
an nrows attribute. If the number of rows is not specified, the
record array must be resized repeatedly during load.
This package reads and writes data formats used by the Virtual Observatory (VO) initiative, particularly the VOTable XML format.
Prints a validation report for the given file.
Reads the header of a file to determine if it is a VOTable file.
Configuration parameters for astropy.io.votable.
LINK elements: used to reference external documents and servers through a URI.
INFO elements: arbitrary key-value pairs for extensions to the standard.
FIELD element: describes the datatype of a particular column of data.
PARAM element: constant-valued columns in the data.
COOSYS element: defines a coordinate system.
TIMESYS element: defines a time system.
TABLE element: optionally contains data.
VOTABLE element: represents an entire file.
A base class for all classes that represent XML elements in the VOTABLE file.
Get an appropriate converter instance for a given field.
The base class for all converters.
This file contains routines to verify the correctness of UCD strings.
Parse the UCD into its component parts.
Returns False if ucd is not a valid unified content descriptor.
Various utilities and cookbook-like things.
Returns a writable file-like object suitable for streaming output.
Coerces and/or verifies the object p into a valid range-list-format parameter.
Validates a large collection of web-accessible VOTable files, and generates a report as a directory tree of HTML files.
Validates a large collection of web-accessible VOTable files.
Various XML-related utilities.
Given an arbitrary string, create one that can be used as an xml id.
Validates the given file against the appropriate VOTable schema.