3.2. General Design
As astronomical instruments have become more complex, there has been an increasing need for bespoke reduction packages and pipelines to deal with the specific needs of each instrument. Despite this complexity, many of the reduction steps can be very similar and the overall effort could be reduced significantly by sharing code. In practice, however, there are often issues regarding the manner in which the data are stored internally. The purpose of AstroData is to provide a uniform interface to the data and metadata, in a manner that is independent both of the specific instrument and the way the data are stored on disk, thereby facilitating this code-sharing. It is not a new astronomical data format.
One of the main features of AstroData is the use of descriptors, which
provide a level of abstraction between the metadata and the code accessing it.
Somebody using the AstroData interface who wishes to know the exposure time
of a particular astronomical observation represented by the AstroData
object
ad
can simply write ad.exposure_time()
without needing to concern
themselves about how that value is stored internally, for example, the name
of the FITS header keyword. These are discussed further in Descriptors.
AstroData also provides a clearer representation of the relationships
between different parts of the data produced from a single astronomical
observation. Modern astronomical instruments often contain multiple
detectors that are read out separately and the multi-extension FITS (MEF)
format used by many institutions, including Gemini Observatory, handles
the raw data well. In this format, each detector’s data and metadata is
assigned to its own extension,
while there is also a separate extension (the Primary Header Unit,
or PHU) containing additional metadata that applies to the entire
observation. However, as the data are processed, more data and/or
metadata may be added whose relationship is obscured by the limitations
of the MEF format. One example is the creation and propagation of information
describing the quality and uncertainty of the scientific data: while
this was a feature of
Gemini IRAF[1], the coding required to implement it was cumbersome
and AstroData uses the astropy.nddata.NDData
class,
as discussed in Data Containers. This makes the relationship between these
data much clearer, and AstroData creates a syntax that makes readily apparent the
roles of other data and metadata that may be created during the reduction
process.
An AstroData
object therefore consists of one or more self-contained
“extensions” (data and metadata) plus additional data and metadata that is
relevant to all the extensions. In many data reduction processes, the same
operation will be performed on each extension (e.g., subtracting an overscan
region from a CCD frame) and an axiom of AstroData is that iterating over
the extensions produces AstroData “slices” which retain knowledge of the
top-level data and metadata. Since a slice has one (or more) extensions
plus this top-level (meta)data, it too is an AstroData
object and,
specifically, an instance of the same subclass as its parent.
A final feature of AstroData is the implementation of very high-level metadata.
These data, called tags
, facilitate a key part of the Gemini data reduction
system, DRAGONS, by linking the astronomical data to the recipes
required to process them. They are explained in detail in Tags and the
Recipe System Programmers Manual[2].
Note
AstroData and DRAGONS have been developed for the reduction of data from Gemini Observatory, which produces data in the FITS format that is still the most widely-used format for astronomical data. In light of this, and the limited resources in the Science User Support Department, we have only developed support for FITS, even though the AstroData format is designed to be independent of the file format. In some cases, this has led to uncertainty and internal disagreement over where precisely to engage in abstraction and, should AstroData support a different file format, we may find alternative solutions that result in small, but possibly significant, changes to the API.