4. Data Providers¶
AstroData derivative classes act as a front end. Most of the heavy lifting
is actually performed by a
DataProvider class. There will typically be one
data provider per kind of data structure (so far, DRAGONS offers only
astrodata.fits.FitsProvider), and possibly one data provider proxy,
used to simplify the handling of data slicing (mapping most of it operations to
a regular data provider.)
The data provider acts as a hierarchical data storage. At the top level, it contains:
- A sequence of “extensions”, representing individual data planes and their
associated metadata, likely to
represent separate detectors or amplifiers. One can
access these extensions by index (e.g.,
ad). Indexing starts at 0, following Python’s convention.
- Global objects like masks or tables affecting all the extensions.
Each extension, in turn, is an instance of a Data Container, keeping important metadata (e.g., a FITS HDU’s header) and the main data for the extension (e.g., the data for a SCI extension, on Gemini data), along with any other associated data (masks, variance plane, tables, etc).
astrodata.core.DataProvider is, again, an abstract class, defining the
minimum interface expected from a data provider. This interface is described in
greater detail in the API Reference Guide, but among other things, one would
need to implement:
is_settable: AstroData exposes attributes from its data provider through its own
__setattr__. When trying to set a value for an attribute,
AstroDatawill use this method to discover whether the attribute can be modified.
append: a very important method, used to add new top-level components to the provider.
__getitem__: which returns a sliced view of the provider itself, meant to work with isolated extensions (examples of such a view are instances of
astrodata.fits.FitsProviderProxy). The view should behave in almost every way as a normal provider.
__len__: number of science extensions contained in this instance.
__itruediv__; used to perform in-place operations over the data.
variance: properties used to access certain common content. These methods generally return lists, with one element per extension.
There are also a number of properties that are not declared as abstract, but
still need to be reimplemented if one would want any kind of proper behavior
from the class:
exposed (used to determine if a certain attribute is to be
“exposed” to the user through the AstroData class),
is_single. Of particular interest is this later one:
is_single is a
predicate that should return
True only if a data provider has been sliced
using a single index, e.g.:
>>> d1 = provider[:4] >>> d1.is_sliced, d1.is_single (True, False) >>> d2 = provider >>> d2.is_sliced, d2.is_single (True, True)
This is important for the AstroData interface. When a data provider is being
considered a “single” slice, the behavior of many methods change. For example,
we mentioned that the
data property generally returns a list. If the
data provider in question is a single slice, then data would return a single
(i.e., scalar) element. This behavior is often seen also in Descriptors.
Refer always to the to documentation of a method to figure out how they behave. As
programmers, you should always include this explicitly in the documentation,
even if it’s implicit to AstroData.
4.1. Implementation Guidelines¶
AstroData does not impose any restriction on how to organize the data
internally, or how to deal with slicing. On slicing, we chose to use a “proxy”
FitsProvider. So, when sliced (through __getitem__), a
FitsProvider will return a
FitsProviderProxy, which is also a
DataProvider and reproduces the interface of its “proxied”
FitsProviderProxy keeps an internal mapping of the sliced
extensions. So, we may be referring to
sliced and this would be mapped
FitsProviderProxy can be studied as an example
implementation, but there is no need to follow them: please, evaluate carefully
the needs for your design, and feel free to depart from ours. As long as the
minimum interface is honored, AstroData will work as intended.
Note also that these classes were subject to heavy changes during development and a future release cycle should see them refactored for clarity and to drop any remnants of interfaces that were deprecated before the initial DRAGONS public release.
As a last comment: remember that
AstroData exposes its underlying
DataProvider interface up to a certain point. This can be used to
dynamically expose to the user additional attributes, dependent on the
underlying technology, or even to the instrument, if needed. This is all fine
and encouraged as long as everything is well documented, and the user
understands that certain parts of the interface may not be available when using
different observatory’s files, for example.
4.2. Registering a Data Provider to be Used with AstroData¶
Once we have a new data provider class, we need to let AstroData know how to
use it, and which class will make use of it. Normally a new data provider will
be associated to a new second level AstroData class (ie. a direct descendant to
AstroData, and a sibling of
AstroDataFits). This does not have to be
always the case, though: if an observatory organizes their FITS files in a way
that significantly departs from Gemini’s, then creating a separate data
provider may be justified, if it makes it easier to deal with the data.
There are no instances of this as of yet, but we’ve made a conscious effort during the design phase to make as easy as possible to plug in new providers. Future release of this document will address this topic.
|||For efficiency reasons, and to keep just one version of the data. The method may decide to return a sliced copy instead, but this is a design decision.|