dataselect

The tool dataselect will help with the bookkeeping and with creating lists of input files to feed to the Recipe System. The tool has a command line and an API. This tool finds files that match certain criteria defined through AstroData Tags and expressions involving AstroData Descriptors.

You can access the basic documentation from the command line by typing:

$ dataselect --help

usage: dataselect [-h] [--tags TAGS] [--xtags XTAGS] [--expr EXPRESSION]
                  [--strict] [--output OUTPUT] [--verbose] [--debug]
                  inputs [inputs ...]

Find files that matches certain criteria defined by tags and expression
involving descriptors.

positional arguments:
  inputs                Input FITS file

optional arguments:
  -h, --help            show this help message and exit
  --tags TAGS, -t TAGS  Comma-separated list of required tags.
  --xtags XTAGS         Comma-separated list of tags to exclude
  --expr EXPRESSION     Expression to apply to descriptors (and tags)
  --strict              Toggle on strict expression matching for exposure_time
                        (not just close) and for filter_name (match component
                        number).
  --output OUTPUT, -o OUTPUT
                        Name of the output file
  --verbose, -v         Toggle verbose mode when using -o
  --debug               Toggle debug mode

dataselect Command Line Tool

dataselect accepts list of input files separated by space, and wildcards. Below are some usage examples.

  1. This command selects all the FITS files inside the raw directory with a tag that matches DARK.

    $ dataselect raw/*.fits --tags DARK
    
  2. To select darks of a specific exposure time:

    $ dataselect raw/*.fits --tags DARK --expr='exposure_time==20'
    
  3. To send that list to a file that can be used later:

    $ dataselect raw/*.fits --tags DARK --expr='exposure_time==20' -o dark20s.lis
    
  4. This commands prints all the files in the current directory that do not have the CAL tag (calibration files).

    $ dataselect raw/*.fits --xtags CAL
    
  5. The xtags can be used with tags. To select images that are not flats:

    $ dataselect raw/*.fits --tags IMAGE --xtags FLAT
    
  6. This command selects all the files with a specific target name:

    $ dataselect --expr 'object=="FS 17"' raw/*.fits
    
  7. This command selects all the files with an “observation_class” descriptor that matches the “science” value and a specific exposure time:

    $ dataselect --expr '(observation_class=="science" and exposure_time==60.)' raw/*.fits
    

dataselect API

The same selections presented in the command line section above can be done from the dataselect API. Here is the API versions of the examples presented in the previous sections.

The list of files on disk must first be obtained with Python’s glob module.

>>> import glob
>>> all_files = glob.glob('raw/*.fits')

The dataselect module is located in gempy.adlibrary and must first be imported:

>>> from gempy.adlibrary import dataselect
  1. This command selects all the FITS files inside the raw directory with a tag that matches DARK.

    >>> all_darks = dataselect.select_data(all_files, ['DARK'])
    
  2. To select darks of a specific exposure time:

    >>> expression = 'exposure_time==20'
    >>> parsed_expr = dataselect.expr_parser(expression)
    >>> darks20 = dataselect.select_data(all_files, ['DARK'], [], parsed_expr)
    
  3. To send that list to a file that can be used later:

    >>> expression = 'exposure_time==20'
    >>> parsed_expr = dataselect.expr_parser(expression)
    >>> darks20 = dataselect.select_data(all_files, ['DARK'], [], parsed_expr)
    >>> with open('dark20s.lis', 'w') as f:
    ...     for filename in dark20:
    ...         f.write(filename + '\n')
    ...
    >>>
    

    Note that the need to send a list of a file on disk will probably not be very common when using the API as Reduce will take the Python list directly.

  4. This commands prints all the files in the current directory that do not have the CAL tag (calibration files).

    >>> non_cals = dataselect.select_data(all_files, [], ['CAL'])
    
  5. The xtags can be used with tags. To select images that are not flats:

    >>> has_tags = ['IMAGE']
    >>> has_not_tags = ['FLAT']
    >>> non_flat_images = dataselect.select_data(all_files, has_tags, has_not_tags)
    
  6. This command selects all the files with a specific target name:

    >>> expression = 'object="FS 17"'
    >>> parsed_expr = dataselect.expr_parser(expression)
    >>> stds = dataselect.select_data(all_files, expression=parsed_expr)
    
  7. This command selects all the files with an “observation_class” descriptor that matches the “science” value and a specific exposure time:

    >>> expression = '(observation_class=="science" and exposure_time==60.)'
    >>> parsed_expr = dataselect.expr_parser(expression)
    >>> sci60 = dataselect.select_data(all_files, expression=parsed_expr)
    

The strict Flag

The strict flag applies to the descriptors exposure_time() and filter_name(). To keep the user interface more friendly, in the expressions, the exposure time is matched on a “close enough” principle and the filter name is matched on a “general bandpass name” principle.

For example, if the exposure time in the header is 10.001 second, from a user’s perspective, asking to match “10” seconds is a lot nicer, exposure_time==10. Similarly, asking for the “H”-band filter is more natural than asking for the “H_G0203” filter.

However, there might be cases where the exposure time or the filter name must be matched exactly. In such case, the strict flag should be activated. For example:

$ dataselect raw/*.fits --strict --expr='exposure_time==0.95'

And:

>>> expression = 'exposure_time==0.95'
>>> parsed_expr = dataselect.expr_parser(expression, strict=True)
>>> filelist = dataselect.select_data(all_files, expression=parsed_expr)