6. Tags

We described above how to generate tags for an AstroData derivative. In this section we describe the algorithm that builds the complete tag set out of the individual TagSet instances. The algorithm collects all the TagSet instances in a list and then decides, following certain rules, whether or not to apply each of them. But first, let's talk about TagSet itself.

TagSet is actually a standard named tuple customized to generate default values (None) for its missing members. Its signature is:

TagSet(add=None, remove=None, blocked_by=None, blocks=None,
       if_present=None)
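
A named tuple whose missing members default to None can be sketched with the defaults argument of collections.namedtuple (Python 3.7+). This is an illustrative stand-in, not astrodata's own definition, which may differ in details such as how it stores the field values.

```python
from collections import namedtuple

# Illustrative stand-in for astrodata's TagSet: a plain namedtuple whose
# five fields all default to None when they are not supplied.
TagSet = namedtuple(
    'TagSet',
    ['add', 'remove', 'blocked_by', 'blocks', 'if_present'],
    defaults=(None, None, None, None, None),
)

ts = TagSet(['FOO', 'BAR'])   # only `add` is given, positionally
print(ts.add)                 # ['FOO', 'BAR']
print(ts.remove)              # None

# Keyword arguments work as for any namedtuple
bias = TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT'])
print(bias.blocks)            # ['IMAGE', 'SPECT']
```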

The most common TagSet is an additive one: TagSet(['FOO', 'BAR']). If all you need is to add tags, then you're done here. But the real power of our tag-generating system is that you can specify conditions under which a certain TagSet applies, or have it put restrictions on others. The different arguments to TagSet all expect a list of tag names, and work in the following way:

  • add: if this TagSet is selected, then add all these members to the tag set.
  • remove: if this TagSet is selected, then prevent all these members from joining the tag set.
  • blocked_by: if any of the tags listed here exist in the tag set, then discard this TagSet altogether.
  • blocks: discard, from the list of TagSets not yet processed, any TagSet that would add any of the tags listed here.
  • if_present: process this TagSet only if all the tags listed here already exist in the tag set at this point.

Note that blocked_by and blocks look like two sides of the same coin. This is intentional: which one to use is up to the programmer, depending on what reduces the amount of typing and/or makes the logic easier to follow (sometimes one wants a single TagSet to block a bunch of other tags; sometimes one wants a tag to be blocked by a bunch of others). Furthermore, while blocks and blocked_by discard the entire TagSet if any of its tags is affected, remove only affects the specific tags it lists.
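
The two distinctions above can be checked with a small sketch. It assumes the TagSets are already in processing order (the sorting step is left out), and the helper apply_in_order, like the TagSet stand-in and the tag names A, X and Y, is hypothetical and not part of astrodata.

```python
from collections import namedtuple

# Hypothetical stand-in for TagSet: missing fields default to None
TagSet = namedtuple('TagSet', 'add remove blocked_by blocks if_present',
                    defaults=(None,) * 5)


def apply_in_order(tagsets):
    """Apply already-ordered TagSets following the rules listed above."""
    tags, removals, blocked = set(), set(), set()
    for ts in tagsets:
        add = set(ts.add or ())
        # if_present: every required tag must already be in the tag set
        if not set(ts.if_present or ()) <= tags:
            continue
        # blocked_by: a listed tag is present; blocks: we would add a blocked tag
        if tags & set(ts.blocked_by or ()) or add & blocked:
            continue
        removals |= set(ts.remove or ())
        tags |= add - removals          # removed tags never join the set
        blocked |= set(ts.blocks or ())
    return tags


# blocks vs blocked_by: two ways of writing the same restriction
via_blocks = apply_in_order([TagSet(['BIAS'], blocks=['IMAGE']),
                             TagSet(['IMAGE'])])
via_blocked_by = apply_in_order([TagSet(['BIAS']),
                                 TagSet(['IMAGE'], blocked_by=['BIAS'])])
print(sorted(via_blocks), sorted(via_blocked_by))   # ['BIAS'] ['BIAS']

# remove drops only the named tag; blocks discards the whole TagSet
with_remove = apply_in_order([TagSet(['A'], remove=['Y']), TagSet(['X', 'Y'])])
with_blocks = apply_in_order([TagSet(['A'], blocks=['Y']), TagSet(['X', 'Y'])])
print(sorted(with_remove))   # ['A', 'X']
print(sorted(with_blocks))   # ['A']
```

In the last pair, X survives when Y is merely removed, but disappears together with Y when Y is blocked.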

Now, the algorithm works like this:

  1. Collect all the TagSet instances generated by the instance's methods that are decorated with astro_data_tag.
  2. Then we sort them out:
    1. Those that subtract tags from the tag set (the ones with non-empty remove or blocks) go first, allowing them to act early on.
    2. Those with non-empty blocked_by are moved to the end of the list, to ensure that other tags can be generated before them.
    3. Those with non-empty if_present are moved behind those with blocked_by.
  3. Now that we've sorted the TagSets, process them sequentially, and for each one:
    1. If it requires other tags to be present (if_present), make sure that this is the case. If the requirements are not met, drop the TagSet. Otherwise…
    2. Figure out whether anything is blocking the TagSet. This is the case if any of the tags it would add is in the "blocked" set, or if any of the tags added by previous TagSets is in the blocked_by list of the one being processed. If it is blocked, drop it. Otherwise…
    3. If all the previous hurdles have been passed, apply the changes declared by this TagSet (add, remove, and/or block others).
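
The three steps can be condensed into a short sketch. Everything here is an assumption for illustration: process_tags and the TagSet stand-in are hypothetical, "collecting" is reduced to filtering a list of values that tag methods might return, and the sort keys simply test for non-empty fields as described above (astrodata's own keys may differ). The FLAT and CAL tag names in the usage example are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for TagSet: missing fields default to None
TagSet = namedtuple('TagSet', 'add remove blocked_by blocks if_present',
                    defaults=(None,) * 5)


def process_tags(candidates):
    """Sketch of the tag-resolution algorithm (not astrodata's code).

    `candidates` stands for the values returned by the tag methods
    (step 1); methods that matched nothing return None and are ignored.
    """
    tagsets = [ts for ts in candidates if ts is not None]

    # Step 2: three successive stable sorts (False sorts before True)
    tagsets.sort(key=lambda ts: not (ts.remove or ts.blocks))  # 2.1 first
    tagsets.sort(key=lambda ts: bool(ts.blocked_by))           # 2.2 to the end
    tagsets.sort(key=lambda ts: bool(ts.if_present))           # 2.3 behind those

    # Step 3: process the TagSets sequentially
    tags, removals, blocked = set(), set(), set()
    for ts in tagsets:
        add = set(ts.add or ())
        # 3.1: all required tags must already be present
        if not set(ts.if_present or ()) <= tags:
            continue
        # 3.2: blocked by an earlier `blocks`, or by its own `blocked_by`
        if add & blocked or tags & set(ts.blocked_by or ()):
            continue
        # 3.3: apply the changes declared by this TagSet
        removals |= set(ts.remove or ())
        tags |= add - removals
        blocked |= set(ts.blocks or ())
    return tags


result = process_tags([
    TagSet(['FLAT'], if_present=['CAL']),   # needs CAL to be there first
    None,                                   # a tag method that matched nothing
    TagSet(['CAL']),
])
print(sorted(result))   # ['CAL', 'FLAT']
```

Because the if_present TagSet is sorted behind the one adding CAL, its requirement is already satisfied when it is processed, even though it was collected first.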

Note that Python's sort algorithm is stable. This means that if two elements are indistinguishable from the point of view of the sorting key, they are guaranteed to stay in the same relative position. To better understand how this affects our tags, and the algorithm itself, let's follow up with an example taken from real code (the Gemini-generic and GMOS modules):

from astrodata import astro_data_tag, TagSet

# Simple tagset, with only constant, additive content
@astro_data_tag
def _tag_instrument(self):
    return TagSet(['GMOS'])

# Simple tagset, also with additive content. This one checks
# whether the frame fits the requirements to be classified
# as "GMOS imaging". It returns a value conditionally: if this
# is not imaging, the method falls through and implicitly
# returns None, which the algorithm ignores
@astro_data_tag
def _tag_image(self):
    if self.phu.get('GRATING') == 'MIRROR':
        return TagSet(['IMAGE'])

# This is a slightly more complex TagSet (but fairly simple, anyway),
# inherited by all Gemini instruments.
@astro_data_tag
def _type_gcal_lamp(self):
    if self.phu.get('GCALLAMP') == 'IRhigh':
        shut = self.phu.get('GCALSHUT')
        if shut == 'OPEN':
            return TagSet(['GCAL_IR_ON', 'LAMPON'],
                          blocked_by=['PROCESSED'])
        elif shut == 'CLOSED':
            return TagSet(['GCAL_IR_OFF', 'LAMPOFF'],
                          blocked_by=['PROCESSED'])

# This tagset is only active when we detect that the frame is
# a bias. In that case we want to prevent the frame from being
# classified as "imaging" or "spectroscopy", which depend on the
# configuration of the instrument
@astro_data_tag
def _tag_bias(self):
    if self.phu.get('OBSTYPE') == 'BIAS':
        return TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT'])
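
Step 1 of the algorithm (calling every decorated method and ignoring the ones that return None) can be mimicked with toy code. In this sketch astro_data_tag is a hypothetical stand-in that merely marks a method; the class FakeGMOS, its dict-based phu, and collect_tagsets are likewise invented for illustration and are not astrodata's implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for TagSet: missing fields default to None
TagSet = namedtuple('TagSet', 'add remove blocked_by blocks if_present',
                    defaults=(None,) * 5)


def astro_data_tag(method):
    """Hypothetical stand-in decorator: just marks a tag method."""
    method.tag_method = True
    return method


class FakeGMOS:
    """Toy class reusing two of the tag methods above; `phu` is a dict."""

    def __init__(self, phu):
        self.phu = phu

    @astro_data_tag
    def _tag_instrument(self):
        return TagSet(['GMOS'])

    @astro_data_tag
    def _tag_image(self):
        if self.phu.get('GRATING') == 'MIRROR':
            return TagSet(['IMAGE'])

    def collect_tagsets(self):
        """Call every marked method (inherited ones too), dropping None."""
        found = []
        for name in dir(type(self)):        # dir() lists names alphabetically
            if getattr(getattr(type(self), name), 'tag_method', False):
                result = getattr(self, name)()
                if result is not None:
                    found.append(result)
        return found


print([ts.add for ts in FakeGMOS({'GRATING': 'MIRROR'}).collect_tagsets()])
# [['IMAGE'], ['GMOS']]
print([ts.add for ts in FakeGMOS({'GRATING': 'B600'}).collect_tagsets()])
# [['GMOS']]
```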

These four simple tag methods will serve to illustrate the algorithm. Let's pretend that the requirements for all four of them are met (for the GCAL lamp method, with the shutter closed), so that we get four TagSet instances in our list, in arbitrary order. After step 1 of the algorithm, then, we may have collected the following list:

[ TagSet(['GMOS']),
  TagSet(['GCAL_IR_OFF', 'LAMPOFF'], blocked_by=['PROCESSED']),
  TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT']),
  TagSet(['IMAGE']) ]

The algorithm then proceeds to sort them. First, it will promote the TagSet with non-empty blocks or remove:

[ TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT']),
  TagSet(['GMOS']),
  TagSet(['GCAL_IR_OFF', 'LAMPOFF'], blocked_by=['PROCESSED']),
  TagSet(['IMAGE']) ]

Note that the other three TagSet instances keep exactly the same relative order. Now the algorithm sorts the list again, moving the ones with non-empty blocked_by to the end:

[ TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT']),
  TagSet(['GMOS']),
  TagSet(['IMAGE']),
  TagSet(['GCAL_IR_OFF', 'LAMPOFF'], blocked_by=['PROCESSED']) ]

Note that at each step, all the instances (except the ones "being moved") have kept the same position relative to each other. This is where the stability of the sort comes into play: it ensures that each step does not undo the previous one. Finally, none of the TagSets in our example has a non-empty if_present, so no more instances are moved around.
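
The two reorderings can be reproduced with three successive calls to Python's sorted(), which is documented to be stable. The sort keys below (a boolean test for non-empty fields) and the TagSet stand-in are assumptions made for this sketch, not necessarily what astrodata uses internally.

```python
from collections import namedtuple

# Hypothetical stand-in for TagSet: missing fields default to None
TagSet = namedtuple('TagSet', 'add remove blocked_by blocks if_present',
                    defaults=(None,) * 5)

collected = [
    TagSet(['GMOS']),
    TagSet(['GCAL_IR_OFF', 'LAMPOFF'], blocked_by=['PROCESSED']),
    TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT']),
    TagSet(['IMAGE']),
]

# Items with equal keys keep their relative order (False sorts before True)
step1 = sorted(collected, key=lambda ts: not (ts.remove or ts.blocks))
step2 = sorted(step1, key=lambda ts: bool(ts.blocked_by))
step3 = sorted(step2, key=lambda ts: bool(ts.if_present))

for ts in step3:
    print(ts.add)
# ['BIAS', 'CAL']
# ['GMOS']
# ['IMAGE']
# ['GCAL_IR_OFF', 'LAMPOFF']
```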

Now the algorithm prepares three empty sets (tags, removals, and blocked), and starts iterating over the TagSet list.

  1. For the first TagSet there are no blocks or removals in place yet, so we just add its contents to the current sets: tags = {'BIAS', 'CAL'}, blocked = {'IMAGE', 'SPECT'}.
  2. Then comes TagSet(['GMOS']). Again, there are no removals in place, and GMOS is not in the set of blocked tags. Thus, we just add it to the current tag set: tags = {'BIAS', 'CAL', 'GMOS'}.
  3. When processing TagSet(['IMAGE']), the algorithm observes that IMAGE is in the blocked set, and discards this TagSet.
  4. Finally, neither GCAL_IR_OFF nor LAMPOFF is in blocked, and PROCESSED is not in tags, so this TagSet is applied as well.

Our result will be the set {'BIAS', 'CAL', 'GMOS', 'GCAL_IR_OFF', 'LAMPOFF'} (being a set, it may be displayed in a different order).
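
The walk-through above can be traced step by step with a short loop over the sorted list. As before, the TagSet stand-in is hypothetical and the loop is a sketch of the rules described in this section, not astrodata's own code; the tag sets are passed through sorted() only to make the printout deterministic.

```python
from collections import namedtuple

# Hypothetical stand-in for TagSet: missing fields default to None
TagSet = namedtuple('TagSet', 'add remove blocked_by blocks if_present',
                    defaults=(None,) * 5)

ordered = [
    TagSet(['BIAS', 'CAL'], blocks=['IMAGE', 'SPECT']),
    TagSet(['GMOS']),
    TagSet(['IMAGE']),
    TagSet(['GCAL_IR_OFF', 'LAMPOFF'], blocked_by=['PROCESSED']),
]

tags, removals, blocked = set(), set(), set()
for ts in ordered:
    add = set(ts.add or ())
    if not set(ts.if_present or ()) <= tags:
        verdict = 'skipped (if_present not met)'
    elif add & blocked or tags & set(ts.blocked_by or ()):
        verdict = 'skipped (blocked)'
    else:
        removals |= set(ts.remove or ())
        tags |= add - removals
        blocked |= set(ts.blocks or ())
        verdict = 'applied'
    print(sorted(add), verdict, sorted(tags))

# ['BIAS', 'CAL'] applied ['BIAS', 'CAL']
# ['GMOS'] applied ['BIAS', 'CAL', 'GMOS']
# ['IMAGE'] skipped (blocked) ['BIAS', 'CAL', 'GMOS']
# ['GCAL_IR_OFF', 'LAMPOFF'] applied ['BIAS', 'CAL', 'GCAL_IR_OFF', 'GMOS', 'LAMPOFF']
```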