cngi.vis.join_vis
join_vis(mxds, vis1, vis2)
Concatenate together two Visibility xds’s of compatible shape from the same mxds.
The data variables of the two datasets are merged together, with some limitations (see “Current Limitations” in the Notes section).
Coordinate values that are not also being used as dimensions are compared for equality.
Certain known attributes, namely “ddi”, are updated. The remaining attributes are merged: keys present in only one dataset are kept, and where both datasets have the same key, the value from the first dataset overrides the value from the second.
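The merge rule for attributes other than “ddi” can be pictured with plain dictionaries. This is a minimal sketch, not the cngi implementation: merge_attrs is a hypothetical helper, the attribute names and values are invented for illustration, and the special handling of “ddi” is left out because its details are not described here.

```python
def merge_attrs(attrs1, attrs2):
    """Merge two attribute dicts; on shared keys the first dataset wins."""
    merged = dict(attrs2)   # start from the second dataset's attributes
    merged.update(attrs1)   # the first dataset overrides any shared keys
    return merged


# Hypothetical attributes, for illustration only
attrs1 = {"telescope": "A", "observer": "first"}
attrs2 = {"observer": "second", "project": "P"}

merged = merge_attrs(attrs1, attrs2)
```

Keys found in only one input ("telescope", "project") survive unchanged, while the shared key "observer" takes its value from the first input.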
Parameters
mxds (xarray.core.dataset.Dataset) – input multi-xarray Dataset with global data
vis1 (str) – first visibility partition in the mxds to join
vis2 (str) – second visibility partition in the mxds to join
Returns
New output multi-xarray Dataset with global data
Return type
xarray.core.dataset.Dataset
Warning
Joins are strongly discouraged for datasets that don’t share a common ‘global’ DDI (i.e., are sourced from different .zarr archives). Consider carefully whether a join would be meaningful before performing one.
Warning
DDIs are separated by spectral window and correlation (polarization) because this separation is a good indicator of how the data was gathered in hardware. If source data comes from different DDIs, the data followed different paths through the hardware during the measurement, so discontinuities are more likely across DDIs than within them. Don’t join DDIs haphazardly, because doing so could break the inherent link between data and hardware.
Notes
Conflicts in data variable values between datasets:
There are many ways that data values could end up differing between datasets for the same coordinate values. For example, the error bars represented in SIGMA or WEIGHT could differ between spws.
There are several possible ways to deal with conflicting data values:
only allow joins that don’t have conflicts (current solution)
add extra indexes CHAN, POL, and/or SPW to the data variables that conflict
add extra indexes CHAN, POL, and/or SPW to all data variables
numerically merge the values (average, max, min, etc)
override the values in xds1 with the values in xds2
Joins are allowed for:
Datasets that have all different dimension values.
Example: xds1 covers time range 22:00-22:59, and xds2 covers time range 23:00-24:00
Datasets that have overlapping dimension values with matching data values at all of those coordinates.
Example: xds1.PROCESSOR_ID[0][0] == xds2.PROCESSOR_ID[0][0]
Current Limitations:
Joins are not allowed for datasets that have overlapping dimension values with mismatched data values at any of those coordinates.
Example: xds1.PROCESSOR_ID[0][0] != xds2.PROCESSOR_ID[0][0]
See “Conflicts in data variable values”, above
Joins between ‘global’ datasets, such as those returned by cngi.dio.read_vis(ddi='global'), are probably meaningless and should be avoided.
Datasets do not need to have the same shape.
Example: xds1.DATA.shape != xds2.DATA.shape
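The cases listed under “Joins are allowed for” and “Current Limitations” can be sketched by reducing each dataset to a mapping from a dimension value to a data value. This is an illustration under stated assumptions rather than the way join_vis is implemented: conflicts and join are hypothetical helpers, and the sample times and values are invented.

```python
def conflicts(d1, d2):
    """Return the dimension values present in both mappings whose data differ."""
    return sorted(k for k in d1.keys() & d2.keys() if d1[k] != d2[k])


def join(d1, d2):
    """Join two mappings, refusing if any overlapping value mismatches."""
    bad = conflicts(d1, d2)
    if bad:
        raise ValueError(f"conflicting data values at: {bad}")
    return {**d1, **d2}


# All dimension values differ: the join is allowed
early = {"22:00": 1.0, "22:30": 2.0}
late = {"23:00": 3.0}
joined = join(early, late)

# Overlapping dimension values with matching data: the join is allowed
overlap_ok = join({"22:00": 1.0}, {"22:00": 1.0, "23:00": 3.0})

# Overlapping dimension values with mismatched data: the join is rejected
try:
    join({"22:00": 1.0}, {"22:00": 9.0})
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the join outright corresponds to the “only allow joins that don’t have conflicts” option; the other options in the Notes would replace the raise with an averaging or overriding step.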
Examples
Use cases (a partial list) to be turned into examples. Note: these use cases come from CASA’s mstransform(combinespws=True) and may not apply to ddijoin.
universal calibration across spws
autoflagging with broadband rfi
uvcontfit and uvcontsub
joining datasets that had previously been split, operated on, and are now being re-joined