Visibilities

An overview of the visibility data structure and manipulation.

This walkthrough is designed to be run in a Jupyter notebook on Google Colaboratory. To open the notebook in colab, go here

[1]:
# Installation
# for this demonstration we will use the data from the ALMA First Look at Imaging CASAguide
import os
print("installing cngi (takes a few minutes)...")
os.system("apt-get install libgfortran3")
os.system("pip install --index-url https://casa-pip.nrao.edu/repository/pypi-casa-release/simple casatools==6.0.0.27")
os.system("pip install cngi-prototype==0.0.48")

print("downloading MeasurementSet from CASAguide First Look at Imaging...")
os.system("wget https://bulk.cv.nrao.edu/almadata/public/working/sis14_twhya_calibrated_flagged.ms.tar")
os.system("tar -xvf sis14_twhya_calibrated_flagged.ms.tar")

print('complete')
installing cngi (takes a few minutes)...
downloading MeasurementSet from CASAguide First Look at Imaging...
complete

Initialize the Environment

InitializeFramework instantiates a client object (does not need to be returned and saved by caller). Once this object exists, all Dask objects automatically know to use it for parallel execution.

>>> from cngi.direct import InitializeFramework
>>> client = InitializeFramework(workers=4, memory='2GB')
>>> print(client)
<Client: 'tcp://127.0.0.1:33227' processes=4 threads=4, memory=8.00 GB>

Omitting this step will cause the subsequent Dask dataframe operations to use the default built-in scheduler for parallel execution (which can actually be faster on local machines anyway)

Google Colab doesn’t really support the dask.distributed environment particularly well, so we will let Dask use its default scheduler.

Convert an MS to Xarray/Zarr

CNGI uses an xarray data model and the zarr storage format to hold the contents of the MS.

The zarr format is a directory structure with partitions and chunks held under the top level folder. The MS data will be partitioned by the Data Description ID (DDI) column. A DDI is a unique SPW/Polarization pair so the data has a fixed shape within each DDI (but potentially different shapes across DDIs).

Various index columns are converted to xarray coordinates rather than data variables. Polarization and Spectral Window table data pertaining to each DDI is held in the coordinates and attributes section.

Remaining tables (other than the main visibilities, spw and polarization) are held in a global partition of all other meta data.

[2]:
from cngi.conversion import convert_ms

convert_ms('sis14_twhya_calibrated_flagged.ms');
processing sis14_twhya_calibrated_flagged.ms
writing global partition
meta data processing time  0.886523962020874
**********************************
Processing ddi 0
Selecting and sorting time  0.026074647903442383
n_time: 410   n_baseline: 210   n_chan: 384   n_pol: 2  chunking:  [100, 400, 20, 1]  batchsize:  100
Completed ddi 0  process time  19.260385274887085
**********************************

In our example, there is one DDI (0), so the resulting visibilities zarr directory has one partition plus the global data

[3]:
!ls sis14_twhya_calibrated_flagged.vis.zarr/
0  global

Open and Inspect Visibility Data

Each SPW in the visiblity data must go in to its own xarray dataset.

Before trying to open something, we can use the describe_vis function to see what’s what in the zarr directory.

[4]:
from cngi.dio import describe_vis

describe_vis('sis14_twhya_calibrated_flagged.vis.zarr')
[4]:
spw_id size_GB channels times baselines fields
ddi
0 0 1.053953 384 410 210 410

If multiple DDI partitions exist, then each one would need to be read individually. In our example, there is only DDI 0, so we will read that one.

[ ]:
from cngi.dio import read_vis

# returns an xarray dataset representation of the specified zarr directory partition
vis_xds = read_vis('sis14_twhya_calibrated_flagged.vis.zarr', ddi=0)

Within the xarray dataset visibility structure, we find the old column names from the MS main table spread between several locations

Most of the main table columns are stored in the Data variables section, while the Coordinates section holds what used to be id and index columns. The Dimensions section holds the size of the visibility data axes as determined by the coordinates. Lastly, the Attributes section holds the remaining metadata from the polarization and spectral window tables associated with this DDI.

[6]:
print(vis_xds)
<xarray.Dataset>
Dimensions:        (baseline: 210, chan: 384, pair: 2, pol: 2, receptor: 2, spw: 1, time: 410, uvw_index: 3)
Coordinates:
    antennas       (baseline, pair) int32 dask.array<chunksize=(210, 2), meta=np.ndarray>
  * baseline       (baseline) int64 0 1 2 3 4 5 6 ... 204 205 206 207 208 209
  * chan           (chan) float64 3.725e+11 3.725e+11 ... 3.728e+11 3.728e+11
    chan_width     (chan) float64 dask.array<chunksize=(384,), meta=np.ndarray>
    corr_product   (receptor, pol) int32 dask.array<chunksize=(2, 2), meta=np.ndarray>
    effective_bw   (chan) float64 dask.array<chunksize=(384,), meta=np.ndarray>
    field          (time) <U10 dask.array<chunksize=(100,), meta=np.ndarray>
    interval       (time) float64 dask.array<chunksize=(100,), meta=np.ndarray>
    observation    (time) <U22 dask.array<chunksize=(100,), meta=np.ndarray>
  * pol            (pol) int32 9 12
    processor      (time) <U14 dask.array<chunksize=(100,), meta=np.ndarray>
    resolution     (chan) float64 dask.array<chunksize=(384,), meta=np.ndarray>
    scan           (time) int32 dask.array<chunksize=(100,), meta=np.ndarray>
  * spw            (spw) int32 0
    state          (time) <U82 dask.array<chunksize=(100,), meta=np.ndarray>
  * time           (time) datetime64[ns] 2012-11-19T07:37:00 ... 2012-11-19T09:11:01.631999969
  * uvw_index      (uvw_index) <U2 'uu' 'vv' 'ww'
Dimensions without coordinates: pair, receptor
Data variables:
    ARRAY_ID       (time, baseline) int32 dask.array<chunksize=(100, 210), meta=np.ndarray>
    DATA           (time, baseline, chan, pol) complex128 dask.array<chunksize=(100, 210, 20, 1), meta=np.ndarray>
    EXPOSURE       (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    FEED1          (time, baseline) int32 dask.array<chunksize=(100, 210), meta=np.ndarray>
    FEED2          (time, baseline) int32 dask.array<chunksize=(100, 210), meta=np.ndarray>
    FLAG           (time, baseline, chan, pol) bool dask.array<chunksize=(100, 210, 20, 1), meta=np.ndarray>
    FLAG_ROW       (time, baseline) bool dask.array<chunksize=(100, 210), meta=np.ndarray>
    SIGMA          (time, baseline, pol) float64 dask.array<chunksize=(100, 210, 1), meta=np.ndarray>
    TIME_CENTROID  (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    UVW            (time, baseline, uvw_index) float64 dask.array<chunksize=(100, 210, 3), meta=np.ndarray>
    WEIGHT         (time, baseline, pol) float64 dask.array<chunksize=(100, 210, 1), meta=np.ndarray>
Attributes:
    assoc_nature:       ['', '', '', '', '', '', '', '', '', '', '', '', '', ...
    assoc_spw_id:       [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, ...
    auto_correlations:  0
    bbc_no:             2
    ddi:                0
    freq_group:         0
    freq_group_name:
    if_conv_chain:      0
    meas_freq_ref:      5
    name:               ALMA_RB_07#BB_2#SW-01#FULL_RES
    net_sideband:       2
    num_chan:           384
    ref_frequency:      372533086425.9812
    total_bandwidth:    234375000.0

We can access the global meta data by reading the global partition. This is not associated with any single DDI and refers to the entire collection. Tables with contents that can be indexed cleanly are stored as data variables and coordinates, while remaining tables with more variable structure are stored as attributes.

Note that there are no visiblities or other main table columns in this partition.

[7]:
global_xds = read_vis('sis14_twhya_calibrated_flagged.vis.zarr', ddi='global')

print(global_xds)
<xarray.Dataset>
Dimensions:                 (antenna: 26, d1: 1, d2: 2, d3: 3, feed: 1, field: 7, observation: 1, processor: 3, receptors: 3, source: 5, spw: 1, state: 20, time_fcmd: 5067, time_hist: 1132)
Coordinates:
  * antenna                 (antenna) int64 0 1 2 3 4 5 6 ... 20 21 22 23 24 25
  * feed                    (feed) int64 0
  * field                   (field) <U10 'J0522-364' 'J0539+145' ... '3c279'
  * observation             (observation) <U22 'uid://A002/X327408/X6f'
  * processor               (processor) <U14 'CORRELATOR (0)' ... 'CORRELATOR (2)'
  * receptors               (receptors) int64 0 1 2
  * source                  (source) int32 0 1 2 3 4
  * spw                     (spw) int64 0
  * state                   (state) <U82 'CALIBRATE_BANDPASS#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (0)' ... 'OBSERVE_TARGET#ON_SOURCE (19)'
  * time_fcmd               (time_fcmd) datetime64[ns] 2012-11-19T07:28:06.096000671 ... 2012-11-19T09:11:31.978034973
  * time_hist               (time_hist) datetime64[ns] 2012-11-28T06:33:06.728000641 ... 2014-05-01T14:00:24.296000481
Dimensions without coordinates: d1, d2, d3
Data variables:
    ANT_DISH_DIAMETER       (antenna) float64 dask.array<chunksize=(26,), meta=np.ndarray>
    ANT_FLAG_ROW            (antenna) bool dask.array<chunksize=(26,), meta=np.ndarray>
    ANT_MOUNT               (antenna) <U16 dask.array<chunksize=(26,), meta=np.ndarray>
    ANT_NAME                (antenna) <U16 dask.array<chunksize=(26,), meta=np.ndarray>
    ANT_OFFSET              (antenna, d3) float64 dask.array<chunksize=(26, 3), meta=np.ndarray>
    ANT_POSITION            (antenna, d3) float64 dask.array<chunksize=(26, 3), meta=np.ndarray>
    ANT_STATION             (antenna) <U16 dask.array<chunksize=(26,), meta=np.ndarray>
    ANT_TYPE                (antenna) <U16 dask.array<chunksize=(26,), meta=np.ndarray>
    FCMD_APPLICATION        (time_hist) <U16 dask.array<chunksize=(1132,), meta=np.ndarray>
    FCMD_APPLIED            (time_fcmd) bool dask.array<chunksize=(5067,), meta=np.ndarray>
    FCMD_COMMAND            (time_fcmd) <U75 dask.array<chunksize=(1267,), meta=np.ndarray>
    FCMD_INTERVAL           (time_fcmd) float64 dask.array<chunksize=(5067,), meta=np.ndarray>
    FCMD_LEVEL              (time_fcmd) int32 dask.array<chunksize=(5067,), meta=np.ndarray>
    FCMD_MESSAGE            (time_hist) <U188 dask.array<chunksize=(283,), meta=np.ndarray>
    FCMD_OBJECT_ID          (time_hist) int32 dask.array<chunksize=(1132,), meta=np.ndarray>
    FCMD_OBSERVATION_ID     (time_hist) int32 dask.array<chunksize=(1132,), meta=np.ndarray>
    FCMD_ORIGIN             (time_hist) <U24 dask.array<chunksize=(1132,), meta=np.ndarray>
    FCMD_PRIORITY           (time_hist) <U16 dask.array<chunksize=(1132,), meta=np.ndarray>
    FCMD_REASON             (time_fcmd) <U47 dask.array<chunksize=(1267,), meta=np.ndarray>
    FCMD_SEVERITY           (time_fcmd) int32 dask.array<chunksize=(5067,), meta=np.ndarray>
    FCMD_TYPE               (time_fcmd) <U16 dask.array<chunksize=(2534,), meta=np.ndarray>
    FEED_BEAM_ID            (spw, antenna, feed) int32 dask.array<chunksize=(1, 26, 1), meta=np.ndarray>
    FEED_BEAM_OFFSET        (spw, antenna, feed, d2, receptors) float64 dask.array<chunksize=(1, 26, 1, 2, 3), meta=np.ndarray>
    FEED_INTERVAL           (spw, antenna, feed) float64 dask.array<chunksize=(1, 26, 1), meta=np.ndarray>
    FEED_NUM_RECEPTORS      (spw, antenna, feed) int32 dask.array<chunksize=(1, 26, 1), meta=np.ndarray>
    FEED_POLARIZATION_TYPE  (spw, antenna, feed, receptors) <U16 dask.array<chunksize=(1, 26, 1, 3), meta=np.ndarray>
    FEED_POL_RESPONSE       (spw, antenna, feed, receptors, receptors) complex128 dask.array<chunksize=(1, 26, 1, 3, 3), meta=np.ndarray>
    FEED_POSITION           (spw, antenna, feed, d3) float64 dask.array<chunksize=(1, 26, 1, 3), meta=np.ndarray>
    FEED_RECEPTOR_ANGLE     (spw, antenna, feed, receptors) float64 dask.array<chunksize=(1, 26, 1, 3), meta=np.ndarray>
    FEED_TIME               (spw, antenna, feed) datetime64[ns] dask.array<chunksize=(1, 26, 1), meta=np.ndarray>
    FIELD_CODE              (field) <U16 dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_DELAY_DIR         (field, d2, d1) float64 dask.array<chunksize=(7, 2, 1), meta=np.ndarray>
    FIELD_DelayDir_Ref      (field) int32 dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_FLAG_ROW          (field) bool dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_NUM_POLY          (field) int32 dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_PHASE_DIR         (field, d2, d1) float64 dask.array<chunksize=(7, 2, 1), meta=np.ndarray>
    FIELD_PhaseDir_Ref      (field) int32 dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_REFERENCE_DIR     (field, d2, d1) float64 dask.array<chunksize=(7, 2, 1), meta=np.ndarray>
    FIELD_RefDir_Ref        (field) int32 dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_SOURCE_ID         (field) int32 dask.array<chunksize=(7,), meta=np.ndarray>
    FIELD_TIME              (field) datetime64[ns] dask.array<chunksize=(7,), meta=np.ndarray>
    OBS_FLAG_ROW            (observation) bool dask.array<chunksize=(1,), meta=np.ndarray>
    OBS_OBSERVER            (observation) <U16 dask.array<chunksize=(1,), meta=np.ndarray>
    OBS_RELEASE_DATE        (observation) float64 dask.array<chunksize=(1,), meta=np.ndarray>
    OBS_SCHEDULE_TYPE       (observation) <U16 dask.array<chunksize=(1,), meta=np.ndarray>
    OBS_TELESCOPE_NAME      (observation) <U16 dask.array<chunksize=(1,), meta=np.ndarray>
    OBS_TIME_RANGE          (observation, d2) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray>
    PROC_FLAG_ROW           (processor) bool dask.array<chunksize=(3,), meta=np.ndarray>
    PROC_MODE_ID            (processor) int32 dask.array<chunksize=(3,), meta=np.ndarray>
    PROC_SUB_TYPE           (processor) <U21 dask.array<chunksize=(3,), meta=np.ndarray>
    PROC_TYPE_ID            (processor) int32 dask.array<chunksize=(3,), meta=np.ndarray>
    SRC_CALIBRATION_GROUP   (spw, source) int32 dask.array<chunksize=(1, 5), meta=np.ndarray>
    SRC_CODE                (spw, source) <U16 dask.array<chunksize=(1, 5), meta=np.ndarray>
    SRC_DIRECTION           (spw, source, d2) float64 dask.array<chunksize=(1, 5, 2), meta=np.ndarray>
    SRC_INTERVAL            (spw, source) float64 dask.array<chunksize=(1, 5), meta=np.ndarray>
    SRC_NAME                (spw, source) <U16 dask.array<chunksize=(1, 5), meta=np.ndarray>
    SRC_NUM_LINES           (spw, source) int32 dask.array<chunksize=(1, 5), meta=np.ndarray>
    SRC_PROPER_MOTION       (spw, source, d2) float64 dask.array<chunksize=(1, 5, 2), meta=np.ndarray>
    SRC_TIME                (spw, source) datetime64[ns] dask.array<chunksize=(1, 5), meta=np.ndarray>
    STATE_CAL               (state) float64 dask.array<chunksize=(20,), meta=np.ndarray>
    STATE_FLAG_ROW          (state) bool dask.array<chunksize=(20,), meta=np.ndarray>
    STATE_LOAD              (state) float64 dask.array<chunksize=(20,), meta=np.ndarray>
    STATE_REF               (state) bool dask.array<chunksize=(20,), meta=np.ndarray>
    STATE_SIG               (state) bool dask.array<chunksize=(20,), meta=np.ndarray>
    STATE_SUB_SCAN          (state) int32 dask.array<chunksize=(20,), meta=np.ndarray>
Attributes:
    asdm_antennaid:               ['Antenna_0', 'Antenna_1', 'Antenna_2', 'An...
    asdm_antennamake:             ['AEM_12', 'AEM_12', 'AEM_12', 'AEM_12', 'A...
    asdm_antennaname:             ['DA41', 'DA42', 'DA44', 'DA45', 'DA46', 'D...
    asdm_antennatype:             ['GROUND_BASED', 'GROUND_BASED', 'GROUND_BA...
    asdm_assocantennaid:          ['', '', '', '', '', '', '', '', '', '', ''...
    asdm_caldataid:               ['CalData_0', 'CalData_0', 'CalData_0', 'Ca...
    asdm_calreductionid:          ['CalReduction_0', 'CalReduction_0', 'CalRe...
    asdm_chanfreq:                [[[184550000000.0], [186550000000.0], [1888...
    asdm_chanwidth:               [[[1500000000.0], [2500000000.0], [20000000...
    asdm_dishdiameter:            [12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 12.0, ...
    asdm_drypath:                 [[[1.5075339078903198]], [[1.50631928443908...
    asdm_endvalidtime:            [4860030637.2595, 4860030637.2595, 48600306...
    asdm_freqlo:                  [[[97963275000.0]], [[97963275000.0], [9968...
    asdm_frequencyband:           ['UNSPECIFIED', 'ALMA_RB_03', 'ALMA_RB_03',...
    asdm_inputantennanames:       [[['DA41']], [['DA42']], [['DA44']], [['DA4...
    asdm_name:                    ['WVR', 'UNDEFINED', 'UNDEFINED', 'UNDEFINE...
    asdm_numchan:                 [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, ...
    asdm_numinputantennas:        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
    asdm_numlo:                   [1, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, ...
    asdm_numpoly:                 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
    asdm_offset:                  [[[0.0], [0.0], [0.0]], [[0.0], [0.0], [0.0...
    asdm_pathcoeff:               [[[[[5.696597872884013e-05]], [[2.828374090...
    asdm_polyfreqlimits:          [[[67000000000.0], [90000000000.0]], [[6700...
    asdm_position:                [[[2225081.0304043], [-5440064.01101605], [...
    asdm_receiverid:              [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, ...
    asdm_receiversideband:        ['NOSB', 'TSB', 'TSB', 'TSB', 'TSB', 'TSB',...
    asdm_reftemp:                 [[[[119.45744705200195], [48.73897772568922...
    asdm_sidebandlo:              [[['DSB']], [['LSB'], ['LSB'], ['LSB']], [[...
    asdm_spectralwindowid:        ['SpectralWindow_0', 'SpectralWindow_27', '...
    asdm_startvalidtime:          [4860026976.564501, 4860026976.564501, 4860...
    asdm_stationid:               ['Station_0', 'Station_1', 'Station_2', 'St...
    asdm_time:                    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
    asdm_timeinterval:            [[[4860026863.941999], [4363345172.912776]]...
    asdm_type:                    ['ANTENNA_PAD', 'ANTENNA_PAD', 'ANTENNA_PAD...
    asdm_water:                   [0.00043721461364834303, 0.0004537176442467...
    asdm_wetpath:                 [[[0.003352516097947955]], [[0.003476256737...
    asdm_wvrmethod:               ['ATM_MODEL', 'ATM_MODEL', 'ATM_MODEL', 'AT...
    cald_antenna_id:              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, ...
    cald_cal_load_names:          [[['AMBIENT_LOAD'], ['HOT_LOAD']], [['AMBIE...
    cald_feed_id:                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    cald_interval:                [4363345170.433152, 4363345169.85944, 43633...
    cald_num_cal_load:            [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
    cald_num_receptor:            [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ...
    cald_spectral_window_id:      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
    cald_temperature_load:        [[[16.850000381469727], [83.25000762939453]...
    cald_time:                    [7041699451.638201, 7041699451.925056, 7041...
    weat_antenna_id:              [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1...
    weat_dew_point:               [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0....
    weat_dew_point_flag:          [False, False, False, False, False, False, ...
    weat_interval:                [0.048, 0.048, 0.048, 0.048, 0.048, 0.048, ...
    weat_ns_wx_station_id:        [26, 27, 26, 27, 26, 27, 26, 27, 26, 27, 26...
    weat_ns_wx_station_position:  [[2225262.12, -5440307.3, -2480962.57], [22...
    weat_pressure:                [555.3649291992188, 555.3472290039062, 555....
    weat_pressure_flag:           [False, False, False, False, False, False, ...
    weat_rel_humidity:            [7.26800012588501, 7.462800025939941, 7.296...
    weat_rel_humidity_flag:       [False, False, False, False, False, False, ...
    weat_temperature:             [269.92529296875, 269.8935852050781, 269.96...
    weat_temperature_flag:        [False, False, False, False, False, False, ...
    weat_time:                    [4860026989.023999, 4860026989.023999, 4860...
    weat_wind_direction:          [2.96705961227417, 1.1519173383712769, 2.44...
    weat_wind_direction_flag:     [False, False, False, False, False, False, ...
    weat_wind_speed:              [1.5, 1.899999976158142, 2.700000047683716,...
    weat_wind_speed_flag:         [False, False, False, False, False, False, ...

Plotting

We can quickly spot check data fields. This is handy during subsequent analysis.

The visplot function supports both standard x-y plots and x-y-c color mesh visualization. The mode is determined by the number of dimensions in the data passed in

[8]:
from cngi.vis import visplot

# 1-D plot - let's just see the fields
# this plots field values over the range of data
visplot(vis_xds.field)
_images/visibilities_17_0.png
[9]:
# 2-D plot - fields versus scans
visplot(vis_xds.field, axis='scan')
_images/visibilities_18_0.png
[10]:
# 2-D plot of 4-D data - DATA over time
# when we specify an x-axis (time), the other axes (baseline, chan, pol) are averaged together
visplot(vis_xds.DATA, axis='time')
_images/visibilities_19_0.png
[11]:
# 3-D plot of 4-D data - DATA over time and channel
# we can give two axes to create a color mesh visualization
visplot(vis_xds.DATA, axis=['time','chan'])
_images/visibilities_20_0.png

Coordinate Strings

Whenever possible, integer index values from the old MS format have been replaced with the actual string name of the value being indexed.

This allows selection by either the index position or the actual string name (as shown in the following section).

[12]:
import numpy as np

# field_id (int) -> field (string)
print(np.unique(vis_xds.field))
['3c279' 'Ceres' 'J0522-364' 'J1037-295' 'TW Hya (5)']
[13]:
# processor_id -> processor
print(np.unique(vis_xds.processor))
['CORRELATOR (2)']
[14]:
# observation_id -> observation
print(np.unique(vis_xds.observation))
['uid://A002/X327408/X6f']
[15]:
# state_id -> state
print(np.unique(vis_xds.state))
['CALIBRATE_AMPLI#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (4)'
 'CALIBRATE_AMPLI#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (5)'
 'CALIBRATE_AMPLI#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (6)'
 'CALIBRATE_AMPLI#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (7)'
 'CALIBRATE_BANDPASS#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (0)'
 'CALIBRATE_BANDPASS#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (1)'
 'CALIBRATE_BANDPASS#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (2)'
 'CALIBRATE_BANDPASS#ON_SOURCE,CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (3)'
 'CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (8)'
 'CALIBRATE_PHASE#ON_SOURCE,CALIBRATE_WVR#ON_SOURCE (9)'
 'OBSERVE_TARGET#ON_SOURCE (10)' 'OBSERVE_TARGET#ON_SOURCE (11)'
 'OBSERVE_TARGET#ON_SOURCE (12)' 'OBSERVE_TARGET#ON_SOURCE (13)'
 'OBSERVE_TARGET#ON_SOURCE (14)' 'OBSERVE_TARGET#ON_SOURCE (15)'
 'OBSERVE_TARGET#ON_SOURCE (16)' 'OBSERVE_TARGET#ON_SOURCE (17)'
 'OBSERVE_TARGET#ON_SOURCE (18)' 'OBSERVE_TARGET#ON_SOURCE (19)']

Selection and Splitting

The xarray Dataset format has extensive built-in functions for selecting data and splitting by different criteria.

Rather than cluttering each CNGI function with many parameters for data selection, instead the Dataset should be appropriately split before calling the relevant CNGI API function(s).

[16]:
# select by dimension indices
new_xds = vis_xds.isel(time=[76,77,78], chan=[6,7,8,12])

visplot(new_xds.DATA, ['time', 'chan'])
_images/visibilities_27_0.png
[17]:
# select by range of dimension values
new_xds = vis_xds.sel(chan=slice(372.59e9, 372.63e9))

visplot(new_xds.DATA, ['time', 'chan'])
_images/visibilities_28_0.png
[18]:
# select by coordinate values
new_xds = vis_xds.where((vis_xds.field == 'Ceres') & (vis_xds.chan < 372.7e9), drop=True)

visplot(new_xds.DATA, ['time', 'chan'])
_images/visibilities_29_0.png
[19]:
# select by data variable values - find all baselines with antenna 5
new_xds = vis_xds.where((vis_xds.antennas[:,0] == 5) | (vis_xds.antennas[:,1] == 5), drop=True)

visplot(new_xds.baseline)
_images/visibilities_30_0.png

Flagging

Any boolean data variable can be used to flag any other data variable of common dimension(s).

Flagged values are set to np.nan, and subsequent math/analysis should be of the type that ignores nan values.

This is a simple operation and does not necessarily require a special function, but one is provided for conveinience. Here we will show both ways.

[20]:
# here is what the raw DATA column looks like
visplot(vis_xds.DATA, ['time', 'chan'])

# another way to look at the data, pick a random baseline (34)
visplot(vis_xds.DATA[:,34], 'chan')
_images/visibilities_32_0.png
_images/visibilities_32_1.png
[21]:
# apply the FLAG and FLAG_ROW data variables
flagged_xds = vis_xds.where((vis_xds.FLAG == 0) & (vis_xds.FLAG_ROW == 0))

# the white line is a set of flagged values (np.nan's)
visplot(flagged_xds.DATA, ['time', 'chan'])
visplot(flagged_xds.DATA[:,34], 'chan')
_images/visibilities_33_0.png
_images/visibilities_33_1.png
[22]:
# or with the cngi applyflags function
from cngi.vis import applyflags

flagged_xds = applyflags(vis_xds, flags=['FLAG', 'FLAG_ROW'])

visplot(flagged_xds.DATA[:,34], 'chan')
_images/visibilities_34_0.png
[23]:
# one difference is that the applyflags function will remove the flag variables from the return dataset
print(flagged_xds.data_vars)
Data variables:
    ARRAY_ID       (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    DATA           (time, baseline, chan, pol) complex128 dask.array<chunksize=(100, 210, 20, 1), meta=np.ndarray>
    EXPOSURE       (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    FEED1          (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    FEED2          (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    SIGMA          (time, baseline, pol) float64 dask.array<chunksize=(100, 210, 1), meta=np.ndarray>
    TIME_CENTROID  (time, baseline) float64 dask.array<chunksize=(100, 210), meta=np.ndarray>
    UVW            (time, baseline, uvw_index) float64 dask.array<chunksize=(100, 210, 3), meta=np.ndarray>
    WEIGHT         (time, baseline, pol) float64 dask.array<chunksize=(100, 210, 1), meta=np.ndarray>

Averaging and Smoothing

Averaging functions will change the shape of the resulting xarray dataset along the dimension being averaged.

Smoothing functions always return an xarray dataset with the same dimensions as the original.

Channel Averaging

The channel averaging function looks for all data variables in the dataset with a channel dimension and averages by the specified bin width. The returned dataset will have a different channel dimension size.

[24]:
from cngi.vis import chanaverage, visplot

# average 5 channels of original unflagged data
avg_xds = chanaverage(vis_xds, width=5)

# compare the original to the channel averaged
visplot(vis_xds.DATA[30,0,:,0], 'chan')
visplot(avg_xds.DATA[30,0,:,0], 'chan')
_images/visibilities_38_0.png
_images/visibilities_38_1.png
[25]:
# confirm new channel dimension after averaging
print(dict(vis_xds.dims))
print(dict(avg_xds.dims))
{'baseline': 210, 'chan': 384, 'pair': 2, 'pol': 2, 'receptor': 2, 'spw': 1, 'time': 410, 'uvw_index': 3}
{'baseline': 210, 'chan': 76, 'pair': 2, 'pol': 2, 'receptor': 2, 'spw': 1, 'time': 410, 'uvw_index': 3}

Since every variable with a channel dimension in the dataset is averaged, this will also include the FLAG variable. FLAG is a boolean type with values of 0 or 1 that are averaged over the width, resulting in a decimal number. The result is then typcast back to boolean, which is the same as just rounding up.

Long story short, if any channel in the width is flagged, the resulting averaged channel will also be flagged

[26]:
# compare the original flags to the channel averaged flags
visplot(vis_xds.FLAG[30,0,:,0], 'chan')
visplot(avg_xds.FLAG[30,0,:,0], 'chan')
_images/visibilities_41_0.png
_images/visibilities_41_1.png

One might want to apply flags before channel averaging

[27]:
from cngi.vis import applyflags

flg_xds = applyflags(vis_xds, flags=['FLAG'])
avg_flg_xds = chanaverage(flg_xds, width=5)

visplot(flg_xds.DATA[30,0,:,0], 'chan')
visplot(avg_flg_xds.DATA[30,0,:,0], 'chan')
_images/visibilities_43_0.png
_images/visibilities_43_1.png

Or apply flags after channel averaging

[28]:
flg_avg_xds = applyflags(avg_xds, flags=['FLAG'])

visplot(flg_avg_xds.DATA[30,0,:,0], 'chan')
_images/visibilities_45_0.png

There is a small difference of two points, made more clear when we overlay the plots

[29]:
visplot(avg_flg_xds.DATA[30,0,:,0], 'chan', drawplot=False)
visplot(flg_avg_xds.DATA[30,0,:,0], 'chan', overplot=True)
_images/visibilities_47_0.png

Time Averaging

The time averaging function looks for all data variables in the dataset with a time dimension and averages by the specified bin width. Spanning across states within a scan, scans of a each state, or both yields different results. The returned dataset will have a different time dimension size.

First let’s see the relationship between scans, states, and time in this data

[30]:
from cngi.vis import visplot

visplot(vis_xds.scan, 'time')
visplot(vis_xds.state, 'time')
visplot(vis_xds.state, 'scan')
_images/visibilities_49_0.png
_images/visibilities_49_1.png
_images/visibilities_49_2.png

Let’s start by not spanning anything. The comparison plots will look at all times across baseline 100, channel 100, and polarization 0 (random selection)

[31]:
from cngi.vis import timeaverage

# average 7 time steps, don't span anything (separate by scan and state)
avg_xds = timeaverage(vis_xds, width=7, timespan='none')

# compare the original to the time averaged
visplot(vis_xds.DATA[:,100,100,0], 'time', drawplot=False)
visplot(avg_xds.DATA[:,100,100,0], 'time', overplot=True)
_images/visibilities_51_0.png
[32]:
# confirm new channel dimension after averaging
print(dict(vis_xds.dims))
print(dict(avg_xds.dims))
{'baseline': 210, 'chan': 384, 'pair': 2, 'pol': 2, 'receptor': 2, 'spw': 1, 'time': 410, 'uvw_index': 3}
{'baseline': 210, 'chan': 384, 'pair': 2, 'pol': 2, 'receptor': 2, 'spw': 1, 'time': 82, 'uvw_index': 3}
[33]:
# average 7 time steps spanning states within each scan
state_avg_xds = timeaverage(vis_xds, width=7, timespan='state')

visplot(state_avg_xds.DATA[:,100,100,0], 'time')
_images/visibilities_53_0.png
[34]:
# average 7 time steps spanning all scans per state
scan_avg_xds = timeaverage(vis_xds, width=7, timespan='scan')

visplot(scan_avg_xds.DATA[:,100,100,0], 'time')
_images/visibilities_54_0.png
[35]:
# average 7 time steps spanning all scans and states
both_avg_xds = timeaverage(vis_xds, width=7, timespan='both')

visplot(both_avg_xds.DATA[:,100,100,0], 'time')
_images/visibilities_55_0.png

We can inspect the change to scans over time as a result of the different spanning

[36]:
visplot(avg_xds.scan, 'time', drawplot=False)
visplot(state_avg_xds.scan, 'time', overplot=True, drawplot=False)
visplot(scan_avg_xds.scan, 'time', overplot=True, drawplot=False)
visplot(both_avg_xds.scan, 'time', overplot=True)
_images/visibilities_57_0.png

Flagging works the same way as in channel averaging. Flags are averaged with all other data sharing the time axis. They may be applied before or after time averaging.

[37]:
from cngi.vis import applyflags

avg_flg_xds = timeaverage(applyflags(vis_xds, flags=['FLAG']), width=5, timespan='both')
flg_avg_xds = applyflags(timeaverage(vis_xds, width=5, timespan='both'), flags=['FLAG'])

visplot(avg_flg_xds.DATA[:,100,100,0], 'time', drawplot=False)
visplot(flg_avg_xds.DATA[:,100,100,0], 'time', overplot=True)
_images/visibilities_59_0.png

Channel Smoothing

Channel smoothing allows a variety of window shapes to be convolved across the channel dimension to smooth over changes from bin to bin. The standard CASA hanning smooth is supported as the default option. The returned dataset will have the same dimensions as the original.

[38]:
from cngi.vis import chansmooth

smooth_xds = chansmooth(vis_xds)

visplot(vis_xds.DATA[30,0,:,0], 'chan', drawplot=False)
visplot(smooth_xds.DATA[30,0,:,0], 'chan', overplot=True)
_images/visibilities_61_0.png

chansmooth uses the scipy.signal package to set the window shape. Therefore, all the window functions supported by scipy are also supported here. The size of the window width is configurable.

https://docs.scipy.org/doc/scipy/reference/signal.windows.html#module-scipy.signal.windows

[39]:
bohman_xds = chansmooth(vis_xds, type='bohman', size=7)

visplot(smooth_xds.DATA[30,0,:,0], 'chan', drawplot=False)
visplot(bohman_xds.DATA[30,0,:,0], 'chan', overplot=True)
_images/visibilities_63_0.png

non-unity gains are also supported to amplify or attenuate the output

[40]:
hann_xds = chansmooth(vis_xds, type='hann', size=5, gain=1.0)
amp_hann_xds = chansmooth(vis_xds, type='hann', size=5, gain=3.0)

visplot(hann_xds.DATA[30,0,:,0], 'chan', drawplot=False)
visplot(amp_hann_xds.DATA[30,0,:,0], 'chan', overplot=True)
_images/visibilities_65_0.png

UV Fitting

Various modeling can be performed in the UV-domain through polynomial regression on the visibility data.

The uvcontfit function takes a source data variable in the xarray dataset and fits a polynomial of fitorder across the channel axis. The resulting model is placed in the target data variable and retains the dimensionality of the source. Specified channels may be excluded to focus on fitting only continuum visibilities and not line emissions.

In our example dataset we know that the Ceres has a line emission, so let’s focus on just that field.

[41]:
from cngi.vis import uvcontfit, visplot

ceres_xds = vis_xds.where(vis_xds.field == 'Ceres', drop=True)

fit_xds = uvcontfit(ceres_xds, source='DATA', target='CONTFIT')

visplot(fit_xds.DATA, ['time', 'chan'])
visplot(fit_xds.CONTFIT, ['time', 'chan'])
_images/visibilities_67_0.png
_images/visibilities_67_1.png

What we are seeing is merged across baselines and polarizations. Let’s look at just one time/baseline/pol sample and examine the fit. The linear fit is skewed by the line emission towards one side of the channel axis.

[42]:
visplot(fit_xds.DATA[10,10,:,0], ['chan'], drawplot=False)
visplot(fit_xds.CONTFIT[10,10,:,0], ['chan'], overplot=True)
_images/visibilities_69_0.png

Let’s omit the channels with the line emission and do the fit again. We should see less skewing in the fitted continuum.

The fits appear lower in value than the mean of the data when viewing the magnitude of complex visibilities. The real and imaginary parts are fitted individually. We examine just the real piece here to see a that the fit is indeed a decent approximation through the center of the data.

[43]:
excludechans = np.where( (vis_xds.chan > 3.726e11) & (vis_xds.chan < 3.7262e11) )[0]

fit_xds2 = uvcontfit(ceres_xds, source='DATA', target='CONTFIT', fitorder=1, excludechans=excludechans)

includechans = np.setdiff1d( range(vis_xds.dims['chan']), excludechans)
visplot(fit_xds2.DATA[10,10,includechans,0].real, ['chan'], drawplot=False)
visplot(fit_xds.CONTFIT[10,10,:,0].real, ['chan'], drawplot=False, overplot=True)
visplot(fit_xds2.CONTFIT[10,10,:,0].real, ['chan'], overplot=True)
_images/visibilities_71_0.png

For even more fun, we can try a higher order fit. Let’s use all channels to allow the line emission to exaggerate the non-linearity, but only plot the continuum DATA channels so the y-scale stays smaller.

[44]:
fit_xds3 = uvcontfit(ceres_xds, source='DATA', target='CONTFIT', fitorder=3, excludechans=[])

visplot(fit_xds3.DATA[10,10,includechans,0].real, ['chan'], drawplot=False)
visplot(fit_xds3.CONTFIT[10,10,:,0].real, ['chan'], overplot=True)
_images/visibilities_73_0.png

Several metrics related to the quality of the fit are stored in the xarray dataset attributes section as metadata. They are named with a prefix matching the target parameter in uvcontfit.

[45]:
metrics = [kk for kk in fit_xds.attrs.keys() if kk.startswith('CONTFIT')]
print(metrics)

print(fit_xds.CONTFIT_rms_error)
['CONTFIT_rms_error', 'CONTFIT_min_max_error', 'CONTFIT_bw_frac', 'CONTFIT_freq_frac']
<xarray.DataArray ()>
dask.array<pow, shape=(), dtype=complex128, chunksize=(), chunktype=numpy.ndarray>

One thing to note is that these metrics are dask elements, and are not actually computed until explicity requested. Things like visplot explicity force the computation of the data needed for plotting, but nothing has forced these metrics to be computed yet. So we will call .compute() to see their values.

[46]:
print('rms error with line included: ', fit_xds.CONTFIT_rms_error.compute().item())
print('rms error with line excluded: ', fit_xds2.CONTFIT_rms_error.compute().item())
rms error with line included:  (16.444908051664182-0.7566763926282183j)
rms error with line excluded:  (0.37306136672597817-0.007993456030923633j)