cngi.conversion.convert_ms2
convert_ms2(infile, outfile=None, ddis=None, ignore=['HISTORY'], compressor=None, chunks=(100, 400, 32, 1), sub_chunks=10000, append=False)
Convert a legacy-format MS to an xarray Visibility Dataset and zarr storage format.
The CASA MSv2 format is converted to the MSv3 schema as defined here: https://drive.google.com/file/d/10TZ4dsFw9CconBc-GFxSeb2caT6wkmza/view?usp=sharing
The MS is partitioned by DDI, which guarantees a fixed data shape per partition. Each partition is written to its own subdirectory under the main vis.zarr folder. MSv3 has no DDI, so the DDI serves only as a partition id in the zarr directory.
- Parameters
infile (str) – Input MS filename
outfile (str) – Output zarr filename. If None, will use infile name with .vis.zarr extension
ddis (list) – List of specific DDIs to convert. DDIs are integer values; use the string ‘global’ for subtables. Leave as None to convert the entire MS
ignore (list) – List of subtables to ignore (case sensitive and generally all uppercase). This is useful if a particular subtable is causing errors. The default is temporarily ['HISTORY'] rather than None because of a CASA6 issue in the table tool that affects a small set of test cases; set it back to None if HISTORY is needed
compressor (numcodecs.blosc.Blosc) – The Blosc compressor to use when saving the converted data to disk using zarr. If None, the zstd compression algorithm is used with compression level 2.
chunks (4-D tuple of ints) – Shape of desired chunking in the form (time, baseline, channel, polarization); use -1 to place an entire axis in one chunk. Default is (100, 400, 32, 1). Note: chunk size is the product of the four numbers, and data is batch processed along the time axis, so the chunking drives the memory needed for conversion.
sub_chunks (int) – Chunking used for subtable conversion (except for POINTING, which uses the time/baseline dims from the chunks parameter). This is a single integer used for the row-axis (d0) chunking only; no other dims in the subtables are chunked.
append (bool) – Keep the destination zarr store intact and add new DDIs to it. Note that duplicate DDIs will still be overwritten. The default, False, deletes and replaces the entire directory.
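The note on chunks says the chunk size is the product of the four numbers. The sketch below works through that arithmetic. The helper names, the example axis lengths, and the 8-byte item size (one complex64 visibility) are assumptions made for illustration and are not part of cngi; actual memory use during conversion will be higher, since more than one array is held per batch.

```python
from math import prod


def chunk_elements(chunks, shape):
    """Count the elements in one chunk.

    chunks and shape are both (time, baseline, channel, polarization).
    A chunk entry of -1 means the whole axis, so it is replaced by the
    corresponding axis length from shape.
    """
    if len(chunks) != 4 or len(shape) != 4:
        raise ValueError("chunks and shape must both have four entries")
    resolved = [dim if c == -1 else c for c, dim in zip(chunks, shape)]
    return prod(resolved)


def chunk_bytes(chunks, shape, itemsize=8):
    """Estimate bytes per chunk; itemsize=8 assumes complex64 data."""
    return chunk_elements(chunks, shape) * itemsize


# Hypothetical dataset: 1000 times, 500 baselines, 64 channels, 4 pols.
shape = (1000, 500, 64, 4)

default_chunk = chunk_bytes((100, 400, 32, 1), shape)
full_channel_chunk = chunk_bytes((100, 400, -1, 1), shape)

print(default_chunk / 1e6, "MB per chunk with the default chunking")
print(full_channel_chunk / 1e6, "MB per chunk with all channels in one chunk")
```

Under these assumptions, putting the whole channel axis in one chunk doubles the per-chunk size relative to the default, because the example dataset has 64 channels against a default channel chunk of 32.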
- Returns
Master xarray dataset of datasets for this visibility set
- Return type
xarray.core.dataset.Dataset
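A minimal usage sketch follows, assuming cngi and its CASA6 dependencies are installed. The helper names conversion_kwargs and convert, and the file names in the commented call, are hypothetical; only convert_ms2 and its keyword arguments come from the signature above.

```python
def conversion_kwargs(ddis=None, keep_history=False):
    """Collect keyword arguments for convert_ms2 (hypothetical helper).

    keep_history=True passes ignore=None so the HISTORY subtable is
    converted; otherwise the documented default ['HISTORY'] is used.
    """
    return {
        "ddis": ddis,
        "ignore": None if keep_history else ["HISTORY"],
        "compressor": None,           # None selects zstd at level 2
        "chunks": (100, 400, 32, 1),  # (time, baseline, channel, pol)
        "sub_chunks": 10000,
        "append": False,              # False replaces the whole store
    }


def convert(infile, outfile, **overrides):
    """Run the conversion and return the resulting xarray dataset."""
    # Imported here so the sketch can be loaded without cngi installed.
    from cngi.conversion import convert_ms2

    kwargs = conversion_kwargs()
    kwargs.update(overrides)
    return convert_ms2(infile, outfile=outfile, **kwargs)


# Example call (file names are placeholders):
# vis = convert("observation.ms", "observation.vis.zarr", ddis=[0, 1])
```

Passing outfile explicitly avoids relying on the default naming rule, and passing append=True as an override would add the listed DDIs to an existing store instead of replacing it.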