Development

READ THIS BEFORE YOU CONTRIBUTE CODE!!!

The CNGI code base is not object oriented; it follows a more functional paradigm. Objects are used to hold Visibility and Image data, but they come directly from the underlying xarray/dask framework and are not extended in any way. The API consists of stateless Python functions only. Each function takes a Visibility or Image object and returns a new Visibility or Image object, without relying on global variables.


Organization

CNGI is organized into modules as described below. Each module is responsible for a different functional area.

  • conversion : convert legacy CASA data files to CNGI compatible files

  • dio : data objects to/from CNGI compatible data files

  • vis : operations on visibility data objects

  • image : operations on image data objects

  • direct : access to underlying parallel processing framework

Generally, the sequence of events is as follows:

  1. direct module is used to initialize processing environment

  2. conversion module is used to create CNGI compatible data from existing CASA data

  3. dio module is used to create visibility and image xarray dataset objects

  4. vis and image module operations are performed

  5. dio module is used to save the results


Architecture

The CNGI application programming interface (API) is a set of flat, stateless functions that take an xarray Dataset as an input parameter and return a new xarray Dataset as output. The term “flat” means that the functions are not allowed to call each other, and the term “stateless” means that they may not access any global data outside the parameter list, nor maintain any persistent internal data.
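
As a minimal sketch of what "stateless" means in practice, the function below reads only its parameters and returns a new object. To keep the sketch dependency-free, a plain mapping of lists stands in for an xarray Dataset; the function name scale_data and the "DATA" key are hypothetical and not part of the CNGI API.

```python
from types import MappingProxyType


def scale_data(dataset, factor):
    """Return a new dataset with every value multiplied by ``factor``.

    Hypothetical example: ``dataset`` is a plain mapping of name -> list of
    numbers, standing in for an xarray Dataset.
    """
    # Build a brand-new mapping; the input is never modified in place
    # and no global or persistent state is read or written.
    return {
        name: [value * factor for value in values]
        for name, values in dataset.items()
    }


# A read-only view shows that the function needs no write access to its input.
original = MappingProxyType({"DATA": [1.0, 2.0, 3.0]})
scaled = scale_data(original, 2.0)

print(scaled)          # the new object holding the scaled values
print(dict(original))  # the input, unchanged
```

Because the result depends only on the arguments, calling the function twice with the same input gives the same output, which is the property that lets such functions be composed and run in parallel safely.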

The cngi_prototype repository on GitHub contains the cngi package along with the supporting folders docs and tests. Within the cngi package there are a number of modules, and within each module there are one or more Python files.

CNGI adheres to a strict design philosophy with the following RULES:

  1. Each file in a module must have exactly one function exposed to the external API (by docstring and __init__.py).

  2. The exposed function name should match the file name.

  3. Must use stateless functions, not classes.

  4. Files in a module cannot import each other.

  5. Files in separate modules cannot import each other.

  6. A single special _helper module exists for internal functions meant to be shared across modules/files, but each module file should be as self-contained as possible.

  7. Nothing in _helper may be exposed to the external API.

cngi_prototype
|-- cngi
|    |-- module1
|    |     |-- __init__.py
|    |     |-- file1.py
|    |     |-- file2.py
|    |     | ...
|    |-- module2
|    |     |-- __init__.py
|    |     |-- file3.py
|    |     |-- file4.py
|    |     | ...
|    |-- _helper
|    |     |-- __init__.py
|    |     |-- file5.py
|    |     |-- file6.py
|    |     | ...
|-- docs
|    | ...
|-- tests
|    | ...
|-- requirements.txt
|-- setup.py

File1, file2, file3 and file4 MUST be documented in the API exactly as they appear. They must NOT import each other. File5 and file6 must NOT be documented in the API. They may be imported by file1 through file4.

There are several important files to be aware of:

  • __init__.py : dictates what is seen by the API and importable by other functions

  • requirements.txt : lists all library dependencies for development, used by IDE during setup

  • setup.py : defines how to package the code for pip, including version number and library dependencies for installation
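
To illustrate how __init__.py controls what the API exposes, the sketch below builds a throwaway package on disk and imports it. The package name demo_module and the placeholder function body are hypothetical; only the layout (one function per file, named after the file, exposed through __init__.py) follows the rules above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk (all names here are hypothetical).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_module"
pkg.mkdir()

# file1.py: exactly one public function, named after the file.
(pkg / "file1.py").write_text(
    "def file1(xds):\n"
    "    '''Return the input unchanged (placeholder operation).'''\n"
    "    return xds\n"
)

# __init__.py: this single import is what exposes file1 to the API.
(pkg / "__init__.py").write_text("from .file1 import file1\n")

# Make the temporary directory importable, then import the package.
sys.path.insert(0, str(root))
demo_module = importlib.import_module("demo_module")

# Callers import the function from the module namespace.
result = demo_module.file1({"DATA": [1, 2, 3]})
print(result)
```

A file that is not imported in __init__.py would still exist in the package, but it would not appear as a function in the module namespace.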


Documentation

All CNGI documentation is automatically rendered from files placed in the docs folder using the Sphinx tool. A Readthedocs service scans for updates to the Github repository and automatically calls Sphinx to build new documentation as necessary. The resulting documentation html is hosted by readthedocs as a CNGI website.

Compatible file types in the docs folder that can be rendered by Sphinx include:

  • Markdown (.md)

  • reStructuredText (.rst)

  • Jupyter notebook (.ipynb)

Sphinx extension modules are used to automatically crawl the cngi code directories and pull out function definitions. These definitions end up in the API section of the documentation. All CNGI functions must conform to the numpy docstring format.

The nbsphinx extension module is used to render Jupyter notebooks to html.
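
As a rough sketch, the extensions named on this page (autodoc and napoleon from the Coding Standards section, plus nbsphinx) would be enabled in the Sphinx conf.py along the following lines. This fragment is illustrative only; the project's actual conf.py may list additional extensions and options.

```python
# docs/conf.py (fragment): illustrative, the project's real conf.py may differ.
extensions = [
    "sphinx.ext.autodoc",   # pulls function signatures and docstrings from the code
    "sphinx.ext.napoleon",  # parses NumPy-style docstring sections
    "nbsphinx",             # renders Jupyter notebooks (.ipynb) to html
]

# Napoleon options: accept NumPy-style docstrings only.
napoleon_numpy_docstring = True
napoleon_google_docstring = False
```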


IDE

The CNGI team recommends the PyCharm IDE for developing CNGI code. PyCharm provides a relatively simple, unified environment that includes Github integration, a code editor, a Python shell, a system terminal, and venv setup.

CNGI also relies heavily on Google Colaboratory for both documentation and code execution examples. Google colab notebooks integrate with Github and allow markdown-style documentation interleaved with executable python code. Even in cases where no code is necessary, colab notebooks are the preferred choice for markdown documentation. This allows other team members to make documentation updates in a simple, direct manner.


PyPi Packages

CNGI is distributed and installed via pip by hosting packages on pypi. The pypi test server is available to all authorized CNGI developers to upload and evaluate their code branches.

Typically, the Colab notebook documentation and examples will need a pip installation of CNGI to draw upon. The pypi test server allows notebook documentation to temporarily draw from development branches until everything is finalized in a Github pull request and production pypi distribution.

Developers should create a .pypirc file in their home directory for convenient uploading of distributions to the pypi test server. It should look something like:

[distutils]
index-servers =
     pypi
     pypitest

[pypi]
username = yourusername
password = yourpassword

[pypitest]
repository = https://test.pypi.org/legacy/
username = yourusername
password = yourpassword
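
A quick way to sanity-check a .pypirc file is to parse it with Python's standard configparser module. The sketch below parses an inline copy of the template above (with the same placeholder credentials) and confirms that every server listed under index-servers has its own section; it is an optional check, not part of the CNGI tooling.

```python
import configparser

# Sample contents mirroring the template above (placeholder credentials).
sample = """
[distutils]
index-servers =
     pypi
     pypitest

[pypi]
username = yourusername
password = yourpassword

[pypitest]
repository = https://test.pypi.org/legacy/
username = yourusername
password = yourpassword
"""

config = configparser.ConfigParser()
config.read_string(sample)

# index-servers is a multi-line value; split it into individual names.
servers = config["distutils"]["index-servers"].split()

# Every listed server should have its own section with credentials.
missing = [name for name in servers if not config.has_section(name)]

print("servers:", servers)
print("missing sections:", missing)
```

To check a real file instead of the inline sample, replace the read_string call with config.read(pathlib.Path.home() / ".pypirc").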

Production packages are uploaded to the main pypi server by a subset of authorized CNGI developers when a particular version is ready for distribution.


Step by Step

Concise steps for contributing code to CNGI

Install IDE

  1. Request that your Github account be added to the contributors of the CNGI repository

  2. Make sure Python 3.6 and Git are installed on your machine

  3. Download and install the free PyCharm Community edition. On Linux, it is just a tar file. Expand it and execute pycharm.sh in the bin folder via something like:

    $ ./pycharm-community-2020.1/bin/pycharm.sh
    
  4. From the welcome screen, click Get from Version Control

  5. Add your Github account credentials to PyCharm and then you should see a list of all repositories you have access to

  6. Select the CNGI repository and set an appropriate folder location/name. Click “Clone”.

  7. Go to:

    File -> Settings -> Project: xyz -> Python Interpreter
    

    and click the little cog to add a new Project Interpreter. Make a new Virtualenv environment, with the location set to a venv subfolder in the project directory. Make sure to use Python 3.6.

  8. Double click the requirements.txt file that was part of the git clone to open it in the editor. That should prompt PyCharm to ask you if you want to “Install requirements” found in this file. Yes, you do. You can ignore the stuff about plugins.

  9. All necessary supporting Python libraries will now be installed into the venv created for this project (isolating them from your base system). Do NOT add any project settings to Git.

Develop stuff

  1. Double click on files to open in editor and make changes.

  2. Create new files with:

    right-click -> New
    
  3. Move / rename / delete files with:

    right-click -> Refactor
    
  4. Run code interactively by selecting “Python Console” from the bottom of the screen. This is your venv environment with everything from requirements.txt installed in addition to the cngi package. You can do things like this:

    >>> from cngi.dio import read_vis
    >>> xds = read_vis('path/to/data.vis.zarr')
    
  5. When you make changes to a module (let's say read_vis, for example), close the Python Console and re-open it, then import the module again to see the changes.

  6. Commit changes to your local branch with

    right-click -> Git -> Commit File
    
  7. Merge latest version of Github master trunk to your local branch with

    right-click -> Git -> Repository -> Pull
    
  8. Push your local branch up to the Github master trunk with

    right-click -> Git -> Repository -> Push
    

Make a Pip Package

  1. If not already done, create an account on pypi (and the test server) and have a CNGI team member grant access to the package. Then create a .pypirc file in your home directory.

  2. Set a unique version number in setup.py by using the release candidate label, as in:

    version='0.0.48rc1'
    
  3. Build the source distribution by executing the following commands in the PyCharm Terminal (button at the bottom left):

    $ rm -fr dist
    $ python setup.py sdist
    
  4. Call twine to upload the sdist package to pypi-test:

    $ python -m twine upload dist/* -r pypitest
    
  5. Enjoy your pip package as you would a real production one by pointing to the test server:

    $ pip install -i https://test.pypi.org/simple/ cngi-prototype==0.0.48rc1
    

Update the Documentation

  1. The bulk of the documentation is in the docs folder and in the ‘.ipynb’ format. These files are visible through PyCharm, but should be edited and saved in Google Colab. The easiest way to do this is to navigate to the Github docs folder and click on the .ipynb file you want to edit. There is usually an open in colab button at the top.

  2. Alternatively, notebooks can be accessed in Colab by combining a link prefix with the name of the .ipynb file in the repository docs folder. For example, this page you are reading now can be edited by combining the colab prefix:

    https://colab.research.google.com/github/casangi/cngi_prototype/blob/master/docs/
    

    with the filename of this notebook:

    development.ipynb
    

    producing a link of: https://colab.research.google.com/github/casangi/cngi_prototype/blob/master/docs/development.ipynb

  3. In Colab, make the desired changes and then select

    File -> Save a copy in Github
    

    Enter your Github credentials if not already stored with Google, and then select the CNGI repository and the appropriate path/filename, i.e. docs/development.ipynb

  4. Readthedocs will detect changes to the Github master and automatically rebuild the documentation hosted on their server (this page you are reading now, for example). This can take ~15 minutes.
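
The link construction described in step 2 amounts to simple string concatenation. A minimal sketch follows; the helper name colab_link is hypothetical, while the prefix and filename are the ones given above.

```python
# Prefix for notebooks in the docs folder of the master branch (from step 2).
COLAB_PREFIX = "https://colab.research.google.com/github/casangi/cngi_prototype/blob/master/docs/"


def colab_link(notebook_filename):
    """Return the Colab URL for a notebook in the repository docs folder."""
    return COLAB_PREFIX + notebook_filename


print(colab_link("development.ipynb"))
```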

In the docs folder, some of the root index files are stored as .md or .rst format and may be edited by double clicking and modifying in the PyCharm editor. They can then be pushed to the master trunk in the same manner as source code.

After modifying an .md or .rst file, double check that it renders correctly by executing the following commands in the PyCharm Terminal

$ cd docs/
$ rm -fr _api/api
$ rm -fr build
$ sphinx-build -b html . ./build

Then open up a web browser and navigate to

file:///path/to/project/docs/build/index.html

Do NOT add the api or build folders to Git; they are intermediate build artifacts. Note that _api is the location of the actual documentation files that automatically parse the docstrings in the source code, so it should be in Git.


Coding Standards

Documentation is generated using Sphinx, with the autodoc and napoleon extensions enabled. Function docstrings should be written in NumPy style. For compatibility with Sphinx, import statements should generally be underneath function definitions, not at the top of the file.
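
The following sketch shows what these conventions look like together: a NumPy-style docstring with Parameters and Returns sections, and an import placed underneath the function definition. The function channel_average and its parameters are hypothetical and exist only to illustrate the format.

```python
def channel_average(values, width=2):
    """
    Average consecutive groups of values (hypothetical example function).

    Parameters
    ----------
    values : list of float
        Input samples to average.
    width : int
        Number of consecutive samples in each averaged group. Default is 2.

    Returns
    -------
    list of float
        One mean per complete group of ``width`` samples.
    """
    # The import sits below the function definition, per the convention above.
    from statistics import mean

    # Step through the input in strides of ``width``; a trailing
    # incomplete group is dropped.
    groups = [
        values[i:i + width]
        for i in range(0, len(values) - width + 1, width)
    ]
    return [mean(group) for group in groups]


print(channel_average([1.0, 3.0, 5.0, 7.0]))
```

The napoleon extension reads the underlined section headers (Parameters, Returns) and renders them as structured fields in the generated API pages.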

A complete set of formal, enforced coding standards has not yet been adopted. Some alternatives under consideration are:

  • Google’s style guide

  • Python Software Foundation’s style guide

  • Following conventions established by PyData projects (examples one and two)

We are evaluating the adoption of PEP 484 conventions, mypy, or param for type-checking, and flake8 or pylint for enforcement.