READ THIS BEFORE YOU CONTRIBUTE CODE!!!
The CNGI code base is not object oriented, and instead follows a more functional paradigm. Objects are indeed used to hold Visibility and Image data, but they come directly from the underlying xarray/dask framework and are not extended in any way. The API consists of stateless Python functions only: each takes in a Visibility or Image object and returns a new Visibility or Image object, with no global variables.
CNGI is organized into modules as described below. Each module is responsible for a different functional area.
conversion : convert legacy CASA data files to CNGI compatible files
dio : data objects to/from CNGI compatible data files
vis : operations on visibility data objects
image : operations on image data objects
direct : access to underlying parallel processing framework
Generally, the sequence of events is as follows:
direct module is used to initialize processing environment
conversion module is used to create CNGI compatible data from existing CASA data
dio module is used to create visibility and image xarray dataset objects
vis and image module operations are performed
dio module is used to save the results
The CNGI application programming interface (API) is a set of flat, stateless functions that take an xarray Dataset as an input parameter and return a new xarray Dataset as output. The term “flat” means that the functions are not allowed to call each other, and the term “stateless” means that they may not access any global data outside the parameter list, nor maintain any persistent internal data.
The cngi_prototype repository on GitHub contains the cngi package along with supporting folders such as docs and tests. Within the cngi package there are a number of modules. Within each module there are one or more Python files.
CNGI adheres to a strict design philosophy with the following RULES:
Each file in a module must have exactly one function exposed to the external API (by docstring and __init__.py).
The exposed function name should match the file name.
Must use stateless functions, not classes.
Files in a module cannot import each other.
Files in separate modules cannot import each other.
A single special _helper module exists for internal functions meant to be shared across modules/files. But each module file should be as self-contained as possible.
Nothing in _helper may be exposed to the external API.
cngi_prototype
|-- cngi
|   |-- module1
|   |   |-- __init__.py
|   |   |-- file1.py
|   |   |-- file2.py
|   |   |   ...
|   |-- module2
|   |   |-- __init__.py
|   |   |-- file3.py
|   |   |-- file4.py
|   |   |   ...
|   |-- _helper
|   |   |-- __init__.py
|   |   |-- file5.py
|   |   |-- file6.py
|   |   |   ...
|-- docs
|   |   ...
|-- tests
|   |   ...
|-- requirements.txt
|-- setup.py
File1, file2, file3 and file4 MUST be documented in the API exactly as they appear. They must NOT import each other. File5 and file6 must NOT be documented in the API. They may be imported by file1 - 4.
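Under these rules, each module's __init__.py is essentially a list of one import per file. A minimal sketch for the hypothetical module1 above (file and function names are placeholders, following the rule that the exposed function name matches the file name):

```python
# cngi/module1/__init__.py
# expose exactly one function per file to the external API
from .file1 import file1
from .file2 import file2
```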
There are several important files to be aware of:
__init__.py : dictates what is seen by the API and importable by other functions
requirements.txt : lists all library dependencies for development, used by IDE during setup
setup.py : defines how to package the code for pip, including version number and library dependencies for installation
All CNGI documentation is automatically rendered from files placed in the docs folder using the Sphinx tool. A Readthedocs service scans for updates to the Github repository and automatically calls Sphinx to build new documentation as necessary. The resulting documentation html is hosted by readthedocs as a CNGI website.
Compatible file types in the docs folder that can be rendered by Sphinx include:
reStructuredText (.rst)
Markdown (.md)
Jupyter notebook (.ipynb)
Sphinx extension modules are used to automatically crawl the cngi code directories and pull out function definitions. These definitions end up in the API section of the documentation. All CNGI functions must conform to the numpy docstring format.
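A function that satisfies the numpy docstring requirement looks something like the following sketch (the function and parameter names are illustrative, not part of the actual API):

```python
def scale_image(xds, factor=1.0):
    """
    Scale the pixel values of an image dataset.

    Parameters
    ----------
    xds : xarray.Dataset
        input Image Dataset
    factor : float
        multiplicative scale factor. Default is 1.0

    Returns
    -------
    xarray.Dataset
        new output Image Dataset
    """
    # a stateless function returns a new object rather than modifying its input
    return xds * factor
```

The autodoc/napoleon extensions parse the Parameters and Returns sections of docstrings in this format when building the API pages.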
The nbsphinx extension module is used to render Jupyter notebooks to html.
The CNGI team recommends the use of the PyCharm IDE for developing CNGI code. PyCharm provides a (relatively) simple, unified environment that includes Github integration, a code editor, a python shell, a system terminal, and venv setup.
CNGI also relies heavily on Google Colaboratory for both documentation and code execution examples. Google Colab notebooks integrate with Github and allow markdown-style documentation interleaved with executable python code. Even in cases where no code is necessary, Colab notebooks are the preferred choice for markdown documentation. This allows other team members to make documentation updates in a simple, direct manner.
CNGI is distributed and installed via pip by hosting packages on pypi. The pypi test server is available to all authorized CNGI developers to upload and evaluate their code branches.
Typically, the Colab notebook documentation and examples will need a pip installation of CNGI to draw upon. The pypi test server allows notebook documentation to temporarily draw from development branches until everything is finalized in a Github pull request and production pypi distribution.
Developers should create a .pypirc file in their home directory for convenient uploading of distributions to the pip test server. It should look something like:
[distutils]
index-servers =
    pypi
    pypitest

[pypi]
username = yourusername
password = yourpassword

[pypitest]
repository = https://test.pypi.org/legacy/
username = yourusername
password = yourpassword
Production packages are uploaded to the main pypi server by a subset of authorized CNGI developers when a particular version is ready for distribution.
Step by Step
Concise steps for contributing code to CNGI
Request that your Github account be added to the contributors of the CNGI repository
Make sure Python 3.6 and Git are installed on your machine
Download and install the free PyCharm Community edition. On Linux, it is just a tar file. Expand it and execute pycharm.sh in the bin folder via something like: $ ./pycharm.sh
From the welcome screen, click
Get from Version Control
Add your Github account credentials to PyCharm and then you should see a list of all repositories you have access to
Select the CNGI repository and set an appropriate folder location/name. Click “Clone”.
File -> Settings -> Project: xyz -> Python Interpreter
and click the little cog to add a new Project Interpreter. Make a new Virtualenv environment, with the location set to a venv subfolder in the project directory. Make sure to use Python 3.6.
Double click the requirements.txt file that was part of the git clone to open it in the editor. That should prompt PyCharm to ask you if you want to “Install requirements” found in this file. Yes, you do. You can ignore the stuff about plugins.
All necessary supporting Python libraries will now be installed into the venv created for this project (isolating them from your base system). Do NOT add any project settings to Git.
Double click on files to open in editor and make changes.
Create new files with:
right-click -> New
Move / rename / delete files with:
right-click -> Refactor
Run code interactively by selecting “Python Console” from the bottom of the screen. This is your venv environment with everything from requirements.txt installed in addition to the cngi package. You can do things like this:
>>> from cngi.dio import read_vis
>>> xds = read_vis('path/to/data.vis.zarr')
When you make changes to a module (let's say read_vis, for example), close the Python Console and re-open it, then import the module again to see the changes.
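Restarting the console always works; a lighter-weight alternative is the standard-library importlib.reload (a general Python technique, not specific to CNGI). Sketched here with the stdlib json module as a stand-in for a cngi module:

```python
import importlib
import json  # stand-in for a cngi module such as cngi.dio

# after editing the module's source on disk, re-import the new version
# in the same session instead of restarting the console
json = importlib.reload(json)
print(json.dumps({"reloaded": True}))
```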
Commit changes to your local branch with
right-click -> Git -> Commit File
Merge latest version of Github master trunk to your local branch with
right-click -> Git -> Repository -> Pull
Push your local branch up to the Github master trunk with
right-click -> Git -> Repository -> Push
Make a Pip Package
If not already done, create an account on PyPI (and the test server) and have a CNGI team member grant access to the package. Then create a .pypirc file in your home directory.
Set a unique version number in setup.py by using the release candidate label, as in:
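Concretely, that means giving the version argument of the setup() call a release candidate suffix. A sketch of the relevant fragment, with the number taken from the install example below (the exact number will differ for your branch):

```python
# setup.py fragment: the rc suffix keeps test-server uploads
# distinct from any production release
setup(
    name='cngi-prototype',
    version='0.0.48rc1',
)
```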
Build the source distribution by executing the following commands in the PyCharm Terminal (button at the bottom left):
$ rm -fr dist
$ python setup.py sdist
Call twine to upload the sdist package to pypi-test:
$ python -m twine upload dist/* -r pypitest
Enjoy your pip package as you would a real production one by pointing to the test server:
$ pip install -i https://test.pypi.org/simple/ cngi-prototype==0.0.48rc1
Update the Documentation
The bulk of the documentation is in the docs folder and in the .ipynb format. These files are visible through PyCharm, but should be edited and saved in Google Colab. The easiest way to do this is to navigate to the Github docs folder and click on the .ipynb file you want to edit. There is usually an “open in colab” button at the top.
Alternatively, notebooks can be accessed in Colab by combining a link prefix with the name of the .ipynb file in the repository docs folder. For example, this page you are reading now can be edited by combining the colab prefix:
with the filename of this notebook:
In Colab, make the desired changes and then select
File -> Save a copy in Github
enter your Github credentials if not already stored with Google, and then select the CNGI repository and the appropriate path/filename, i.e.
Readthedocs will detect changes to the Github master and automatically rebuild the documentation hosted on their server (this page you are reading now, for example). This can take ~15 minutes
In the docs folder, some of the root index files are stored as .md or .rst format and may be edited by double clicking and modifying in the PyCharm editor. They can then be pushed to the master trunk in the same manner as source code.
After modifying an .md or .rst file, double check that it renders correctly by executing the following commands in the PyCharm Terminal
$ cd docs/
$ rm -fr _api/api
$ rm -fr build
$ sphinx-build -b html . ./build
Then open up a web browser and navigate to the index.html file in the newly created build folder.
Do NOT add the api or build folders to Git; they are intermediate build artifacts. Note that _api is the location of the actual documentation files that automatically parse the docstrings in the source code, so it should be in Git.
Documentation is generated using Sphinx, with the autodoc and napoleon extensions enabled. Function docstrings should be written in NumPy style. For compatibility with Sphinx, import statements should generally be placed underneath function definitions, not at the top of the file.
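In practice that means importing inside the function body. A minimal sketch (function and parameter names are illustrative):

```python
def summarize(values):
    """
    Return the arithmetic mean of a sequence of numbers.

    Parameters
    ----------
    values : list of float
        input values

    Returns
    -------
    float
        arithmetic mean of the input values
    """
    # import inside the function body rather than at the top of the file,
    # per the CNGI convention for Sphinx compatibility
    import statistics
    return statistics.mean(values)
```

With the import deferred like this, Sphinx can load the file to harvest the docstring without pulling in the function's runtime dependencies.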
A complete set of formal, enforced coding standards has not yet been adopted. Some alternatives under consideration are:
Google’s style guide
Python Software Foundation’s style guide