COVID-19 Open-Source Helpdesk

Helping Nextstrain

Hi all,

I’m here to recruit some help from the open-source, scientific Python community!

Nextstrain (from Trevor Bedford’s lab) is an open-source project that does real-time tracking of pathogen evolution around the world. You may have seen this project on Twitter; they have quickly become one of the world’s best resources for tracking the evolution of SARS-CoV-2 (the virus that causes COVID-19).

Here is their web app showing the virus’ transmission, genetic diversity, and phylogeny.

They need help! I spoke with Colin Megill, a research scientist from the Bedford lab, and he said an immediate need they have is optimizing Nextstrain’s backend library, augur. It’s in pure Python and includes the phylogenetic models and bioinformatics calculations that power Nextstrain.

Augur could use a significant speed-up. Colin said the expertise of the OSS/SciPy ecosystem in optimizing their calculations would be huge. If you scan the codebase and have ideas for improving Augur’s efficiency, open an issue there and start a discussion that might lead to a PR. Further, they would love some contributors from this group to help them address issues open on the repo. PRs, of course, are welcome. This is a really tangible way in which we, OSS developers, can get involved in tools that directly translate to impact on the current COVID-19 crisis.

You shouldn’t need prior knowledge about bioinformatics or phylogenetics to help—just some Python chops. They have some tests and documentation that can help us get started.

To summarize their two immediate needs:

  1. Help address any open issues in Augur
  2. Scan Augur’s codebase and see if we can improve its speed.

I know this isn’t a specific question for the help desk, but I think directing some OSS contributors their way would greatly impact the world.

Reach out if you have any questions. Thanks, all!

Zach

Project Jupyter | Cal Poly

4 Likes

@jerowe had almost exactly the same idea today. At Quansight we just put a plan together for Nextstrain Dask-on-HPC, and we are collecting everyone with spare cycles who can help. I’ll let @jerowe talk about the plan (she actually understands bioinformatics, Dask, and HPC); I’ll be happy to help coordinate.

Cheers,
Ralf

4 Likes

Let’s make some phylogenetic trees SUPER FAST!

2 Likes

There are indeed quite a few actionable issues on https://github.com/nextstrain/augur/issues. Those can be self-contained, so anyone can jump in and fix one.

To not overload the Nextstrain team (I assume they’re super busy) we plan to fork some of the relevant repos, collaborate in public but without syncing back each PR, and once all the pieces of the puzzle fit (including smooth deployment and docs), then propose merging it back or maintaining it side-by-side for a while. If there are people interested in this, in particular with Dask, AWS or HPC experience, please comment in this thread.

A tentative plan is here; we’ll have things set up on GitHub tomorrow.

6 Likes

@Zsailer It’s great to hear so many people are on the same page. :wink:

Let’s all get organized and get going with a plan. I can see a few angles of attack. @rgommers is talking to folks at Quansight and I’m building out compute clusters on AWS. Of course anyone who would like to contribute and needs resources is welcome onto the cluster.

2 Likes

Loop in Chris Wilcox from the Python community who has already been working on the project.

2 Likes

Sounds good. I’ve talked with Brian Granger at Jupyter/AWS, and he’s likely able to get us free AWS credits as well—if that’s something we need.

2 Likes

I wouldn’t say no to AWS credits!

@jerowe This is awesome. I did some phylogenetics in my grad school days (hence, PhyloPandas). Thus, I’m eager to help these groups any way I can.

I’ll talk with Brian in the coming days and see what resources we might be able to get from AWS.

Thanks, @rgommers, for connecting everyone!

1 Like

I really love profiling and think it’s an undertaught and underutilized skill. It might be nice to put together some resources on how one goes about profiling Python code to get people involved.
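As a starting point for such a resource, here is a minimal sketch of profiling a function with the standard library's cProfile and pstats modules. The workload function `pairwise_differences` and the wrapper `profile_call` are hypothetical names made up for illustration; they are not taken from Augur. The idea is simply to run the code under the profiler and print the calls with the largest cumulative time.

```python
import cProfile
import io
import pstats


def pairwise_differences(sequences):
    """Hypothetical stand-in workload (not from Augur): count mismatched
    positions between every pair of sequences."""
    total = 0
    for i, a in enumerate(sequences):
        for b in sequences[i + 1:]:
            total += sum(x != y for x, y in zip(a, b))
    return total


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Small synthetic input; real alignments would be far larger.
    seqs = ["ACGT" * 50, "ACGA" * 50, "TCGA" * 50]
    result, report = profile_call(pairwise_differences, seqs)
    print(result)
    print(report)
```

For a whole script, `python -m cProfile -s cumulative your_script.py` gives a similar report without changing any code. Either way, the functions at the top of the cumulative column are usually the first place to look.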

3 Likes

I agree on the use of Dask. It is well maintained and hits the key requirements. There’s also knowledge within the JupyterHub/Binder projects on Dask, SLURM, etc.

Another major need (see this comment) in Augur is test coverage. We can certainly help there! :rocket:
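For anyone who has not written tests before, a small unit test can look roughly like the sketch below. The function `hamming_distance` is a hypothetical helper invented for this example and does not come from Augur, and the project may prefer a different test framework than the standard library's unittest, so treat this only as an illustration of the shape of a test.

```python
import unittest


def hamming_distance(seq_a, seq_b):
    """Hypothetical helper (not from Augur): count mismatched positions
    between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


class HammingDistanceTests(unittest.TestCase):
    def test_identical_sequences(self):
        self.assertEqual(hamming_distance("ACGT", "ACGT"), 0)

    def test_counts_each_mismatch(self):
        self.assertEqual(hamming_distance("ACGT", "ACGA"), 1)

    def test_rejects_unequal_lengths(self):
        with self.assertRaises(ValueError):
            hamming_distance("ACG", "ACGT")


def run_tests():
    """Load and run the test case, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(HammingDistanceTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Each test covers one behavior (the normal case, an edge case, and an error case), which keeps failures easy to read when something breaks.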

Looping in @crwilcox per @willingc’s suggestion! :slight_smile:

Chris, I know you’ve already been involved with Nextstrain quite a bit. If you know of other areas we can help, let us know.

1 Like

Hi @Zsailer. Nextstrain could very much use the help of some experts in certain areas.

2 Likes

I just synced with Colin from Nextstrain, and it seems Augur plus the skill set of this group is a great match.

I have mostly been working on auspice as of now.

2 Likes

Starting to look at this sent me down a dependency rabbit hole (try to install augur -> installs snakemake -> fails to install datrie). A related bit of work is to get datrie updated for py38 (the Cython output is included in the PyPI tarball) and to automate building wheels for it. I have reached out by email to the author whose email address I could find to see if they would like help.

2 Likes

Hi all, here are (timestamped) onboarding videos and descriptions of the components of Nextstrain’s interface.

3 Likes

After cloning the repo and cd’ing into it, I ran:

conda env create -f environment.yml
pip install augur

And everything worked. What was your installation process?

I do agree that we should upgrade the Python version to get some performance improvements for free. Currently, the conda environment is using Python 3.6.

1 Like

@arakhmat I was doing everything with pip in a py38 venv. Many of the core projects have adopted NEP 29, which says we drop support for py36 in new versions starting in June of this year, so getting this stack moved to a newer Python seems important to me.

As of last night there are py38 conda packages for datrie on conda-forge [1]. I suspect my work for tonight will be getting that patch merged upstream and released, and getting snakemake installing via conda (with the bioconda + conda-forge channels) on py38.

[1] https://github.com/conda-forge/datrie-feedstock/pull/5

2 Likes

Well, snakemake on py38 via conda already works!

jupiter@08:45  ➤  conda create -n py38_clean
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/tcaswell/.virtualenvs/py38_clean



Proceed ([y]/n)? y

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py38_clean
#
# To deactivate an active environment, use
#
#     $ conda deactivate

✔  ~
jupiter@08:45  ➤   conda activate py38_clean
(py38_clean) ✔  ~
jupiter@08:45  ➤  
(py38_clean) ✔  ~
jupiter@08:45  ➤  conda install python=3.8 snakemake -c bioconda -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/tcaswell/.virtualenvs/py38_clean

  added / updated specs:
    - python=3.8
    - snakemake


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _openmp_mutex-4.5          |           1_llvm           5 KB  conda-forge
    libblas-3.8.0              |      16_openblas          10 KB  conda-forge
    libcblas-3.8.0             |      16_openblas          10 KB  conda-forge
    liblapack-3.8.0            |      16_openblas          10 KB  conda-forge
    libopenblas-0.3.9          |       h5ec1e0e_0         7.8 MB  conda-forge
    llvm-openmp-9.0.1          |       hc9558a2_2         782 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         8.6 MB

The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-1_llvm
  aioeasywebdav      conda-forge/linux-64::aioeasywebdav-2.4.0-py38_1000
  aiohttp            conda-forge/linux-64::aiohttp-3.6.2-py38h516909a_0
  appdirs            conda-forge/noarch::appdirs-1.4.3-py_1
  async-timeout      conda-forge/noarch::async-timeout-3.0.1-py_1000
  attrs              conda-forge/noarch::attrs-19.3.0-py_0
  bcrypt             conda-forge/linux-64::bcrypt-3.1.7-py38h516909a_0
  boto3              conda-forge/noarch::boto3-1.12.28-pyh9f0ad1d_0
  botocore           conda-forge/noarch::botocore-1.15.28-pyh9f0ad1d_0
  bzip2              conda-forge/linux-64::bzip2-1.0.8-h516909a_2
  ca-certificates    conda-forge/linux-64::ca-certificates-2019.11.28-hecc5488_0
  cachetools         conda-forge/noarch::cachetools-3.1.1-py_0
  cairo              conda-forge/linux-64::cairo-1.16.0-hcf35c78_1003
  certifi            conda-forge/linux-64::certifi-2019.11.28-py38h32f6830_1
  cffi               conda-forge/linux-64::cffi-1.14.0-py38hd463f26_0
  chardet            conda-forge/linux-64::chardet-3.0.4-py38h32f6830_1006
  configargparse     conda-forge/noarch::configargparse-1.1-pyh8c360ce_0
  crc32c             conda-forge/linux-64::crc32c-2.0-py38h516909a_0
  cryptography       conda-forge/linux-64::cryptography-2.8-py38h766eaa4_2
  datrie             conda-forge/linux-64::datrie-0.8-py38h1e0a361_0
  decorator          conda-forge/noarch::decorator-4.4.2-py_0
  docutils           conda-forge/linux-64::docutils-0.15.2-py38_0
  dropbox            conda-forge/noarch::dropbox-9.4.0-py_0
  expat              conda-forge/linux-64::expat-2.2.9-he1b5a44_2
  fftw               conda-forge/linux-64::fftw-3.3.8-nompi_h7f3a6c3_1110
  filechunkio        conda-forge/noarch::filechunkio-1.8-py_2
  fontconfig         conda-forge/linux-64::fontconfig-2.13.1-h86ecdb6_1001
  freetype           conda-forge/linux-64::freetype-2.10.1-he06d7ca_0
  ftputil            conda-forge/noarch::ftputil-3.4-py_0
  gettext            conda-forge/linux-64::gettext-0.19.8.1-hc5be6a0_1002
  ghostscript        bioconda/linux-64::ghostscript-9.18-1
  giflib             conda-forge/linux-64::giflib-5.1.9-h516909a_0
  gitdb              conda-forge/noarch::gitdb-4.0.2-py_0
  gitpython          conda-forge/noarch::gitpython-3.1.0-py_0
  glib               conda-forge/linux-64::glib-2.58.3-py38h73cb85d_1003
  google-api-core    conda-forge/linux-64::google-api-core-1.16.0-py38_1
  google-auth        conda-forge/noarch::google-auth-1.11.2-py_0
  google-cloud-core  conda-forge/noarch::google-cloud-core-1.3.0-py_0
  google-cloud-stor~ conda-forge/noarch::google-cloud-storage-1.26.0-py_0
  google-resumable-~ conda-forge/noarch::google-resumable-media-0.5.0-py_1
  googleapis-common~ conda-forge/linux-64::googleapis-common-protos-1.51.0-py38_1
  graphite2          conda-forge/linux-64::graphite2-1.3.13-he1b5a44_1001
  graphviz           conda-forge/linux-64::graphviz-2.38.0-hf68f40c_1011
  harfbuzz           conda-forge/linux-64::harfbuzz-2.4.0-h9f30f68_3
  icu                conda-forge/linux-64::icu-64.2-he1b5a44_1
  idna               conda-forge/noarch::idna-2.9-py_1
  imagemagick        conda-forge/linux-64::imagemagick-7.0.8_11-pl526hc610aec_0
  importlib-metadata conda-forge/linux-64::importlib-metadata-1.5.0-py38h32f6830_1
  importlib_metadata conda-forge/noarch::importlib_metadata-1.5.0-1
  ipython_genutils   conda-forge/noarch::ipython_genutils-0.2.0-py_1
  jbig               conda-forge/linux-64::jbig-2.1-h14c3975_2001
  jinja2             conda-forge/noarch::jinja2-2.11.1-py_0
  jmespath           conda-forge/noarch::jmespath-0.9.5-py_0
  jpeg               conda-forge/linux-64::jpeg-9c-h14c3975_1001
  jsonschema         conda-forge/linux-64::jsonschema-3.2.0-py38h32f6830_1
  jupyter_core       conda-forge/linux-64::jupyter_core-4.6.3-py38h32f6830_1
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.34-h53a641e_0
  libblas            conda-forge/linux-64::libblas-3.8.0-16_openblas
  libcblas           conda-forge/linux-64::libcblas-3.8.0-16_openblas
  libffi             conda-forge/linux-64::libffi-3.2.1-he1b5a44_1007
  libgcc             conda-forge/linux-64::libgcc-7.2.0-h69d50b8_2
  libgcc-ng          conda-forge/linux-64::libgcc-ng-9.2.0-h24d8f2e_2
  libgfortran-ng     conda-forge/linux-64::libgfortran-ng-7.3.0-hdf63c60_5
  libiconv           conda-forge/linux-64::libiconv-1.15-h516909a_1006
  liblapack          conda-forge/linux-64::liblapack-3.8.0-16_openblas
  libopenblas        conda-forge/linux-64::libopenblas-0.3.9-h5ec1e0e_0
  libpng             conda-forge/linux-64::libpng-1.6.37-hed695b0_1
  libprotobuf        conda-forge/linux-64::libprotobuf-3.11.4-h8b12597_0
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-9.2.0-hdf63c60_2
  libtiff            conda-forge/linux-64::libtiff-4.1.0-hc3755c2_3
  libtool            conda-forge/linux-64::libtool-2.4.6-h14c3975_1002
  libuuid            conda-forge/linux-64::libuuid-2.32.1-h14c3975_1000
  libwebp            conda-forge/linux-64::libwebp-0.5.2-7
  libxcb             conda-forge/linux-64::libxcb-1.13-h14c3975_1002
  libxml2            conda-forge/linux-64::libxml2-2.9.10-hee79883_0
  llvm-openmp        conda-forge/linux-64::llvm-openmp-9.0.1-hc9558a2_2
  lz4-c              conda-forge/linux-64::lz4-c-1.8.3-he1b5a44_1001
  markupsafe         conda-forge/linux-64::markupsafe-1.1.1-py38h1e0a361_1
  multidict          conda-forge/linux-64::multidict-4.7.5-py38h516909a_0
  nbformat           conda-forge/noarch::nbformat-5.0.4-py_0
  ncurses            conda-forge/linux-64::ncurses-6.1-hf484d3e_1002
  networkx           conda-forge/noarch::networkx-2.4-py_1
  numpy              conda-forge/linux-64::numpy-1.18.1-py38h8854b6b_1
  openjpeg           conda-forge/linux-64::openjpeg-2.3.1-h981e76c_3
  openssl            conda-forge/linux-64::openssl-1.1.1e-h516909a_0
  pandas             conda-forge/linux-64::pandas-1.0.3-py38hcb8c335_0
  pango              conda-forge/linux-64::pango-1.40.14-he7ab937_1005
  paramiko           conda-forge/linux-64::paramiko-2.7.1-py38_0
  pcre               conda-forge/linux-64::pcre-8.44-he1b5a44_0
  perl               conda-forge/linux-64::perl-5.26.2-h516909a_1006
  pip                conda-forge/noarch::pip-20.0.2-py_2
  pixman             conda-forge/linux-64::pixman-0.38.0-h516909a_1003
  pkg-config         conda-forge/linux-64::pkg-config-0.29.2-h516909a_1006
  prettytable        conda-forge/noarch::prettytable-0.7.2-py_3
  protobuf           conda-forge/linux-64::protobuf-3.11.4-py38he1b5a44_0
  psutil             conda-forge/linux-64::psutil-5.7.0-py38h1e0a361_1
  pthread-stubs      conda-forge/linux-64::pthread-stubs-0.4-h14c3975_1001
  pyasn1             conda-forge/noarch::pyasn1-0.4.8-py_0
  pyasn1-modules     conda-forge/noarch::pyasn1-modules-0.2.7-py_0
  pycparser          conda-forge/noarch::pycparser-2.20-py_0
  pygments           conda-forge/noarch::pygments-2.6.1-py_0
  pygraphviz         conda-forge/linux-64::pygraphviz-1.5-py38h1e0a361_1002
  pynacl             conda-forge/linux-64::pynacl-1.3.0-py38h516909a_1001
  pyopenssl          conda-forge/noarch::pyopenssl-19.1.0-py_1
  pyrsistent         conda-forge/linux-64::pyrsistent-0.16.0-py38h1e0a361_0
  pysftp             conda-forge/noarch::pysftp-0.2.9-py_1
  pysocks            conda-forge/linux-64::pysocks-1.7.1-py38h32f6830_1
  python             conda-forge/linux-64::python-3.8.2-h8356626_5_cpython
  python-dateutil    conda-forge/noarch::python-dateutil-2.8.1-py_0
  python-irodsclient conda-forge/noarch::python-irodsclient-0.8.2-py_0
  python_abi         conda-forge/linux-64::python_abi-3.8-1_cp38
  pytz               conda-forge/noarch::pytz-2019.3-py_0
  pyyaml             conda-forge/linux-64::pyyaml-5.3.1-py38h1e0a361_0
  ratelimiter        conda-forge/linux-64::ratelimiter-1.2.0-py38_1000
  readline           conda-forge/linux-64::readline-8.0-hf8c457e_0
  requests           conda-forge/noarch::requests-2.23.0-pyh8c360ce_2
  rsa                conda-forge/noarch::rsa-4.0-py_0
  s3transfer         conda-forge/linux-64::s3transfer-0.3.3-py38_0
  setuptools         conda-forge/linux-64::setuptools-46.1.1-py38h32f6830_0
  six                conda-forge/noarch::six-1.14.0-py_1
  smmap              conda-forge/noarch::smmap-3.0.1-py_0
  snakemake          bioconda/noarch::snakemake-5.12.3-0
  snakemake-minimal  bioconda/noarch::snakemake-minimal-5.12.3-py_0
  sqlite             conda-forge/linux-64::sqlite-3.30.1-hcee41ef_0
  tk                 conda-forge/linux-64::tk-8.6.10-hed695b0_0
  toposort           conda-forge/noarch::toposort-1.5-py_3
  traitlets          conda-forge/linux-64::traitlets-4.3.3-py38h32f6830_1
  urllib3            conda-forge/linux-64::urllib3-1.25.7-py38h32f6830_1
  wheel              conda-forge/noarch::wheel-0.34.2-py_1
  wrapt              conda-forge/linux-64::wrapt-1.12.1-py38h1e0a361_1
  xmlrunner          conda-forge/noarch::xmlrunner-1.7.7-py_0
  xorg-kbproto       conda-forge/linux-64::xorg-kbproto-1.0.7-h14c3975_1002
  xorg-libice        conda-forge/linux-64::xorg-libice-1.0.10-h516909a_0
  xorg-libsm         conda-forge/linux-64::xorg-libsm-1.2.3-h84519dc_1000
  xorg-libx11        conda-forge/linux-64::xorg-libx11-1.6.9-h516909a_0
  xorg-libxau        conda-forge/linux-64::xorg-libxau-1.0.9-h14c3975_0
  xorg-libxdmcp      conda-forge/linux-64::xorg-libxdmcp-1.1.3-h516909a_0
  xorg-libxext       conda-forge/linux-64::xorg-libxext-1.3.4-h516909a_0
  xorg-libxpm        conda-forge/linux-64::xorg-libxpm-3.5.13-h516909a_0
  xorg-libxrender    conda-forge/linux-64::xorg-libxrender-0.9.10-h516909a_1002
  xorg-libxt         conda-forge/linux-64::xorg-libxt-1.1.5-h516909a_1003
  xorg-renderproto   conda-forge/linux-64::xorg-renderproto-0.11.1-h14c3975_1002
  xorg-xextproto     conda-forge/linux-64::xorg-xextproto-7.3.0-h14c3975_1002
  xorg-xproto        conda-forge/linux-64::xorg-xproto-7.0.31-h14c3975_1007
  xz                 conda-forge/linux-64::xz-5.2.4-h516909a_1002
  yaml               conda-forge/linux-64::yaml-0.2.2-h516909a_1
  yarl               conda-forge/linux-64::yarl-1.3.0-py38h516909a_1000
  zipp               conda-forge/noarch::zipp-3.1.0-py_0
  zlib               conda-forge/linux-64::zlib-1.2.11-h516909a_1006
  zstd               conda-forge/linux-64::zstd-1.4.4-h3b9ef0a_2


Proceed ([y]/n)? y


Downloading and Extracting Packages
_openmp_mutex-4.5    | 5 KB      | ######################################################################################################### | 100% 
libcblas-3.8.0       | 10 KB     | ######################################################################################################### | 100% 
llvm-openmp-9.0.1    | 782 KB    | ######################################################################################################### | 100% 
liblapack-3.8.0      | 10 KB     | ######################################################################################################### | 100% 
libblas-3.8.0        | 10 KB     | ######################################################################################################### | 100% 
libopenblas-0.3.9    | 7.8 MB    | ######################################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(py38_clean) ✔  ~
jupiter@08:46  ➤  

2 Likes