Your home folder on Perlmutter has a size limit of 40GB, but the $SCRATCH space allows you up to 20TB. GenomeFace requires about 4GB for package caching and an additional 10GB for the environment, so we'll be using the $SCRATCH directory for our installations.
Mamba is a fast, robust package manager that is a near drop-in replacement for Conda. It is preferred over Conda due to its significantly faster performance, particularly when installing large packages and resolving dependencies. Its use is highly recommended.
If you don't have Mamba or Conda already installed, please execute the following commands to download and install Mamba:
# Download Miniforge installer (includes Mamba)
curl -L -o Mambaforge-Linux-x86_64.sh https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
# Install Miniforge to your $SCRATCH directory
bash Mambaforge-Linux-x86_64.sh -b -p $SCRATCH/mambaforge
# Add Mamba to your PATH for the current session
export PATH=$SCRATCH/mambaforge/bin:$PATH
# Activate the base environment
mamba init
# Restart your shell or run the following
exec $SHELL
To change the package cache location, you'll need to modify your .condarc file or create one if it doesn't exist. This step ensures that packages are stored in $SCRATCH/pkgcache:
# make pkgcache directory
mkdir -p $SCRATCH/pkgcache
# Create or edit .condarc to update the package cache directory
echo "pkgs_dirs:
- $SCRATCH/pkgcache" > ~/.condarc
Now, create a new environment specifically for GenomeFace with the prefix $SCRATCH/genomeface.
# Create the new environment
mamba create --prefix $SCRATCH/genomeface
# Activate the environment
mamba activate $SCRATCH/genomeface
With the environment activated, you can now install GenomeFace:
# Install GenomeFace
mamba install genomeface cuda-version=11.7 -c nvidia -c rapidsai -c conda-forge -c bioconda -c https://portal.nersc.gov/project/m1982/genomeface_repo
Or, for the experimental version w/ eukaroyte support:
mamba install genomeface cuda-version=11.7 -c nvidia -c rapidsai -c conda-forge -c bioconda -c https://portal.nersc.gov/project/m1982/genomeface_repo_euk
Remember to reactivate your conda environment with mamba activate $SCRATCH/genomeface anytime you want to use it.
If you need a quick reference or run into issues, you can always display the help message using the -h flag. This will also show the detected GPUs.
genomeface -h
Help Output:
usage: genomeface [-h] -i INPUT_FILE -a ABUNDANCE_FILE -g MARKERS_FILE -o OUTPUT_FOLDER
[-m MIN_CONTIG_LENGTH] [-p MIN_PURITY] [-s]
GenomeFace Prerelease
- A next-generation tool for metagenomic binning, using deep learning and multi-GPU accelerated
clustering. Ideal for large-scale, real-world data.
The Exabiome Project (Lawrence Berkeley National Laboratory)
- Contact rlettich@lbl.gov for issues or unexpected poor performance.
optional arguments:
-h, --help show this help message and exit
-i INPUT_FILE Input FASTA file containing metagenome assembly (optionally gzipped).
-a ABUNDANCE_FILE MetaBAT 2 style TSV file containing abundance data. Typically produced by
the `jgi_summarize_depths` or `coverm` programs.
-g MARKERS_FILE Input TSV file describing marker genes found on each contig. Can be
produced by the included `markersgf` program.
-o OUTPUT_FOLDER Output folder for writing bin FASTA files
-m MIN_CONTIG_LENGTH Minimum contig length to be considered for binning (default: 1500).
-p MIN_PURITY Minimum marker gene estimated % purity for selecting clusters for output.
Balances Precsion / Recall (default: 85)
-s Specifies that the input FASTA is multiple single sample assemblies,
concatenated.
CUDA Devices (Clustering Acceleration):
- Device 0: NVIDIA A100-PCIE-40GB
Tensorflow Enabled Devices (Neural Network Acceleration):
- /physical_device:GPU:0
Notes:
- By default, coassembly is assumed. For concatenated single-sample assemblies, use the [-s] flag.
Examples:
genomeface -i coassembly.fa.gz -a abundance.tsv -g markers.tsv -o ./output
genomeface -i concatenated_assemblies.fa.gz -a abundance.tsv -g markers.tsv -o ./output -s -m 1000
If all goes well, you should be able to run it, and see something like this: