HPC:CellRanger

Cell Ranger

Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis.

Note: At present, we are not providing References for any species. HPC users will have to download and build these as needed.

Usage

Cell Ranger provides different pipelines for different experiments. HPC users should read the Cell Ranger documentation for details on the available pipelines and the various command options before reading this section of our wiki.

The sections below describe how to run Cell Ranger on our HPC system.

Cell Ranger module

The Cell Ranger module needs to be enabled before running Cell Ranger jobs. Basic module usage is shown below:


[asrini@node156 ~]$ module show cellranger/5.0.1 
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/cellranger/5.0.1:

module-whatis	 Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis. 
prepend-path	 PATH /opt/software/cellranger/5.0.1/bin 
-------------------------------------------------------------------
[asrini@node156 ~]$ module load cellranger/5.0.1 

[asrini@node156 ~]$ cellranger 
cellranger cellranger-5.0.1
Process 10x Genomics Gene Expression, Feature Barcode, and Immune Profiling data

USAGE:
    cellranger <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    count               Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well
    multi               Analyze gene expression (targeted or whole-transcriptome) and/or feature barcode and/or immune profiling data from a single sample and GEM well
    vdj                 Assembles single-cell VDJ receptor sequences from 10x Immune Profiling libraries
    aggr                Aggregate data from multiple Cell Ranger runs
    reanalyze           Re-run secondary analysis (dimensionality reduction, clustering, etc)
    targeted-compare    Analyze targeted enrichment performance by comparing a targeted sample to its cognate parent WTA sample (used as input for targeted gene expression)
    targeted-depth      Estimate targeted read depth values (mean reads per cell) for a specified input parent WTA sample and a target panel CSV file
    mkvdjref            Prepare a reference for use with CellRanger VDJ
    mkfastq             Run Illumina demultiplexer on sample sheets that contain 10x-specific sample index sets
    testrun             Execute the 'count' pipeline on a small test dataset
    mat2csv             Convert a gene count matrix to CSV format
    mkref               Prepare a reference for use with 10x analysis software. Requires a GTF and FASTA
    mkgtf               Filter a GTF file by attribute prior to creating a 10x reference
    upload              Upload analysis logs to 10x Genomics support
    sitecheck           Collect linux system configuration information
    help                Prints this message or the help of the given subcommand(s)

[asrini@node156 ~]$ cellranger  -V
cellranger cellranger-5.0.1

Submitting Cell Ranger Jobs

The Cell Ranger suite includes LSF integration, so jobs submitted using any of the Cell Ranger pipelines can easily be dispatched to our HPC system. However, care must be taken to ensure that the resource requests match the options passed to the actual Cell Ranger command (called "cellranger").

Per the documentation, Cell Ranger by default uses 90% of available memory and all available cores. Unless the bsub command options explicitly request that share of a compute node's memory and all of its cores, the --localmem and --localcores flags must be used to restrict resource usage to match the resource request.

Sample job

Below is a sample Cell Ranger job that will reserve 32 cores and 128GB RAM for the job.


#!/bin/bash
#BSUB -J cellranger 
#BSUB -o cellranger.%J.out
#BSUB -e cellranger.%J.error
#BSUB -n 32 
#BSUB -M 131072
#BSUB -R "span[hosts=1] rusage[mem=131072]" 


## USAGE information
## Run this job on the HPC as follows from the head node (assuming the script is saved as "cellranger_bsub.sh"): 
## bsub < cellranger_bsub.sh

##NOTE1: make sure you use the "<" symbol above; it is not a typo and is needed


##NOTE2: the command below assumes the "cellranger" executable is in your $PATH. You can either use the available module, which will set the $PATH correctly, or set $PATH to point to your own installation.

if [ -f /etc/profile.d/modules.sh ]; then
   source /etc/profile.d/modules.sh
fi

module load cellranger/5.0.1

cellranger count --transcriptome=<path-to-transcriptome-dir> --fastqs=<path-to-fastqs> --id=<id> --nopreflight --jobmode=local --localcores=32 --localmem=128 --nosecondary --chemistry=<option>

##NOTE3: Make sure the --localcores and --localmem values in the above command match the #BSUB -n, #BSUB -M and #BSUB -R options at the top of the script

To submit it from the head node, you'll have to run it as follows (assuming the script is saved as "cellranger_bsub.sh"):

bsub < cellranger_bsub.sh

NOTE: the "<" is required.
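
The requirement in NOTE3 (keeping --localcores and --localmem in step with the #BSUB options) can be made less error-prone by deriving the Cell Ranger values from one pair of variables. The following is a minimal sketch, not part of the sample job: the variable names and the mb_to_gb helper are hypothetical, and it assumes the LSF memory values are expressed in MB, as in the sample job above, while --localmem takes GB.

```shell
#!/bin/bash
# Hypothetical helper: convert an LSF memory request in MB to whole GB,
# the unit assumed here for --localmem.
mb_to_gb() {
    echo $(( $1 / 1024 ))
}

# Keep these two values identical to the "#BSUB -n" and "#BSUB -M" lines.
REQ_CORES=32
REQ_MEM_MB=131072

LOCALCORES="${REQ_CORES}"
LOCALMEM="$(mb_to_gb "${REQ_MEM_MB}")"

# Print the flags that would be appended to the cellranger command.
echo "--localcores=${LOCALCORES} --localmem=${LOCALMEM}"
```

With the values from the sample job, this prints "--localcores=32 --localmem=128". The #BSUB lines themselves still have to be edited by hand, since LSF reads them before the script runs.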

Optimizations
CPU/RAM

Since all our HPC compute nodes have at least 88 cores and between 256GB and 512GB of RAM, it is possible to set --localmem and --localcores to higher values. However, per the Cell Ranger documentation, returns diminish notably beyond 128GB RAM (--localmem) and 32 threads (--localcores).
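
One way to apply these limits is to cap whatever resources are being considered at 32 cores and 128GB before building the cellranger command. The sketch below is illustrative only: the helper names (min_int, capped_flags) are hypothetical and the caps are simply the figures quoted above.

```shell
#!/bin/bash
# Caps taken from the paragraph above: 32 threads and 128 GB RAM.
MAX_CORES=32
MAX_MEM_GB=128

# Hypothetical helper: print the smaller of two integers.
min_int() {
    if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Hypothetical helper: print capped --localcores/--localmem flags
# for a given core count and memory size (in GB).
capped_flags() {
    local cores mem_gb
    cores="$(min_int "$1" "${MAX_CORES}")"
    mem_gb="$(min_int "$2" "${MAX_MEM_GB}")"
    echo "--localcores=${cores} --localmem=${mem_gb}"
}

capped_flags 88 256   # a whole 88-core, 256 GB node is capped to 32 cores / 128 GB
capped_flags 16 64    # smaller requests pass through unchanged
```

The bsub resource request should then be set to the capped values as well, so that the job does not reserve cores or memory that Cell Ranger will not use.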

Cluster Mode (Optional)

Cluster mode is one of three primary ways of running Cell Ranger. The example code provided above uses the "Job Submission Mode", wherein the node to which the job is dispatched is treated as if it were a "local server".

The "Cluster Mode" functionality can be used by setting "--jobmode=lsf" for the cellranger command AND setting the various other MRO flags as described in the documentation. Specifically, the following options must be set: __MRO_THREADS__ AND __MRO_MEM_MB__

When the Cell Ranger module on the HPC is enabled, two additional variables are set - $MARTIAN and $JOBMGRS.

The $MARTIAN variable points to the martian bin directory, while the $JOBMGRS variable points to the directory containing the template files for "Cluster Mode" use:

 
[asrini@node157 ~]$ module load cellranger/5.0.1 

[asrini@node157 ~]$ $MARTIAN/mro version
v4.0.2
 
[asrini@node157 ~]$ ls $JOBMGRS/
config.json  lsf.template.example  pbspro.template.example  retry.json  sge_queue.py  sge.template.example  slurm.template.example  torque.template.example
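
Because the module directory is typically not writable by users, one possible workflow is to copy lsf.template.example from $JOBMGRS into a directory you own and edit the copy. The sketch below shows only that copy step. The stage_lsf_template function and the destination directory are hypothetical. Passing the path of the edited template to --jobmode is an assumption that should be checked against "cellranger count --help" for the installed version.

```shell
#!/bin/bash
# Hypothetical helper: copy the example LSF template from a job-manager
# directory (e.g. $JOBMGRS) into a user-owned directory as "lsf.template".
# Prints the path of the copy on success; returns non-zero on failure.
stage_lsf_template() {
    local src_dir="$1" dest_dir="$2"
    local src="${src_dir}/lsf.template.example"
    local dest="${dest_dir}/lsf.template"

    if [ ! -f "${src}" ]; then
        echo "template not found: ${src}" >&2
        return 1
    fi
    mkdir -p "${dest_dir}" && cp "${src}" "${dest}" && echo "${dest}"
}

# Example usage (commented out; the destination directory is an assumption):
#   module load cellranger/5.0.1
#   template="$(stage_lsf_template "$JOBMGRS" "$HOME/cellranger_templates")"
#   # Edit "${template}" to suit the site, then (assumption, verify first):
#   # cellranger count ... --jobmode="${template}"
```

The copied file still has to be edited before use. The example template shipped with Cell Ranger is a starting point, not a site-ready configuration.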
