Difference between revisions of "HPC:CellRanger"
Latest revision as of 19:52, 26 January 2021
Cell Ranger
Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis.
Note: At present, we are not providing References for any species. HPC users will have to download and build these as needed.
Usage
Cell Ranger provides different pipelines for different experiments. HPC users should read the Cell Ranger documentation (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) for details on the available pipelines and the various command options before reading this section of our wiki.
The sections below describe how to run Cell Ranger on our HPC system.
Cell Ranger module
The Cell Ranger module must be loaded before running Cell Ranger jobs. Basic module usage is shown below:
<pre>
[asrini@node156 ~]$ module show cellranger/5.0.1
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/cellranger/5.0.1:

module-whatis    Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis.
prepend-path     PATH /opt/software/cellranger/5.0.1/bin
-------------------------------------------------------------------
</pre>

<pre>
[asrini@node156 ~]$ module load cellranger/5.0.1
</pre>

<pre>
[asrini@node156 ~]$ cellranger
cellranger cellranger-5.0.1
Process 10x Genomics Gene Expression, Feature Barcode, and Immune Profiling data

USAGE:
    cellranger <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

SUBCOMMANDS:
    count               Count gene expression (targeted or whole-transcriptome) and/or feature barcode reads from a single sample and GEM well
    multi               Analyze gene expression (targeted or whole-transcriptome) and/or feature barcode and/or immune profiling data from a single sample and GEM well
    vdj                 Assembles single-cell VDJ receptor sequences from 10x Immune Profiling libraries
    aggr                Aggregate data from multiple Cell Ranger runs
    reanalyze           Re-run secondary analysis (dimensionality reduction, clustering, etc)
    targeted-compare    Analyze targeted enrichment performance by comparing a targeted sample to its cognate parent WTA sample (used as input for targeted gene expression)
    targeted-depth      Estimate targeted read depth values (mean reads per cell) for a specified input parent WTA sample and a target panel CSV file
    mkvdjref            Prepare a reference for use with CellRanger VDJ
    mkfastq             Run Illumina demultiplexer on sample sheets that contain 10x-specific sample index sets
    testrun             Execute the 'count' pipeline on a small test dataset
    mat2csv             Convert a gene count matrix to CSV format
    mkref               Prepare a reference for use with 10x analysis software. Requires a GTF and FASTA
    mkgtf               Filter a GTF file by attribute prior to creating a 10x reference
    upload              Upload analysis logs to 10x Genomics support
    sitecheck           Collect linux system configuration information
    help                Prints this message or the help of the given subcommand(s)
</pre>

<pre>
[asrini@node156 ~]$ cellranger -V
cellranger cellranger-5.0.1
</pre>
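A job script can check that the module was actually loaded before it starts any work. The following is a minimal sketch; `require_cmd` is a hypothetical helper name, not part of Cell Ranger or the module system, and it relies only on the shell builtin `command -v`.

```shell
#!/bin/bash
# require_cmd is a hypothetical helper for illustration.
# It returns non-zero and prints a message if the named executable is not on $PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "ERROR: '$1' not found in PATH; load the matching module first" >&2
        return 1
    fi
}

# Example use near the top of a job script, after the "module load" line:
#   require_cmd cellranger || exit 1
```

Failing early this way avoids holding reserved cores for a job that cannot find the `cellranger` executable.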
Submitting Cell Ranger Jobs
The Cell Ranger suite includes LSF integration, so jobs submitted using any of the Cell Ranger pipelines can easily be dispatched to our HPC system. However, care must be taken to ensure that the LSF resource requests match the options passed to the Cell Ranger command (called "cellranger").
Per the documentation, by default, Cell Ranger will use 90% of available memory and all available cores. Unless the bsub command options explicitly request 90% of available memory/cores on each of our compute nodes, the --localmem and --localcores flags must be used to restrict resource usage to match the resource request.
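To illustrate how the two sets of values relate, here is a small sketch using the 32-core, 128 GB request from the sample job on this page. It assumes that the LSF memory values are expressed in MB, as the sample job implies, and that --localmem is expressed in GB; the variable names are illustrative only.

```shell
#!/bin/bash
# Values taken from the sample job on this page.
CORES=32          # matches "#BSUB -n 32"
MEM_MB=131072     # matches "#BSUB -M 131072" (assumed to be MB)

# --localmem is given in GB, so convert MB to GB with integer division.
LOCALMEM_GB=$(( MEM_MB / 1024 ))

# Print the flags that would keep cellranger within the LSF reservation.
echo "--localcores=${CORES} --localmem=${LOCALMEM_GB}"
```

Deriving the cellranger flags from the same variables as the LSF request makes it harder for the two to drift apart when one of them is edited.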
Sample job
Below is a sample Cell Ranger job that will reserve 32 cores and 128GB RAM for the job.
<pre>
#!/bin/bash
#BSUB -J cellranger
#BSUB -o cellranger.%J.out
#BSUB -e cellranger.%J.error
#BSUB -n 32
#BSUB -M 131072
#BSUB -R "span[hosts=1] rusage[mem=131072]"

## USAGE information
## Run this job on the HPC as follows from the head node (assuming the script is saved as "cellranger_bsub.sh"):
## bsub < cellranger_bsub.sh

## NOTE1: make sure you use the "<" symbol above; it is not a typo and is needed

## NOTE2: the command below assumes the "cellranger" executable is in your $PATH. You can either use the available module, which will set the $PATH correctly, or set $PATH to point to your own installation.

# Make the "module" command available inside the batch job
if [ -f /etc/profile.d/modules.sh ]; then
    source /etc/profile.d/modules.sh
fi

module load cellranger/5.0.1

# Replace each <...> placeholder with a real value before submitting
cellranger count \
    --transcriptome=<path-to-transcriptome-dir> \
    --fastqs=<path-to-fastqs> \
    --id=<id> \
    --nopreflight \
    --jobmode=local \
    --localcores=32 \
    --localmem=128 \
    --nosecondary \
    --chemistry=<option>

## NOTE3: Make sure the --localcores and --localmem values in the above command match the #BSUB -n, #BSUB -M and #BSUB -R options at the top of the script
</pre>
To submit the job from the head node, run it as follows (assuming the script is saved as "cellranger_bsub.sh"):
<pre>
bsub < cellranger_bsub.sh
</pre>
NOTE: the "<" is required.
Optimizations
CPU/RAM
Since all our HPC compute nodes have at least 88 cores and between 256 and 512 GB of RAM, it is possible to set --localmem and --localcores to higher values. However, per the Cell Ranger documentation, returns diminish notably beyond 128GB RAM (--localmem) and 32 threads (--localcores).
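One way to apply this guidance is to cap the requested values at the point of diminishing returns. The sketch below assumes the smallest node described above (88 cores, 256 GB); `min_int` is a hypothetical helper written for this example.

```shell
#!/bin/bash
# min_int prints the smaller of two integers (hypothetical helper).
min_int() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

NODE_CORES=88     # minimum core count per node, per the text above
NODE_MEM_GB=256   # smallest node memory, per the text above

# Cap at 32 threads and 128 GB, beyond which returns diminish.
LOCALCORES=$(min_int "$NODE_CORES" 32)
LOCALMEM=$(min_int "$NODE_MEM_GB" 128)

echo "--localcores=${LOCALCORES} --localmem=${LOCALMEM}"
```

The matching #BSUB -n, -M and -R values would still need to be set to the same amounts.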
Cluster Mode (Optional)
Cluster mode is one of three primary ways of running Cell Ranger. The example code provided above uses "Job Submission Mode", in which the node to which the job is dispatched is treated as if it were a "local server".
The "Cluster Mode" functionality can be used by setting "--jobmode=lsf" for the cellranger command AND setting the various other MRO flags as described in the cluster mode documentation (https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/advanced/cluster-mode). Specifically, the following options must be set: __MRO_THREADS__ and __MRO_MEM_MB__
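To make the role of these two placeholders concrete, the sketch below previews how they might be filled in for a single pipeline stage. It is illustrative only: in cluster mode the substitution is performed by Cell Ranger's job manager, not by the user, and the template text and the thread and memory values here are hypothetical stand-ins rather than the contents of the shipped template.

```shell
#!/bin/bash
# Hypothetical two-line template using only the placeholders named above.
# This is NOT the shipped lsf.template.example.
template='#BSUB -n __MRO_THREADS__
#BSUB -R "span[hosts=1] rusage[mem=__MRO_MEM_MB__]"'

threads=4       # hypothetical per-stage thread request
mem_mb=16384    # hypothetical per-stage memory request in MB

# Substitute the placeholders to preview the resulting LSF directives.
preview=$(printf '%s\n' "$template" \
    | sed -e "s/__MRO_THREADS__/${threads}/" -e "s/__MRO_MEM_MB__/${mem_mb}/")

printf '%s\n' "$preview"
```

The point of the preview is that each stage gets its own LSF request, sized from the stage's thread and memory needs rather than from a single whole-job reservation.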
When the Cell Ranger module on the HPC is loaded, two additional variables are set: $MARTIAN and $JOBMGRS.
The $MARTIAN variable points to the martian bin directory, while the $JOBMGRS variable points to the directory containing the template files for "Cluster Mode" jobs:
<pre>
[asrini@node157 ~]$ module load cellranger/5.0.1

[asrini@node157 ~]$ $MARTIAN/mro version
v4.0.2
</pre>

<pre>
[asrini@node157 ~]$ ls $JOBMGRS/
config.json  lsf.template.example  pbspro.template.example  retry.json  sge_queue.py  sge.template.example  slurm.template.example  torque.template.example
</pre>
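Because the module directory is typically not writable by users, one possible workflow is to copy the example LSF template to a personal location and edit the copy. The sketch below assumes $JOBMGRS has been set by loading the module as shown above; `stage_lsf_template` and the destination directory are hypothetical names chosen for this example.

```shell
#!/bin/bash
# stage_lsf_template copies the example LSF template into a writable directory
# so it can be edited. Hypothetical helper; assumes $JOBMGRS is set by
# "module load cellranger/5.0.1".
stage_lsf_template() {
    local dest_dir="$1"
    local src="${JOBMGRS:-}/lsf.template.example"
    if [ ! -f "$src" ]; then
        echo "lsf.template.example not found; is the cellranger module loaded?" >&2
        return 1
    fi
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/lsf.template"
}

# Example use (the destination is a hypothetical path; any writable directory works):
#   stage_lsf_template "$HOME/cellranger_templates"
```

The cellranger help text lists a path to a .template file as an accepted --jobmode value, so the edited copy could in principle be passed that way; confirm this with `cellranger count --help` for the installed version before relying on it.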