HPC:NGSUtils
NGSUtils
NGSUtils is a suite of software tools for working with next-generation sequencing datasets. NGSUtils v0.5.7 is installed across all HPC nodes.
Usage
The entire NGSUtils suite, which includes the ngsutils, bamutils, bedutils, fastqutils and gtfutils commands, can be loaded with the ngsutils-0.5.7 module:
[asrini@node062 ~]$ module show ngsutils-0.5.7
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/ngsutils-0.5.7:
module-whatis    NGSUtils is a suite of software tools for working with next-generation sequencing datasets
prepend-path     PATH /opt/software/ngsutils/0.5.7/bin
-------------------------------------------------------------------
[asrini@node062 ~]$ module load ngsutils-0.5.7
[asrini@node062 ~]$ which ngsutils
/opt/software/ngsutils/0.5.7/bin/ngsutils
[asrini@node062 ~]$ ngsutils
Usage: ngsutils COMMAND
Commands
    update - Updates NGSUtils from git repository (http://github.com/ngsutils/ngsutils)
    repeat2fasta - Extract repeatmasker flagged regions to a FASTA file
    strip_fasta - Remove sequences from a FASTA file based on name
    tag_fasta - Tag FASTA sequence names with a prefix or suffix
    tabixindex - Index a tab-delimited file using Tabix and bgzip
Run 'ngsutils help CMD' for more information about a specific command
ngsutils 0.5.7-efb237d
[asrini@node062 ~]$ bamutils
Usage: bamutils COMMAND
Commands
  DNA-seq
    basecall - Base/variant caller
  RNA-seq
    count - Calculates counts/FPKM for genes/BED regions/repeats (also CNV)
  General
    best - Filter out multiple mappings for a read, selecting only the best
    convertregion - Converts region mapping to genomic mapping
    export - Export reads, mapped positions, and other tags
    expressed - Finds regions expressed in a BAM file
    extract - Extracts reads based on regions in a BED file
    filter - Removes reads from a BAM file based on criteria
    innerdist - Calculate the inner mate-pair distance from two BAM files
    junctioncount - Counts the number of reads spanning individual junctions.
    keepbest - Parses BAM file and keeps the best mapping for reads that have multiple mappings
    merge - Combine multiple BAM files together (taking best-matches)
    pair - Given two separately mapped paired files, re-pair the files
    peakheight - Find the size (max height, width) of given peaks (BED) in a BAM file
    renamepair - Postprocesses a BAM file to rename pairs that have an extra /N value
    split - Splits a BAM file into smaller pieces
    stats - Calculates simple stats for a BAM file
    tag - Update read names with a suffix (for merging)
  Conversion
    tobed - Convert BAM reads to BED regions
    tobedgraph - Convert BAM coverage to bedGraph (for visualization)
    tofasta - Convert BAM reads to FASTA sequences
    tofastq - Convert BAM reads back to FASTQ sequences
  Misc
    check - Checks a BAM file for corruption
    cleancigar - Fixes BAM files where the CIGAR alignment has a zero length element
Run 'bamutils help CMD' for more information about a specific command
[asrini@node062 ~]$ bedutils
Usage: bedutils COMMAND
Commands
  General
    clean - Cleans a BED file (score should be integers)
    extend - Extends BED regions (3')
    overlap - Find overlapping BED regions from a query and target file
    reduce - Merges overlapping BED regions
    refcount - Given a number of BED files, calculate the number of samples that overlap regions in a reference BED file
    sizes - Extract the sizes of BED regions
    sort - Sorts a BED file (in place)
    stats - Calculates simple stats for a BED file
    subtract - Subtracts one set of BED regions from another
  Conversion
    annotate - Annotate BED files by adding / altering columns
    frombasecall - Converts a file in basecall format to BED3 format
    fromprimers - Converts a list of PCR primer pairs to BED regions
    fromvcf - Converts a file in VCF format to BED6
    tobed3 - Removes extra columns from a BED (or BED compatible) file
    tobed6 - Removes extra columns from a BED (or BED compatible) file
    tobedgraph - BED to BedGraph
    tofasta - Extract BED regions from a reference FASTA file
  Misc
    cleanbg - Cleans up a bedgraph file
Run 'bedutils help CMD' for more information about a specific command
[asrini@node062 ~]$ fastqutils
Usage: fastqutils COMMAND
Commands
  General
    barcode_split - Splits a FASTQ/FASTA file based on sequence barcodes
    filter - Filter out reads using a number of metrics
    merge - Merges paired FASTQ files into one file
    names - Write out the read names
    properpairs - Find properly paired reads (when fragments are filtered separately)
    revcomp - Reverse compliment a FASTQ file
    sort - Sorts a FASTQ file by name or sequence
    split - Splits a FASTQ file into N chunks
    stats - Calculate summary statistics for a FASTQ file
    tag - Adds a prefix or suffix to the read names in a FASTQ file
    tile - Splits long FASTQ reads into smaller (tiled) chunks
    trim - Remove 5' and 3' linker sequences (slow, S/W aligned)
    truncate - Truncates reads to a maximum length
    unmerge - Unmerged paired FASTQ files into two (or more) files
  Conversion
    convertqual - Converts qual values from Illumina to Sanger scale
    csencode - Converts color-space FASTQ file to encoded FASTQ
    fromfasta - Converts (cs)FASTA/qual files to FASTQ format
    fromqseq - Converts Illumina qseq (export/sorted) files to FASTQ
    tobam - Converts to BAM format (unmapped)
    tofasta - Converts to FASTA format (seq or qual)
Run 'fastqutils help CMD' for more information about a specific command
[asrini@node062 ~]$ gtfutils
Usage: gtfutils COMMAND
Commands
  General
    add_isoform - Appends isoform annotation from UCSC isoforms file
    add_reflink - Appends isoform/name annotation from RefSeq/refLink
    add_xref - Appends name annotation from UCSC Xref file
    annotate - Annotates genomic positions based on a GTF model
    filter - Filter annotations from a GTF file
    genesize - Extract genomic/transcript sizes for genes
    junctions - Build a junction library from FASTA and GTF model
    query - Query a GTF file by coordinates
  Conversion
    tobed - Convert a GFF/GTF file to BED format
Run 'gtfutils help CMD' for more information about a specific command
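The interactive session above can also be wrapped in a script, for example at the top of a job script. The following is a minimal sketch, not an official site recipe: the function name missing_ngsutils_cmds is a hypothetical helper introduced here for illustration, and the final call simply follows the 'bamutils help CMD' pattern printed by the tool. It loads the module when the module command is available, then checks that all five entry points are on PATH before doing any work.

```shell
#!/bin/bash
# Hypothetical helper (not part of NGSUtils): print the names of any
# NGSUtils entry points that cannot be found on the current PATH.
missing_ngsutils_cmds() {
    local cmd missing=""
    for cmd in ngsutils bamutils bedutils fastqutils gtfutils; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Strip the single leading space left by the concatenation above.
    echo "${missing# }"
}

# Load the module only where the `module` command exists (e.g. on an HPC node).
if command -v module >/dev/null 2>&1; then
    module load ngsutils-0.5.7
fi

missing="$(missing_ngsutils_cmds)"
if [ -z "$missing" ]; then
    echo "NGSUtils ready: $(command -v ngsutils)"
    # Show the built-in help for one subcommand, as suggested by the tool itself.
    bamutils help stats
else
    echo "NGSUtils commands not found on PATH: $missing"
fi
```

Checking all five commands up front lets a job stop early with a clear message if the module was not loaded, rather than part-way through an analysis.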