HPC:NGSUtils
NGSUtils
NGSUtils is a suite of software tools for working with next-generation sequencing datasets. NGSUtils v0.5.7 is installed across all HPC nodes.
Usage
The entire NGSUtils suite, which provides the ngsutils, bamutils, bedutils, fastqutils and gtfutils commands, can be loaded with a single module:
[asrini@node062 ~]$ module show ngsutils-0.5.7
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/ngsutils-0.5.7:
module-whatis NGSUtils is a suite of software tools for working with next-generation sequencing datasets
prepend-path PATH /opt/software/ngsutils/0.5.7/bin
-------------------------------------------------------------------
[asrini@node062 ~]$ module load ngsutils-0.5.7
[asrini@node062 ~]$ which ngsutils
/opt/software/ngsutils/0.5.7/bin/ngsutils
[asrini@node062 ~]$ ngsutils
Usage: ngsutils COMMAND
Commands
update - Updates NGSUtils from git repository
(http://github.com/ngsutils/ngsutils)
repeat2fasta - Extract repeatmasker flagged regions to a FASTA file
strip_fasta - Remove sequences from a FASTA file based on name
tag_fasta - Tag FASTA sequence names with a prefix or suffix
tabixindex - Index a tab-delimited file using Tabix and bgzip
Run 'ngsutils help CMD' for more information about a specific command
ngsutils 0.5.7-efb237d
[asrini@node062 ~]$ bamutils
Usage: bamutils COMMAND
Commands
DNA-seq
basecall - Base/variant caller
RNA-seq
count - Calculates counts/FPKM for genes/BED regions/repeats (also CNV)
General
best - Filter out multiple mappings for a read, selecting only the best
convertregion - Converts region mapping to genomic mapping
export - Export reads, mapped positions, and other tags
expressed - Finds regions expressed in a BAM file
extract - Extracts reads based on regions in a BED file
filter - Removes reads from a BAM file based on criteria
innerdist - Calculate the inner mate-pair distance from two BAM files
junctioncount - Counts the number of reads spanning individual junctions.
keepbest - Parses BAM file and keeps the best mapping for reads that have multiple mappings
merge - Combine multiple BAM files together (taking best-matches)
pair - Given two separately mapped paired files, re-pair the files
peakheight - Find the size (max height, width) of given peaks (BED) in a BAM file
renamepair - Postprocesses a BAM file to rename pairs that have an extra /N value
split - Splits a BAM file into smaller pieces
stats - Calculates simple stats for a BAM file
tag - Update read names with a suffix (for merging)
Conversion
tobed - Convert BAM reads to BED regions
tobedgraph - Convert BAM coverage to bedGraph (for visualization)
tofasta - Convert BAM reads to FASTA sequences
tofastq - Convert BAM reads back to FASTQ sequences
Misc
check - Checks a BAM file for corruption
cleancigar - Fixes BAM files where the CIGAR alignment has a zero length element
Run 'bamutils help CMD' for more information about a specific command
[asrini@node062 ~]$ bedutils
Usage: bedutils COMMAND
Commands
General
clean - Cleans a BED file (score should be integers)
extend - Extends BED regions (3')
overlap - Find overlapping BED regions from a query and target file
reduce - Merges overlapping BED regions
refcount - Given a number of BED files, calculate the number of samples that overlap regions in a reference BED file
sizes - Extract the sizes of BED regions
sort - Sorts a BED file (in place)
stats - Calculates simple stats for a BED file
subtract - Subtracts one set of BED regions from another
Conversion
annotate - Annotate BED files by adding / altering columns
frombasecall - Converts a file in basecall format to BED3 format
fromprimers - Converts a list of PCR primer pairs to BED regions
fromvcf - Converts a file in VCF format to BED6
tobed3 - Removes extra columns from a BED (or BED compatible) file
tobed6 - Removes extra columns from a BED (or BED compatible) file
tobedgraph - BED to BedGraph
tofasta - Extract BED regions from a reference FASTA file
Misc
cleanbg - Cleans up a bedgraph file
Run 'bedutils help CMD' for more information about a specific command
[asrini@node062 ~]$ fastqutils
Usage: fastqutils COMMAND
Commands
General
barcode_split - Splits a FASTQ/FASTA file based on sequence barcodes
filter - Filter out reads using a number of metrics
merge - Merges paired FASTQ files into one file
names - Write out the read names
properpairs - Find properly paired reads (when fragments are filtered separately)
revcomp - Reverse compliment a FASTQ file
sort - Sorts a FASTQ file by name or sequence
split - Splits a FASTQ file into N chunks
stats - Calculate summary statistics for a FASTQ file
tag - Adds a prefix or suffix to the read names in a FASTQ file
tile - Splits long FASTQ reads into smaller (tiled) chunks
trim - Remove 5' and 3' linker sequences (slow, S/W aligned)
truncate - Truncates reads to a maximum length
unmerge - Unmerged paired FASTQ files into two (or more) files
Conversion
convertqual - Converts qual values from Illumina to Sanger scale
csencode - Converts color-space FASTQ file to encoded FASTQ
fromfasta - Converts (cs)FASTA/qual files to FASTQ format
fromqseq - Converts Illumina qseq (export/sorted) files to FASTQ
tobam - Converts to BAM format (unmapped)
tofasta - Converts to FASTA format (seq or qual)
Run 'fastqutils help CMD' for more information about a specific command
[asrini@node062 ~]$ gtfutils
Usage: gtfutils COMMAND
Commands
General
add_isoform - Appends isoform annotation from UCSC isoforms file
add_reflink - Appends isoform/name annotation from RefSeq/refLink
add_xref - Appends name annotation from UCSC Xref file
annotate - Annotates genomic positions based on a GTF model
filter - Filter annotations from a GTF file
genesize - Extract genomic/transcript sizes for genes
junctions - Build a junction library from FASTA and GTF model
query - Query a GTF file by coordinates
Conversion
tobed - Convert a GFF/GTF file to BED format
Run 'gtfutils help CMD' for more information about a specific command
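The same module can be loaded from a non-interactive bash script, such as a batch job. The script below is a minimal sketch under a few assumptions. It assumes bash. It assumes `module` is defined as a shell function in the job's environment, as it is in the interactive session above. If `module` is missing, the script falls back to prepending the install directory that `module show` reports. The names `NGSUTILS_CMDS`, `NGSUTILS_BIN` and `ngsutils_ready` are hypothetical and exist only for this illustration. Scheduler directives such as queue, cores and memory are left out because they depend on the cluster's job scheduler.

```shell
#!/usr/bin/env bash
# Sketch: load NGSUtils in a batch script and confirm its commands are on PATH.
# Scheduler directives are intentionally omitted; add the ones your cluster uses.

# The five NGSUtils front-end commands listed on this page.
NGSUTILS_CMDS=(ngsutils bamutils bedutils fastqutils gtfutils)

# Install directory reported by 'module show ngsutils-0.5.7'.
NGSUTILS_BIN=/opt/software/ngsutils/0.5.7/bin

# ngsutils_ready (hypothetical helper): succeed only if every front-end
# command can be found on the current PATH.
ngsutils_ready() {
  local cmd
  for cmd in "${NGSUTILS_CMDS[@]}"; do
    command -v "$cmd" >/dev/null 2>&1 || return 1
  done
  return 0
}

if command -v module >/dev/null 2>&1; then
  module load ngsutils-0.5.7
else
  # Assumption: replicate the modulefile's PATH change by hand.
  PATH="$NGSUTILS_BIN:$PATH"
  export PATH
fi

if ngsutils_ready; then
  # Per-command help, as each tool's usage message suggests.
  bamutils help stats
else
  echo "NGSUtils not found on PATH; load the ngsutils-0.5.7 module first." >&2
fi
```

The script checks all five commands up front rather than relying on a single `which` call. That way a partially set PATH is caught before any analysis step runs. Replace `bamutils help stats` with the actual bamutils, bedutils, fastqutils or gtfutils commands your job needs.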
