Difference between revisions of "HPC:kClust"
From HPC wiki
Latest revision as of 15:44, 13 August 2019
kClust
kClust is a program intended for fast and sensitive clustering of large protein sequence databases. kClust v1.0 is installed across all the HPC nodes.
Usage
kClust can be loaded as a module.
[asrini@node062 ~]$ module show kClust-1.0
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/kClust-1.0:

module-whatis   kClust: fast and sensitive clustering of large protein sequence databases. This version is compiled against our version of GCC and our architecture.
prepend-path    PATH /opt/software/kClust/1.0/bin
-------------------------------------------------------------------

[asrini@node062 ~]$ module load kClust-1.0
[asrini@node062 ~]$ which kClust
/opt/software/kClust/1.0/bin/kClust
[asrini@node062 ~]$ kClust --help

Usage: ./kClust -i [fasta-db-file] -d [directory] [options]

Version 1.0
kClust is a clustering program for protein sequences.
Written by Christian Mayer (christian.eberhard.mayer@googlemail.com) and Maria Hauser (mhauser@genzentrum.lmu.de)

Required arguments:
  -i [fasta-db-file] : Sequence database in fasta format or directory with the output of the previous kClust run if -P option is set.
  -d [directory] : Directory for temporary and result files.

Optional arguments:
  -M [megabytes] : Memory limit for clustering (default=3500MB).
  -P : Cluster profiles computed from existing alignment files (default=false).
  -sc : Use sequence background frequency score correction for the k-mer scores (default=false).
  -td [directory] : Directory for temporary files (default=WORKING_DIR/tmp)
  -s [float] : Clustering threshold (score per column) (default=1.12 half bits ~ 30% sequence identity). Set to zero for the clustering based only on the e-value of the hit.
  -e [float] : Clustering E-value threshold (default=1.0e-4).
  -c [float] : Alignment coverage of the longer sequence (default=0.8).
  --merge-ncbi-headers : Compress NCBI headers in representatives database, creating a merged header instead of the representative sequence header.
  --merge-uniprot-headers : Compress Uniprot headers in representatives database, creating a merged header instead of the representative sequence header.
  --write-time-benchmark : Write time benchmark files, containing sequences which consume the most computation time (default=false).

Expert arguments:
  --filter-k [integer] : Length of k-mers for similarity scoring filter (default=6).
  --filter-T [float] : Similarity threshold for filter k-mer generation (default=4.3 half bits).
  --filter-t [float] : k-mer score threshold for prefiltering (default=0.55 half bits).
  --kdp-k [integer] : Length of k-mers for kDP alignments (default=4).
  --kdp-T [float] : Similarity threshold for kDP k-mer generation (default=2.9 half bits).
  --kdp-G [float] : Gap open penalty (default=12.0 half bits).
  --kdp-E [float] : Gap extension penalty (default=2.0 half bits).
  --kdp-F [float] : Intra-diagonal gap penalty (default=0.27 half bits).
  --kdp-delta [integer] : Width of delta window (default=50).

Sequence identity ~ score per column (see -s option):
  20%    30%    40%    50%    60%    70%    80%    90%    99%
  0.52   1.12   1.73   2.33   2.93   3.53   4.14   4.74   5.28
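The identity-to-score table at the end of the help text can be turned into a small lookup when choosing a value for -s. The following is a minimal sketch, not an official wrapper: the function name score_for_identity, the input file proteins.fasta, and the output directory kclust_out are hypothetical, and only the -i, -d, and -s options listed in the help output above are used. The script prints the command instead of running it when kClust or the input file is not available.

```shell
#!/bin/sh
# Map a target sequence identity (percent) to the -s clustering threshold,
# using the table printed at the end of `kClust --help`.
# score_for_identity is a hypothetical helper, not part of kClust.
score_for_identity() {
    case "$1" in
        20) echo 0.52 ;;
        30) echo 1.12 ;;
        40) echo 1.73 ;;
        50) echo 2.33 ;;
        60) echo 2.93 ;;
        70) echo 3.53 ;;
        80) echo 4.14 ;;
        90) echo 4.74 ;;
        99) echo 5.28 ;;
        *)  echo "identity not in the help table: $1" >&2; return 1 ;;
    esac
}

# Hypothetical input and output names; replace with your own.
input_fasta="proteins.fasta"
out_dir="kclust_out"
identity=40

score="$(score_for_identity "$identity")"

# On the cluster, load the module first: module load kClust-1.0
if command -v kClust >/dev/null 2>&1 && [ -f "$input_fasta" ]; then
    # Creating the directory up front is an assumption; it is harmless if it exists.
    mkdir -p "$out_dir"
    kClust -i "$input_fasta" -d "$out_dir" -s "$score"
else
    echo "would run: kClust -i $input_fasta -d $out_dir -s $score"
fi
```

Only the identities listed in the help table are accepted; for values in between, the sketch deliberately fails rather than guessing an interpolated score.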