Revision as of 20:53, 22 May 2018
R (programming language)
There are currently several versions of the R programming language installed across all the HPC compute nodes.
Running R programs on the PMACS HPC system
Various versions of R and other software packages are available as modules only on compute nodes (both interactive and non-interactive). Do not try to run these on the head node. If attempting to run R interactively, first launch an interactive session:
[asrini@consign ~]$ bsub -Is bash
Job <35804293> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on node060.hpc.local>>

[asrini@node062 ~]$
Please read the rest of this section before launching R jobs on the PMACS HPC system.
Available Versions
[asrini@node062 ~]$ module avail R
---------------------- /usr/share/Modules/modulefiles ----------------------
R-3.1.1 R-3.1.2 R-3.2.1 R-3.2.2
Usage
Currently, the default version of R installed across all the HPC nodes is version 3.0.1.
[asrini@node062 ~]$ which R
/usr/bin/R

[asrini@node062 ~]$ R --version
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.
Note: This version (3.0.1) is likely to be removed as the default version and will instead be made available as a module.
Other R versions installed across the HPC nodes can be loaded as modules:
[asrini@node062 ~]$ module show R-3.1.1
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/R-3.1.1:

module-whatis    GNU R
prepend-path     PATH /opt/software/R/3.1.1/bin
prepend-path     MANPATH /opt/software/R/3.1.1/share/man
-------------------------------------------------------------------

[asrini@node062 ~]$ module load R-3.1.1
[asrini@node062 ~]$ R --version
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

[asrini@node062 ~]$ module show R-3.1.2
-------------------------------------------------------------------
/usr/share/Modules/modulefiles/R-3.1.2:

module-whatis    GNU R
prepend-path     PATH /opt/software/R/3.1.2/bin
prepend-path     MANPATH /opt/software/R/3.1.2/share/man
-------------------------------------------------------------------

[asrini@node062 ~]$ module load R-3.1.2
[asrini@node062 ~]$ R --version
R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.
R libraries/packages
Several difficult-to-install R packages/libraries have been installed across all the HPC nodes for every version of R on the cluster. These typically include various Bioconductor packages and other packages that would otherwise require administrative privileges to install. Users are encouraged to install any R packages needed for their work if the desired package is not already installed. To see a full listing of the installed R packages, run the following command in an interactive shell:
[asrini@node062 ~]$ echo 'library()' | R --slave
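To check for a single package rather than reading through the full listing, a similar one-liner can be used; this is a sketch along the same lines (using doParallel as an example package name), run from an interactive session with your chosen R version active:

```shell
[asrini@node062 ~]$ echo '"doParallel" %in% rownames(installed.packages())' | R --slave
```

The command prints TRUE if the package is visible in any of R's library paths, FALSE otherwise.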
Running R doParallel jobs
The doParallel package is a "parallel backend" for the foreach package: it provides the mechanism needed to execute foreach loops in parallel. This section specifically covers the use of the R doParallel library on the PMACS cluster nodes.
Details about the package, including its manual, can be found on CRAN.
Install doParallel
The doParallel library is not installed by default, so you will have to install it yourself:
[asrini@consign ~]$ bsub -Is bash
[asrini@node063 ~]$ R

> install.packages("doParallel")
Note 1: The doParallel library will be installed under $HOME/R/x86_64-redhat-linux-gnu-library/3.0/
Note 2: If an alternative version of R is preferred, make sure you load the appropriate module for that version before attempting the install. Also make sure you are using that specific version of R before running jobs.
Using doParallel
The doParallel library can detect the number of CPU cores a given system has and spawn one worker on each core it detects. This functionality is provided by the detectCores() function that is part of the library. However, we do not recommend using this function in our environment, because it does not operate within the confines of our job scheduler, IBM Platform LSF: detectCores() reports every core on the node, not just the cores allocated to your job, so using it with a subsequent makeCluster() call will result in one user's jobs affecting other users' jobs running on the same node.
Instead, we recommend forcing doParallel to operate within the confines of the job scheduler. Below are some examples:
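One way to stay within the scheduler's allocation is to take the core count from the environment LSF sets up for the job instead of calling detectCores(). The sketch below relies on the LSB_DJOB_NUMPROC environment variable, which LSF uses to record the number of slots allocated to a job; treat the variable name as an assumption and verify it is present in your own jobs before relying on it:

```r
library(doParallel)

# LSB_DJOB_NUMPROC is set by LSF to the number of slots allocated to the
# job (an assumption to verify in your environment); fall back to 1 core
# if the variable is unset, e.g. when running outside a scheduled job.
core_count <- as.numeric(Sys.getenv("LSB_DJOB_NUMPROC", unset = "1"))
registerDoParallel(cores = core_count)
```

With this pattern the same script works unchanged whether the job was submitted with bsub -n 2 or bsub -n 16.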
Example 1
The following example, taken from the doParallel manual, shows basic use of the doParallel library with the core count hard-coded in the R script:
[asrini@consign ~]$ bsub -n 2 -Is bash
[asrini@node063]$ cat doParallel_test.R
library(doParallel)
registerDoParallel(cores=2)

x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000

ptime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]

ptime
Note: In the above example, cores is set to 2 because the interactive session was launched with 2 cores (bsub -n 2 -Is bash). If more cores are requested, the doParallel script can be changed accordingly.
The script can then be run as:
[asrini@node063]$ Rscript doParallel_test.R
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
elapsed
 17.159
Example 2
The following example shows how to pass the number of cores as an argument to the R script.
[asrini@consign ~]$ bsub -n 2 -Is bash
[asrini@node063 ~]$ cat doParallel_test2.R
library(doParallel)

core_count <- as.numeric(commandArgs(TRUE)[1])
cl <- makeCluster(core_count)
registerDoParallel(cl)
print(cl)

[asrini@node063 test_jobs]$ Rscript doParallel_test2.R 2
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
socket cluster with 2 nodes on host ‘localhost’
Note: In the above example, the value 2 was passed as an argument to the R script, which in turn created a doParallel cluster of that size. If more cores are requested in the bsub submission, the argument passed to the doParallel script can be changed accordingly.
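When makeCluster() is used, as in this example, the worker processes persist until they are explicitly shut down, so it is good practice to stop the cluster at the end of the script. A sketch of the fuller pattern (the %dopar% work itself is elided):

```r
library(doParallel)

core_count <- as.numeric(commandArgs(TRUE)[1])
cl <- makeCluster(core_count)
registerDoParallel(cl)

# ... foreach(...) %dopar% { ... } work goes here ...

stopCluster(cl)  # release the worker processes when done
```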
This can also be done as a batch (non-interactive) submission:
[asrini@consign ~]$ bsub -n 2 -R "span[hosts=1]" -e doParallel.e -o doParallel.o Rscript doParallel_test2.R 2
The error and output files from the above bsub submission contain the same information as shown previously.
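For repeated runs, the bsub options can also be collected into a job script and submitted with bsub < jobscript.sh. The following is a minimal sketch; the core count, module name, and file names are illustrative and should be adapted to your own jobs:

```shell
#!/bin/bash
#BSUB -n 4                     # number of cores to request
#BSUB -R "span[hosts=1]"       # keep all cores on a single host
#BSUB -e doParallel.%J.e       # error file (%J expands to the job ID)
#BSUB -o doParallel.%J.o       # output file

module load R-3.1.2
Rscript doParallel_test2.R 4
```

Note that the argument passed to the script (4 here) should match the -n value so the doParallel cluster does not oversubscribe the allocation.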