https://rc.fas.harvard.edu
serial_requeue or stats partition
/n/regal/stats scratch space
/scratch for temporary space (fast!)
On OS X and Linux, put this into your ~/.ssh/config file on your laptop/desktop (not on Odyssey!):
Host odyssey.fas.harvard.edu odyssey ody
  Hostname odyssey.fas.harvard.edu
  User csardi
  ControlMaster auto
  ControlPath /tmp/%r@%h:%p
  ForwardAgent yes
  ForwardX11Trusted yes
After this, you only need to authenticate yourself at your first login. Until you close this first session, other sessions don't need your password and verification key.
Plus, you can use ody instead of the full odyssey hostname.
rclogin09:~$ module load centos6/R-3.1.1
Loading module hpc/intel-mkl-11.0.0.079.
Loading module centos6/tcl-8.5.14.
Loading module centos6/tk-8.5.14.
Loading module centos6/fftw-3.3_gcc-4.4.7.
Loading module centos6/gsl-1.16_gcc-4.4.7.
Loading module centos6/hdf5-1.8.11_gcc-4.4.7.
Loading module centos6/netcdf-4.3.0_gcc-4.4.7.
Loading module centos6/R-3.1.1.
module avail 2>&1 | less
Run module without arguments to list the available options.
Put the module load command into your ~/.bashrc file. Needed for Rscript!
~/.Rprofile:
options(repos = structure(c(CRAN = "http://cran.rstudio.com")))
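Setting the default CRAN mirror in ~/.Rprofile matters for non-interactive use: Rscript cannot pop up the mirror chooser, so install.packages() would fail without it. A quick check that the option took effect:

```r
## Same option as in ~/.Rprofile; c() works just as well as structure() here
options(repos = c(CRAN = "http://cran.rstudio.com"))
## Verify the default repository R will use for install.packages()
getOption("repos")[["CRAN"]]
```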
Use the devtools package to install R packages directly from GitHub.
Use devtools::create() to set up the package skeleton:
stat221                  # R package for all code
├─DESCRIPTION            # package meta data
├─NAMESPACE              # don't need to touch it, usually
├─R                      # R code files
│ ├─util.R               # Common for all projects
│ ├─hw1-1.R
│ └─...
├─data                   # data files (small)
│ ├─hw1.rda
│ └─...
└─inst
  └─tests
    └─testthat
      └─test-hw1.R
[csardi@rclogin03 ~]$ screen bash
[csardi@rclogin03 ~]$ srun -n 1 -p interact --pty R
srun: job 833346 queued and waiting for resources
srun: job 833346 has been allocated resources
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
[...]
Now we can detach by pressing CTRL+a d, and log out from the cluster completely. To reattach, log in to the same node:
[csardi@rclogin05 ~]$ ssh rclogin03
[csardi@rclogin03 ~]$ screen -r
See man screen
for the details.
Specify the required memory for both srun and sbatch, because the default is only 100MB.
Use gc() to measure the required memory in a trial run, after your program has finished:
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 181400 9.7 407500 21.8 350000 18.7
Vcells 276264 2.2 786432 6.0 786332 6.0
> a <- numeric(10^8)
> gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 181405 9.7 407500 21.8 350000 18.7
Vcells 100276271 765.1 110892011 846.1 100436297 766.3
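To turn the gc() output into a value for --mem-per-cpu, add up the "max used" megabytes and leave some headroom. A minimal sketch (the helper name mem_needed_mb and the 20% margin are my own, not from the slides):

```r
## Hypothetical helper: sum the "max used (Mb)" values reported by gc()
## for Ncells and Vcells, and add a safety margin.
mem_needed_mb <- function(margin = 1.2) {
  g <- gc()
  mb <- which(colnames(g) == "(Mb)")   # pairs with used / gc trigger / max used
  ceiling(sum(g[, mb[3]]) * margin)    # third "(Mb)" column is "max used"
}
mem_needed_mb()
```

Run it at the end of the trial session, after the real computation, so the "max used" columns reflect the peak memory of your program.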
#! /usr/bin/env Rscript
#SBATCH -n 1 # (Max) number of tasks per job, for R usually 1
#SBATCH -o out.txt # File for the standard output
#SBATCH -e err.txt # File for the standard error
#SBATCH --open-mode=append # Append to standard output and error files
#SBATCH -p serial_requeue # Partition to use
#SBATCH --mem-per-cpu=4096 # Memory required per CPU, in MegaBytes
#SBATCH --mail-user=<user> # Where to send mail
#SBATCH --mail-type=ALL # When to send mail
noisy_cor <- function(num_genes, true_cor, noise_var) {
## do the computation
...
## write results to file
...
}
noisy_cor(5000, 0.9, 0.1)
#! /usr/bin/env Rscript
#SBATCH -n 1 # (Max) number of tasks per job, for R usually 1
#SBATCH -o out.txt # File for the standard output
#SBATCH -e err.txt # File for the standard error
#SBATCH --open-mode=append # Append to standard output and error files
#SBATCH -p serial_requeue # Partition to use
#SBATCH --mem-per-cpu=4096 # Memory required per CPU, in MegaBytes
#SBATCH --mail-user=<user> # Where to send mail
#SBATCH --mail-type=ALL # When to send mail
## Check if we are running from SLURM
if (Sys.getenv("SLURM_JOB_ID") != "") {
library(stat221)
noisy_cor(5000, 0.9, 0.1, to_file = TRUE)
}
In another file:
noisy_cor <- function(num_genes, true_cor, noise_var, to_file = FALSE) {
## do the computation
...
if (to_file) {
## write results to file
...
}
}
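For illustration, here is one hypothetical body for noisy_cor, assuming the task is to estimate a correlation from simulated, noise-contaminated gene pairs; the real body is elided in the slides, and the file-naming scheme below is made up:

```r
## Hypothetical implementation sketch: simulate num_genes pairs with
## correlation true_cor, add measurement noise, estimate the correlation.
noisy_cor <- function(num_genes, true_cor, noise_var, to_file = FALSE) {
  x <- rnorm(num_genes)
  y <- true_cor * x + sqrt(1 - true_cor^2) * rnorm(num_genes)
  y <- y + rnorm(num_genes, sd = sqrt(noise_var))   # add noise
  result <- cor(x, y)
  if (to_file) {
    ## write results to file (illustrative file name)
    save(result, file = sprintf("noisy-cor-%g-%g.rda", true_cor, noise_var))
  }
  result
}
```

The noise attenuates the estimated correlation towards zero, which is presumably the effect the simulation is meant to study.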
Often, we want to run the same program with different parameters. This can be done in two ways with SLURM.
We generate the submission files with a program (maybe from R), one submission file per job, and then submit each of them as an individual job.
#! /usr/bin/env Rscript
#SBATCH -n 1 # (Max) number of tasks per job, for R usually 1
#SBATCH -o out-%a.txt # File for the standard output
#SBATCH -e err-%a.txt # File for the standard error
#SBATCH -p serial_requeue # Partition to use
#SBATCH --mem-per-cpu=1024 # Memory required per CPU, in MegaBytes
#SBATCH -a 1-9 # Array of 9 jobs, with ids 1, 2, ..., 9
if (Sys.getenv("SLURM_JOB_ID") != "") {
true_cor <- c(0.5, 0.8, 0.9)
noise_var <- c(0.1, 0.2, 0.5)
params <- expand.grid(true_cor = true_cor, noise_var = noise_var)
my_id <- as.numeric(Sys.getenv("SLURM_ARRAY_TASK_ID"))
my_params <- params[my_id,]
noisy_cor(num_genes = 5000, true_cor = my_params$true_cor,
noise_var = my_params$noise_var, to_file = TRUE)
}
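To see which parameter combination each array task gets, build the same grid at an interactive prompt; expand.grid varies its first argument fastest:

```r
true_cor <- c(0.5, 0.8, 0.9)
noise_var <- c(0.1, 0.2, 0.5)
params <- expand.grid(true_cor = true_cor, noise_var = noise_var)
nrow(params)    # 9 rows, one per array task id
params[4, ]     # what SLURM_ARRAY_TASK_ID=4 would run
```

Row 4 is true_cor = 0.5 with noise_var = 0.2, since the first three tasks sweep true_cor at noise_var = 0.1.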
Problem: you need a large number of random seeds for the jobs.
Do not set the random seed based on the time in seconds — array jobs that start within the same second would get identical seeds! You need at least millisecond resolution. Save the seed with the results!
fracSec <- function() {
  ## Current time as a numeric value; the fractional part varies
  ## at sub-second resolution
  now <- as.vector(as.POSIXct(Sys.time())) / 1000
  ## Turn the fractional part into an integer seed
  as.integer(abs(now - trunc(now)) * 10^8)
}
...
seed <- fracSec()
set.seed(seed)
...
save(result, seed, file = "...")
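Saving the seed is what makes a run reproducible: restoring it replays the exact same random stream. A small demonstration (the seed value is just illustrative):

```r
seed <- 20141006          # hypothetical saved seed, e.g. from fracSec()
set.seed(seed)
x1 <- runif(5)            # first run
set.seed(seed)            # restore the saved seed
x2 <- runif(5)            # replayed run
identical(x1, x2)         # TRUE: the runs match exactly
```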
Use the traceback() function immediately after an error.
Use the debug() function to step through an R function line by line.
Use options(error = recover) to enter an interactive debugging tool immediately after an error happens.
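A small made-up example of how traceback() helps locate an error (the functions f and g are illustrative only):

```r
f <- function(x) g(x)
g <- function(x) stop("something went wrong")

## Capture the error message non-interactively
err <- tryCatch(f(1), error = function(e) conditionMessage(e))
err
## At an interactive prompt, after f(1) fails, traceback() prints the
## call stack, innermost call first:
## 3: stop("something went wrong")
## 2: g(x)
## 1: f(1)
```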