Using RStan on the cluster
After reading this, you will be able to:
- Log into talapas, the high performance computing cluster on campus.
- Utilize rsync to send files between talapas and your local machine.
- Run rstan on talapas and send the fitted Stan model to your machine.
These instructions were written with OSX users in mind.
Whenever you see <username> you should replace it with
your uoregon username.
1 ssh into the cluster
Open your terminal and ssh into talapas with the following command:
ssh <username>@talapas-login.uoregon.edu
You will be prompted for a password. Type your uoregon password and hit RETURN.
Now navigate to the UO biostats directory
cd /projects/bi610/
You can list (with ls) all directories here, one of which is <username>.
If you move to it with cd <username> you will likely see that it is empty.
I recommend creating a new directory for your homework.
Suppose I want to work on homework 7: typing mkdir hw7
will create a new directory named hw7 where I can store all homework 7 files.
After moving into it with cd hw7, you can look at the full path to this directory by typing pwd.
Do it; you’ll need that path for the next step.
It should be /projects/bi610/<username>/hw7/.
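The directory setup above can also be done in one step. This is only a minimal sketch: make_hw_dir is a hypothetical helper name (nothing by that name is installed on talapas), and it assumes the /projects/bi610/<username>/ layout described in this section.

```shell
# make_hw_dir is a hypothetical helper, not part of talapas.
# It creates ROOT/USER/HW if it does not exist yet, moves into it,
# and prints the full path, which you will need for rsync.
# Usage: make_hw_dir ROOT USER HW
make_hw_dir() {
    hw_dir="$1/$2/$3"
    mkdir -p "$hw_dir" && cd "$hw_dir" && pwd
}

# On talapas you would call it like this (with your own username):
#   make_hw_dir /projects/bi610 <username> hw7
```

Using mkdir -p means the command does not complain if the directory already exists, so it is safe to rerun.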
2 Become BFFs with rsync
Now you need to populate your shiny new homework directory with some files.
Let’s start with our data file, BattingAverage.csv.
In a different tab or window for your terminal, navigate to where
you have BattingAverage.csv stored and run the command
rsync -vzP BattingAverage.csv <username>@talapas-login.uoregon.edu:/projects/bi610/<username>/hw7/
You will be prompted to enter your uoregon password to initiate the transfer.
In case you’re curious, the options for rsync are:
- -v: be verbose
- -z: compress the file for the transfer
- -P: show transfer progress
The general structure of rsync is rsync [options] FROM TO, so we’re telling it to
send BattingAverage.csv, which is in the current directory, to our
hw7 directory on the cluster.
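Because the destination string is long and easy to mistype, it can help to build it once and reuse it. The sketch below assumes the login host from the ssh step and the directory layout from step 1; talapas_path and push_to_talapas are hypothetical helper names, and only rsync and its -v, -z, and -P options come from the instructions above.

```shell
# Hypothetical helpers; run these on your local machine.

# Print the remote location USER@HOST:/projects/bi610/USER/HW/
# Usage: talapas_path USER HW
talapas_path() {
    printf '%s@talapas-login.uoregon.edu:/projects/bi610/%s/%s/\n' "$1" "$1" "$2"
}

# Send one or more local files (FROM) to a homework directory on talapas (TO).
# Usage: push_to_talapas USER HW FILE...
push_to_talapas() {
    dest="$(talapas_path "$1" "$2")"
    shift 2
    rsync -vzP "$@" "$dest"
}

# Example (prompts for your uoregon password):
#   push_to_talapas <username> hw7 BattingAverage.csv
```

Since rsync accepts several FROM arguments before the final TO, the same helper can send the data file and any scripts in a single transfer.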
3 Submitting a job
Now, in order to run rstan on talapas, you need two more files in the hw7 directory:
- an R script that reads in the data and fits an rstan model
- an sbatch file to run the R script on the cluster
You can make these in any text editor you want,
either on your local machine (in which case you will have to rsync them to talapas),
or on the cluster using either vim or nano.
Here is an example R script, named run_rstan.R:
library(rstan)
# read in the data
data <- read.table('BattingAverage.csv', header=TRUE, sep=',')
# the stan model code
stan_code <- "
data {
  int N; // number of players
  int hits[N];
  int at_bats[N];
  int npos; // number of positions
  int position[N];
}
parameters {
  real<lower=0, upper=1> theta[N];
  real<lower=0, upper=1> mu[npos];
  real<lower=0> kappa[npos];
}
model {
  real alpha;
  real beta;
  hits ~ binomial(at_bats, theta);
  for (i in 1:N) {
    alpha = mu[position[i]] * kappa[position[i]];
    beta = (1 - mu[position[i]]) * kappa[position[i]];
    theta[i] ~ beta(alpha, beta);
  }
  mu ~ beta(1,1);
  kappa ~ gamma(0.1,0.1);
}
"
# compile and sample
model_fit <- stan(model_code = stan_code,
                  chains = 4,
                  iter = 2000,
                  control = list(max_treedepth = 13),
                  data = list(N = nrow(data),
                              hits = data$Hits,
                              at_bats = data$AtBats,
                              npos = length(unique(data$PriPos)),
                              position = data$PriPosNumber))
# save the fitted model to an .rds file
saveRDS(model_fit, file='baseball_model.rds')
And here is an example sbatch file named run_rstan.sbatch
#!/bin/bash
#SBATCH --account=bi610
#SBATCH --partition=short
#SBATCH --job-name=rstan
#SBATCH --time 1:00:00
#SBATCH --mem-per-cpu=8G
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<username>@uoregon.edu
module load gcc/7.3 R/3.6.1
Rscript run_rstan.R
The options for the sbatch file are pretty self-explanatory,
but for more information see this cheat sheet.
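For reference, here is one way to read each option. The sketch wraps the same sbatch file in a small function that fills in your username; write_sbatch is a hypothetical helper name, and the comments are a plain-language gloss of the standard Slurm options rather than anything specific to talapas.

```shell
# write_sbatch is a hypothetical helper: it writes run_rstan.sbatch into the
# current directory, using USER for the notification address.
# Usage: write_sbatch USER
write_sbatch() {
    cat > run_rstan.sbatch <<EOF
#!/bin/bash
# account to charge the job to, and the partition (queue) to run in
#SBATCH --account=bi610
#SBATCH --partition=short
# name shown in the squeue listing
#SBATCH --job-name=rstan
# wall-clock limit (here one hour) and memory per allocated CPU
#SBATCH --time 1:00:00
#SBATCH --mem-per-cpu=8G
# one node, one task, one CPU core for that task
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# send email on job state changes to this address
#SBATCH --mail-type=ALL
#SBATCH --mail-user=$1@uoregon.edu
module load gcc/7.3 R/3.6.1
Rscript run_rstan.R
EOF
}

# Example:
#   write_sbatch <username>
```

The comment lines sit on their own lines, between the #SBATCH directives, so they do not interfere with how Slurm reads the options.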
With these three files in the hw7 directory, just run the command
sbatch run_rstan.sbatch
And that’s it! Your job should now be submitted and either waiting in the queue or running.
You can run the command squeue -u <username> to see where
the job is in the queue or how long it’s been running.
When it’s complete you will find two additional files in your directory:
- slurm-<JOB ID>.out, which has all the output (including errors and warnings) from the rstan job
- baseball_model.rds, the fitted rstan model
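If you lose track of the job ID, you can recover the log file name from the line sbatch prints when you submit ("Submitted batch job" followed by the ID). The sketch below assumes that message format and the default output name used here, since the sbatch file sets no custom output option; slurm_log_for is a hypothetical helper name.

```shell
# slurm_log_for is a hypothetical helper. It takes the line that sbatch
# prints on submission ("Submitted batch job <JOB ID>") and prints the
# default Slurm output file name, slurm-<JOB ID>.out.
# Usage: slurm_log_for "MESSAGE"
slurm_log_for() {
    job_id="${1##* }"    # keep only the last word, which is the job ID
    printf 'slurm-%s.out\n' "$job_id"
}

# Example on talapas:
#   msg="$(sbatch run_rstan.sbatch)"
#   tail "$(slurm_log_for "$msg")"    # look at the end of the log
```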
You can now rsync the fitted model back to the appropriate directory on your local machine,
load it into your RStudio environment with
model_fit <- readRDS(file = 'baseball_model.rds')
and start looking at how your chains mixed, the posterior samples, etc.
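The return trip uses the same rsync FROM TO structure, with the cluster path as FROM and your local directory as TO. This is a sketch under the same assumptions as before (login host from the ssh step, hw7-style directory layout); fetch_model and model_arrived are hypothetical helper names.

```shell
# Hypothetical helpers for the return trip; run these on your local machine.

# Copy the fitted model from talapas into the current local directory.
# Usage: fetch_model USER HW
fetch_model() {
    rsync -vzP "$1@talapas-login.uoregon.edu:/projects/bi610/$1/$2/baseball_model.rds" .
}

# Succeed only if the file exists and is not empty.
# Usage: model_arrived [FILE]   (FILE defaults to baseball_model.rds)
model_arrived() {
    [ -s "${1:-baseball_model.rds}" ]
}

# Example:
#   fetch_model <username> hw7 && model_arrived && echo "ready to load in R"
```

Checking that the file is non-empty before opening R catches the case where the job failed before saveRDS ran, in which case the slurm output file is the place to look.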