Using RStan on the cluster
After reading this, you will be able to:
- Log into talapas, the high performance computing cluster on campus.
- Use rsync to send files between talapas and your local machine.
- Run rstan on talapas and send the fitted Stan model to your machine.
These instructions were written with OSX users in mind. Whenever you see <username>, replace it with your uoregon username.
1 ssh into the cluster
Open your terminal and ssh into talapas with the following command:
ssh <username>@talapas-login.uoregon.edu
You will be prompted for a password. Type your uoregon password and hit RETURN.
Now navigate to the UO biostats directory:
cd /projects/bi610/
You can list all directories here with ls; one of them is <username>. If you move into it with cd <username>, you will likely see that it is empty.
I recommend creating a new directory for each homework. For example, to work on homework 7, type mkdir hw7 to create a new directory named hw7 where you can store all homework 7 files.
Move into it with cd hw7, then type pwd to print the full path to this directory. Do it now; you’ll need that path for the next step. It should be /projects/bi610/<username>/hw7.
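If you set up a new folder for every assignment, the directory steps above can be bundled into a small bash function. This is a minimal sketch that assumes you are already logged into talapas; make_hw_dir is a hypothetical name (not a talapas command), and mkdir -p is used so that rerunning it does not fail when the directory already exists.

```bash
#!/bin/bash
# Sketch: create a homework directory and print its full path.
# make_hw_dir is a hypothetical helper, not a talapas command.
make_hw_dir() {
  local base="$1" hw="$2"
  mkdir -p "$base/$hw" || return 1  # -p: no error if it already exists
  (cd "$base/$hw" && pwd)           # subshell: your own shell stays put
}

# Usage on talapas (replace <username>):
# make_hw_dir /projects/bi610/<username> hw7
# which should print /projects/bi610/<username>/hw7
```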
2 Become BFFs with rsync
Now you need to populate your shiny new homework directory with some files.
Let’s start with our data file, BattingAverage.csv.
In a different tab or window of your terminal (on your local machine, not logged into talapas), navigate to where you have BattingAverage.csv stored and run the command:
rsync -vzP BattingAverage.csv <username>@talapas-login.uoregon.edu:/projects/bi610/<username>/hw7/
You will be prompted to enter your uoregon password to initiate the transfer.
In case you’re curious, the options for rsync are:
- -v: be verbose
- -z: compress the file for the transfer
- -P: show transfer progress (it also keeps partially transferred files, so an interrupted transfer can resume)
The general structure of rsync is rsync [options] FROM TO, so we’re telling it to send BattingAverage.csv, which is in the current directory, to our hw7 directory on the cluster.
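Because the destination is long and easy to mistype, you may prefer to build it once. The sketch below assumes the login host talapas-login.uoregon.edu from the ssh step and the /projects/bi610/<username>/<homework> layout from step 1; remote_hw_dir is a hypothetical helper, not part of rsync.

```bash
#!/bin/bash
# Sketch: build the rsync destination for a homework directory on talapas.
# remote_hw_dir is a hypothetical helper; TALAPAS_HOST is the ssh login host.
TALAPAS_HOST="talapas-login.uoregon.edu"

remote_hw_dir() {
  local user="$1" hw="$2"
  # The trailing slash makes rsync treat the destination as a directory.
  printf '%s@%s:/projects/bi610/%s/%s/\n' "$user" "$TALAPAS_HOST" "$user" "$hw"
}

# Usage on your local machine (replace <username>). Several files can be
# listed before the destination, e.g. the R script and the sbatch file:
# rsync -vzP BattingAverage.csv "$(remote_hw_dir <username> hw7)"
# rsync -vzP run_rstan.R run_rstan.sbatch "$(remote_hw_dir <username> hw7)"
```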
3 Submitting a job
Now, in order to run rstan on talapas, you need two more files in the hw7 directory:
- an R script that reads in the data and fits an rstan model
- an sbatch file to run the R script on the cluster
You can make these in any text editor you want, either on your local machine (in which case you will have to rsync them to talapas) or on the cluster using vim or nano.
Here is an example R script, named run_rstan.R:
library(rstan)
# read in the data
data <- read.table('BattingAverage.csv', header=TRUE, sep=',')
# the stan model code
stan_code <- "
data {
int N; // number of players
int hits[N]; // hits for each player
int at_bats[N]; // at-bats for each player
int npos; // number of positions
int position[N]; // position index (1 to npos) of each player
}
parameters {
real<lower=0, upper=1> theta[N];
real<lower=0, upper=1> mu[npos];
real<lower=0> kappa[npos];
}
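model {
real a; // first shape parameter of the beta prior on theta[i]
real b; // second shape parameter of the beta prior on theta[i]
hits ~ binomial(at_bats, theta);
for (i in 1:N) {
// beta prior with mean mu and concentration kappa for player i's position
a = mu[position[i]] * kappa[position[i]];
b = (1 - mu[position[i]]) * kappa[position[i]];
theta[i] ~ beta(a, b);
}
mu ~ beta(1,1);
kappa ~ gamma(0.1,0.1);
}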
"
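# compile and sample: 4 chains of 2000 iterations each (by default the first
# half of each chain is warmup); max_treedepth is raised from its default of 10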
model_fit <- stan(model_code = stan_code,
chains = 4,
iter = 2000,
control = list(max_treedepth = 13),
data = list(N = nrow(data),
hits = data$Hits,
at_bats = data$AtBats,
npos = length(unique(data$PriPos)),
position = data$PriPosNumber
)
)
# save the fitted model to an .rds file
saveRDS(model_fit, file='baseball_model.rds')
And here is an example sbatch file, named run_rstan.sbatch:
#!/bin/bash
#SBATCH --account=bi610
#SBATCH --partition=short
#SBATCH --job-name=rstan
#SBATCH --time=1:00:00
#SBATCH --mem-per-cpu=8G
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<username>@uoregon.edu
module load gcc/7.3 R/3.6.1
Rscript run_rstan.R
The options in the sbatch file are fairly self-explanatory, but for more information see this cheat sheet.
With these three files in the hw7 directory, just run the command:
sbatch run_rstan.sbatch
And that’s it! Your job has been submitted and is now either waiting in the queue or running.
You can run the command squeue -u <username> to see where the job is in the queue or how long it’s been running.
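If you submit jobs often, it can help to check that all three files are present before calling sbatch, and to keep the job ID so you can query just that job. Below is a minimal sketch, assuming you run it from the hw7 directory on talapas and that sbatch prints its usual "Submitted batch job <JOB ID>" message; check_hw_files and job_id_from_sbatch are hypothetical helper names, not Slurm commands.

```bash
#!/bin/bash
# Hypothetical helpers (not talapas or Slurm commands), meant to be run
# from your hw7 directory on talapas.

# Succeed only if the data, R script, and sbatch file are all present.
check_hw_files() {
  local dir="${1:-.}" f missing=0
  for f in BattingAverage.csv run_rstan.R run_rstan.sbatch; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# sbatch usually prints "Submitted batch job <JOBID>"; keep the last word.
job_id_from_sbatch() {
  local msg="$1"
  echo "${msg##* }"
}

# Usage on talapas (uncomment to run):
# if check_hw_files .; then
#   id=$(job_id_from_sbatch "$(sbatch run_rstan.sbatch)")
#   squeue -j "$id"    # ST column: PD = pending, R = running
# fi
```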
When it’s complete, you will find two additional files in your directory:
- slurm-<JOB ID>.out, which has all the output (including errors and warnings) from the rstan job
- baseball_model.rds, the fitted rstan model
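The log can get long, so it may help to pull out only the lines that look like problems (for example, rstan warnings about divergent transitions or maximum tree depth). A minimal sketch, assuming Slurm's default log name slurm-<JOB ID>.out; show_problems is a hypothetical helper that simply wraps grep.

```bash
#!/bin/bash
# Hypothetical helper: list the lines of a Slurm log that mention
# warnings or errors, prefixed with their line numbers.
show_problems() {
  local log="$1"
  # -n: print line numbers; -i: case-insensitive; -E: extended regex
  grep -n -i -E 'warning|error' "$log"
}

# Usage on talapas, replacing 12345 with your job ID:
# show_problems slurm-12345.out
# tail -n 20 slurm-12345.out   # the last lines show how the job ended
```

A plain grep keeps the sketch portable; if it prints nothing, no line in the log mentioned a warning or an error.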
You can now rsync the fitted model back to the appropriate directory on your local machine, load it into your RStudio environment with
model_fit <- readRDS(file = 'baseball_model.rds')
and start looking at how your chains mixed, the posterior samples, etc.
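Fetching the model uses the same rsync command as in step 2 with FROM and TO swapped, so the remote path comes first. A minimal sketch to run on your local machine, assuming the login host and /projects/bi610/<username>/hw7 layout used earlier; remote_hw_file is a hypothetical helper.

```bash
#!/bin/bash
# Sketch: build the remote path of a file in a homework directory on talapas.
# remote_hw_file is a hypothetical helper; run the rsync line locally.
TALAPAS_HOST="talapas-login.uoregon.edu"

remote_hw_file() {
  local user="$1" hw="$2" file="$3"
  printf '%s@%s:/projects/bi610/%s/%s/%s\n' \
    "$user" "$TALAPAS_HOST" "$user" "$hw" "$file"
}

# Usage (replace <username>; "." is the current local directory):
# rsync -vzP "$(remote_hw_file <username> hw7 baseball_model.rds)" .
```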