Configuring R for SSRDE

One of the most popular components of the R ecosystem is a development environment called RStudio. RStudio is a fantastic resource for developing R code and using R to analyze data. 


However, RStudio is meant for exploratory analysis and development, while SSRDE is meant as a resource where finished code is run on dedicated compute nodes, which means that the use case for RStudio naturally ends where the use case for SSRDE begins. SSRDE also has no graphical user interface, so RStudio can't be run on SSRDE itself.


Because RStudio is so popular, and because developing in RStudio and then running R on the cluster is a different workflow from developing and running everything in RStudio, this article focuses on exporting code from RStudio, moving that code and the data it needs from a local machine to the cluster, replicating your R environment (specifically third-party package installations) on SSRDE, and submitting jobs to run.

If you have experience with RStudio, the following picture will look familiar:

If not, the two components that matter for our purposes are the console window and the script window. The console window is meant for entering individual R commands, which are evaluated immediately. This is great for quick operations like printing out the first column of a data frame or installing a package; if you plan on writing many lines of R code, though, you'll want to collect them in a script file and then run the entire file (or a selection of it) from the script window. The script window is outlined in red in the following image, and the console window is in blue:

When you save code in the script window, it creates or modifies a file whose name ends in .R. If we look at the image closely, we'll see that the script is called analysis.R, makes use of a dataset called fitness.csv, and lives in a folder called Research. Your workspace might be significantly more complicated than this, but if it all lives in one folder, the steps I go through should work pretty much the same for you; if not, it should hopefully be clear which steps you need to adapt. Here's the Research folder and its contents, just so it's clear what we're working with:
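In plain text, the parts of that folder we care about look like this:

Research/
    analysis.R
    fitness.csv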

There's a good chance that you already have R code and data on a machine other than SSRDE. We have instructions for getting that data to SSRDE using a utility called SCP, which can be found here.
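As a quick illustration, a command along these lines, run from your local machine, would copy the entire Research folder into your SSRDE home directory; the username and server address below are placeholders, so use the details given in the linked article:

scp -r ~/Research your_username@<ssrde-address>:~/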

Managing R versions on SSRDE

SSRDE uses R version 3.4.4 as a default, but has a number of different versions of R available. If you want to use another version of R, you can do so by taking advantage of the modules system, as outlined here.

Note in particular that if you're using a non-default version of R, you'll need to load the correct R module prior to installing packages (more on this in a moment), and you'll need to include a module load command in your job submission script, as described in the article on modules.
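If you aren't sure which versions of R are installed, listing the available modules is a good first step (this assumes the standard module command described in that article):

module avail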

Installing 3rd-party libraries on SSRDE

The default install.packages() command in R tries to install 3rd-party libraries in a location that everyone using the computer can reach. That requires sudo (administrator) access on SSRDE, so the following procedure lets you install and use R libraries from your home directory, which you can do on your own.

However, you will need to add a few lines to your .Renviron file to be able to reach the outside internet.

Take a look at this website for instructions:


https://support.posit.co/hc/en-us/articles/200488488-Configuring-R-to-Use-an-HTTP-or-HTTPS-Proxy

 

You can easily access your .Renviron file by running the command file.edit('~/.Renviron') within R or RStudio.


You only need to add the following two lines:

http_proxy=http://webproxy.ucsd.edu:3128

https_proxy=http://webproxy.ucsd.edu:3128

Save and exit the file, then restart R for the change to take effect.

If you don't intend to use the default version of R, the first thing you'll need to do is load the module for the version of R you'll be using:
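For example, if you wanted to work with R 4.0.2 (one of the versions mentioned below), the command would look roughly like this; the exact module name is whatever module avail reports on SSRDE:

module load R/4.0.2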

Create a new directory in your home directory with the mkdir command. You can name it anything you like, but these steps assume it's named R_Libs.

Note that packages installed under multiple versions of R should not live in the same folder. If you intend to use multiple versions of R (for instance, to reproduce papers with different dependencies), you should create a separate R_Libs folder for each version; in the screenshot below there is an R_Libs folder for version 4.0.2 and another for version 3.6.3.
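As a sketch, the first command below creates a single library folder, and the second creates one folder per R version (the per-version names are just examples, not the exact names from the screenshot):

mkdir ~/R_Libs
mkdir ~/R_Libs_4.0.2 ~/R_Libs_3.6.3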

Open the R REPL using the R command. The REPL is very similar to the console window in RStudio; it lets us enter R commands one at a time:

What we'll be doing is using install.packages() with the lib argument, which lets us tell R where we want the package installed. If we give that argument our local R_Libs directory, our libraries will install there. I've done that for the tidyverse package below:
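A minimal sketch of that command, assuming your library folder is ~/R_Libs as above:

install.packages("tidyverse", lib = "~/R_Libs/")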

In order for R to recognize tidyverse, we have to tell it to look in the R_Libs directory. We do this by exporting an environment variable, which is like a variable in R, but used by the operating system to store information such as the location of third-party packages. The command is export R_LIBS=~/R_Libs/
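Putting those pieces together, the sequence looks like this (the library() call at the end is just a quick check, run inside R, that the package can now be found):

export R_LIBS=~/R_Libs/
R

and then, at the R prompt:

library(tidyverse)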

If you're using multiple versions of R as described above, you will need to run the export command to switch between your different R_Libs folders before you submit jobs. If you run into trouble with this, your SSCF representative can assist.
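For instance, switching to the libraries built for R 3.6.3 before submitting a job might look like this, again assuming the example folder names from above:

export R_LIBS=~/R_Libs_3.6.3/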

Exporting environment variables in this way only works until you log out. To make the environment variable persistent, we need to modify a file called .bashrc (the . at the beginning indicates it's a hidden file, so a normal ls command won't show it). For our purposes, .bashrc is a bash script that runs a number of commands every time we log in. It does things like set the text color of the window, control what's displayed on the far left side of the terminal (the prompt), and -- in this case -- export the environment variables we need.


If we add our export call to the very end of the .bashrc file, that environment variable will be exported every time we log in, and the operating system will always know where to find your locally installed libraries.

We have to be quite careful modifying .bashrc though; any mistakes in the file can cause our profile to break.

Because of that, I've written a one-liner that you can copy and paste into your terminal to add the required line to your .bashrc without having to open the file. Run the following from your home directory (making sure to change the name of the local directory if you're not using R_Libs) and you should be set:

echo "export R_LIBS=~/R_Libs/" >> .bashrc
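If you'd like to double-check that the line made it in, you can print the last line of the file:

tail -n 1 ~/.bashrc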

To run a full R script from the command line, you'll use the command Rscript <name_of_script>.R. This command is essentially a drop-in replacement for the matlab command used in the quick start guide.


An example of the bash script used for running an R job, and the Slurm command for submitting that bash script, is as follows:
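Since the screenshot isn't reproduced here, a minimal sketch of such a script follows; the job name, output file, script name, and R version are all just placeholders, and you should adjust them (and add any resource requests from the quick start guide) to fit your job:

#!/bin/bash
#SBATCH --job-name=r_analysis
#SBATCH --output=r_analysis.out

# Load a non-default R version if you need one, and point R at your local library folder
module load R/4.0.2
export R_LIBS=~/R_Libs/

# Run the analysis script transferred from your local machine
Rscript analysis.R

If that script were saved as run_analysis.sh, it would be submitted with:

sbatch run_analysis.sh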

At this point, you should be able to transfer R code and data from your local computer to SSRDE, locally install any third-party packages that your code needs, and submit jobs involving R code to Slurm to be sent out to the cluster.

SSCF representatives are not programmers and will not be able to provide substantial help with your code itself, but we can assist with the transfer and organization of files, navigating the server, and using the job submission program Slurm.

If you need additional help setting up your environment, please don't hesitate to contact SSCF at sscfhelp@ucsd.edu (or reach out to your department SSCF representative, ex. sscf-econ@ucsd.edu) and we'll be happy to assist!