High Performance Computing#
Note
This tutorial assumes some basic familiarity with Unix and high performance computing. To run the scripts on your HPC system, it needs to have both SLURM and either Apptainer or Singularity installed. If you are unsure whether this is the case, we recommend contacting your HPC admin team.
HPC at the University of Bristol#
If you are running Abil at the University of Bristol, we recommend first completing the Getting Started HPC Course. If you are running at another institution, we recommend familiarizing yourself with your local HPC machine.
Installing Singularity#
To simplify installing packages and dependencies on HPC machines, which often requires admin approval, we use containers. Here we use Singularity, which packages all of the requirements for running Abil into a portable and reproducible container that does not require root privileges. There are two software options for creating Singularity containers: Singularity and Apptainer. Apptainer is often easier to install than Singularity and is backwards compatible with legacy Singularity installs. Both require a Linux operating system, but both provide instructions for installing on Windows or macOS.
Singularity/Apptainer is not natively supported on Windows or macOS and requires a Virtual Machine (VM). For more details, see the Apptainer admin guide or the Singularity admin guide.
Building Singularity Container#
To run Abil on an HPC machine, first build the Singularity container from the terminal. Change directory to the Singularity folder:
cd ./Abil/singularity/
If using Apptainer (recommended):
sudo apptainer build abil.sif Singularity.sif
If using Singularity:
sudo singularity build abil.sif Singularity.sif
On Windows or macOS, Singularity/Apptainer must be run inside a Virtual Machine, as noted above. Once the VM is installed following the instructions for your OS, use the VM's terminal to change directory to the Singularity folder and run the same build command as above.
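Before transferring the container, you can optionally check that the image built correctly and that the abil package is importable inside it; a quick sketch (use singularity exec instead if you built with Singularity):
# Confirm Python runs inside the container and the abil package imports cleanly
apptainer exec abil.sif python -c "import abil"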
Transfer Abil to your HPC Machine#
To transfer to your home directory (~):
scp -r ./Abil <username>@HPC_machine.ac.uk:~
To transfer to a specific directory (ex. /user/work/username):
scp -r ./Abil <username>@HPC_machine.ac.uk:/user/work/username
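For large transfers, rsync can resume an interrupted copy where scp cannot; a hedged equivalent of the command above:
# -a preserves the directory structure, -v is verbose, -P shows progress and keeps partial files
rsync -avP ./Abil <username>@HPC_machine.ac.uk:/user/work/username/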
To transfer from a Windows machine, use WinSCP.
To use WinSCP, type the host in the Host name box, then enter your username in the User name box.
For more instructions, check with your organization.
SLURM scripts#
To execute Abil on an HPC machine, we use SLURM scripts. The SLURM script tells the HPC machine what to load (the Singularity container), what to execute (Python scripts), and how much compute is required, in a single executable file.
Variable declarations#
The first part of the bash script declares the variables needed to execute the job. Here, we include the time limit for the run (time), the number of nodes to use (nodes), the memory allocation (mem), the number of cpus per task (cpus-per-task), and the number of targets to be tuned (array).
#!/bin/bash
#
#
#SBATCH --time=0-6:00:00
#SBATCH --nodes=1
#SBATCH --mem=10000M
#SBATCH --cpus-per-task=16
#SBATCH --array=0-1
Note
The wall time (--time), amount of RAM (--mem), and number of threads (--cpus-per-task) will vary depending on the size of your dataset, the number of hyper-parameters, and your HPC hardware.
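To gauge sensible values, you can inspect what a completed trial job actually used; a small sketch, assuming job accounting is enabled on your cluster (and, for the second command, that the seff utility is installed):
# Summarize elapsed time, peak memory, and allocated CPUs for a finished job
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,AllocCPUS,State
# One-page CPU and memory efficiency summary, where available
seff <jobid>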
Executable commands#
The next part of the bash script contains the commands to be executed. First, the array index is used to set a local variable, i, that specifies the target being tuned: with --array=0-1, SLURM launches two tasks, one with i=0 and one with i=1, so each array task tunes one target.
i=${SLURM_ARRAY_TASK_ID}
Next, the Apptainer module is loaded and the abil.sif container transferred earlier is used to execute commands. (Apptainer installations typically provide a singularity command as a compatibility alias, which is why the line below still calls singularity.)
module load apptainer/1.3.1
singularity exec \
-B/user/work/$(whoami):/user/work/$(whoami) \
/user/work/$(whoami)/Abil/singularity/abil.sif \
Finally, the model Python script is executed using the specified number of CPUs, for target i, within a specific model (knn in this instance). Lastly, the SINGULARITY_CACHEDIR environment variable is exported so the Singularity cache is kept in the work directory.
python /user/work/$(whoami)/Abil/hpc_tune.py ${SLURM_CPUS_PER_TASK} ${i} "knn"
export SINGULARITY_CACHEDIR=/user/work/$(whoami)/.singularity
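Putting the snippets above together, a complete tuning script (for KNN, e.g. the tune_KNN.sh submitted below) would look roughly like this sketch; the module version, paths, and resource requests follow the example above and may need adjusting for your system:
#!/bin/bash
#
#SBATCH --time=0-6:00:00
#SBATCH --nodes=1
#SBATCH --mem=10000M
#SBATCH --cpus-per-task=16
#SBATCH --array=0-1

# Select the target to tune from the array index
i=${SLURM_ARRAY_TASK_ID}

# Run the tuning script for target i inside the container
module load apptainer/1.3.1
singularity exec \
-B/user/work/$(whoami):/user/work/$(whoami) \
/user/work/$(whoami)/Abil/singularity/abil.sif \
python /user/work/$(whoami)/Abil/hpc_tune.py ${SLURM_CPUS_PER_TASK} ${i} "knn"
export SINGULARITY_CACHEDIR=/user/work/$(whoami)/.singularity
The other tuning scripts (tune_RF.sh, tune_XGB.sh) presumably differ only in the model name passed as the final argument.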
Alterations for predict and post#
The setup is the same for the predict.sh and post.sh scripts, with the only change being the Python executable line. predict.sh should contain the following:
python /user/work/$(whoami)/Abil/hpc_predict.py ${SLURM_CPUS_PER_TASK} ${i}
while post.sh should contain the following, and does not include the array specification in its SBATCH header:
python /user/work/$(whoami)/Abil/hpc_post.py
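Accordingly, a minimal post.sh might look like the following sketch (no --array line and no target index; module version and paths follow the example above):
#!/bin/bash
#
#SBATCH --time=0-6:00:00
#SBATCH --nodes=1
#SBATCH --mem=10000M
#SBATCH --cpus-per-task=16

# Run the post-processing script inside the container
module load apptainer/1.3.1
singularity exec \
-B/user/work/$(whoami):/user/work/$(whoami) \
/user/work/$(whoami)/Abil/singularity/abil.sif \
python /user/work/$(whoami)/Abil/hpc_post.py
export SINGULARITY_CACHEDIR=/user/work/$(whoami)/.singularity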
Execute Abil on your HPC Machine#
To log in to your HPC account, follow your organization's directions.
Example (ssh):
ssh <username>@HPC_machine.ac.uk
Change directory to Abil:
cd /user/work/username/Abil
Change directory to the folder containing your bash scripts:
cd hpc_example
Submit the HPC tuning jobs (e.g. using SLURM):
sbatch tune_RF.sh
sbatch tune_KNN.sh
sbatch tune_XGB.sh
After the tuning jobs are finished, submit the predict job:
sbatch predict.sh
After predict.sh is finished, submit post job:
sbatch post.sh
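While jobs are queued or running, you can monitor them from the login node; a quick sketch using standard SLURM commands:
# List your queued and running jobs
squeue -u $(whoami)
# Array tasks write output to slurm-<jobid>_<taskindex>.out by default;
# check these files for progress and error messages
ls slurm-*.out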
Singularity file#
Below is the text of the Singularity definition file (Singularity.sif), which is used to create abil.sif in the steps above.
Bootstrap: docker
From: continuumio/miniconda3

%files
    ../../examples/conda/environment.yml /root

%post
    # Make conda available in build and login shells
    echo ". /opt/conda/etc/profile.d/conda.sh" >> ~/.bashrc
    . /opt/conda/etc/profile.d/conda.sh
    # Build the environment from environment.yml and install abil
    conda install -n base -c conda-forge mamba
    mamba env update -n base --file /root/environment.yml
    pip install abil

%runscript
    . /opt/conda/etc/profile.d/conda.sh
    exec "$@"
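Because %runscript simply activates conda and then execs whatever arguments it receives, the built image can also run ad-hoc commands directly; a small usage sketch:
# Run an arbitrary command through the container's runscript
apptainer run abil.sif python --version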