NVIDIA GPU nodes
We currently make a few of our NVIDIA GPUs available for development, testing and benchmarking:
| Type | Spec | Node | Remarks |
| --- | --- | --- | --- |
| A100 | AMD Genoa server with A100 GPU acceleration. Similar to the LUMI supercomputer setup. | COSMA – mad04 | Integrated into COSMA. Partition cosma8-shm. |
| A100 | AMD Genoa server with A100 GPU acceleration. Similar to the LUMI supercomputer setup. | COSMA – mad06 | Integrated into COSMA. Partition cosma8-shm. Interactive access only. |
| V100 | One Intel node (32 cores, 768 GB) with 10 V100 GPUs. Designed to study multi-GPU offloading. | COSMA – gn001 | Integrated into COSMA. Interactive access only. |
| A100 (plus older cards) | 12 nodes, not tightly connected. The host nodes must not be used heavily, as they also serve a Jupyter notebook cluster. | NCC – gpu10, gpu11 or gpu12 | Part of NCC. Each card is partitioned into 7 logical GPUs. The system is used for production AI runs, but can be used for development, too. |
COSMA – gn001
These nodes are available for interactive access only. Log into a COSMA login node and ssh directly into the GPU node that you want to experiment with.
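As a minimal sketch, the full hop could look as follows (the user and login host names are placeholders to adapt to your own account):
# log into a COSMA login node first
ssh <USER>@<COSMA_LOGIN_NODE>
# then hop directly onto the GPU node
ssh gn001
# optional: confirm that the V100 cards are visible
nvidia-smi -L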
COSMA – mad04 and mad06
Interactive access
It is not possible to SSH straight into these machines. Access needs to be pre-booked using SLURM. Request a time allocation using the salloc command:
salloc -p cosma8-shm -w <NODE_ID> -A <ACCOUNT_GROUP> -t 01:00:00
Once the allocation is granted, open a bash session on the node by executing the srun command:
srun -p cosma8-shm -A <ACCOUNT_GROUP> --pty /bin/bash
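Put together, a typical interactive session on mad06 could look like the following sketch (the account group remains a placeholder; nvidia-smi is only used to confirm that the GPU is visible):
salloc -p cosma8-shm -w mad06 -A <ACCOUNT_GROUP> -t 01:00:00
srun -p cosma8-shm -A <ACCOUNT_GROUP> --pty /bin/bash
# now on the GPU node: list the visible devices
nvidia-smi -L
# type exit to leave the node and release the allocation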
Batch jobs
The nodes are available within the cosma8-shm partition and have to be selected explicitly within your SLURM script:
#SBATCH -p cosma8-shm
#SBATCH -w mad06
Alternatively, you can use the --nodelist or --exclude settings to pick the exact node. A complete batch script is sketched below.
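The sketch below shows one possible batch script; the account group, module version and executable name are placeholders to adapt to your own setup (the module is the one listed in the Environment table further down):
#!/bin/bash
#SBATCH -p cosma8-shm
#SBATCH -w mad06
#SBATCH -A <ACCOUNT_GROUP>
#SBATCH -t 01:00:00
#SBATCH -J gpu-test

# load the NVIDIA toolchain (see the Environment section)
module load nvhpc/23.11

# run your (hypothetical) GPU executable
./my_gpu_app
Submit the script with sbatch as usual.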
Environment
| | COSMA systems | NCC |
| --- | --- | --- |
| NVIDIA toolchain, C++ offloading | module load nvhpc/23.11<br>nvc++ -stdpar … | module load nvidia-hpc/23.3<br>nvc++ -stdpar … |
| NVIDIA toolchain, OpenMP offloading | | module load nvidia-hpc/23.3<br>nvc++ -fopenmp -mp=gpu … |
| Intel toolchain, OpenMP offloading | | |
| Intel toolchain, SYCL offloading | module load cuda<br>module load oneapi<br>CXX=icpx<br>CXXFLAGS=-fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80<br>LDFLAGS=-fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 | |
| Codeplay toolchain, SYCL | module load sycl/2022.11-codeplay-cuda<br>module load cuda/11.2<br>clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda<br>Instead of the modules, try the containers:<br>module load cuda/12.3<br>module load singularity/3.9.2<br>singularity shell --nv /cosma/local/singularity/images/oneapi-nvidia-2024.sif | |
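As an illustration, a quick smoke test of the NVIDIA C++ offloading toolchain on the COSMA mad nodes could look like this; the source file name is purely illustrative:
# load the NVIDIA HPC toolchain listed in the table above
module load nvhpc/23.11
# compile a C++17 parallel-algorithms test with GPU offloading enabled
nvc++ -std=c++17 -stdpar=gpu -o stdpar_test stdpar_test.cpp
# run it on the allocated GPU node
./stdpar_test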
Please note that the NVIDIA tools do not support the full OpenMP standard and lack some of the newest C++ features. However, they are built on top of LLVM, i.e. you can compile most of your code with LLVM, compile the GPU-relevant parts with the NVIDIA tools, and let LLVM link everything together.
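A minimal sketch of such a split build, assuming the code is organised into a host part (main.cpp) and a GPU part (kernels.cpp); the file names and the link line are illustrative and depend on your nvhpc installation:
# host-side translation unit with plain LLVM/clang
clang++ -std=c++17 -c main.cpp -o main.o
# GPU-relevant translation unit with the NVIDIA compiler
nvc++ -std=c++17 -stdpar=gpu -c kernels.cpp -o kernels.o
# link with clang++; the NVIDIA runtime libraries have to be added here
# (nvc++ -dryrun kernels.cpp shows which library paths your installation uses;
#  <NVHPC_RUNTIME_LIBS> is a placeholder for them)
clang++ main.o kernels.o <NVHPC_RUNTIME_LIBS> -o my_gpu_app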
There is a well-known bug in older versions of the NVIDIA toolchain: the CUDA devices are not visible to the compiled program. A simple environment variable setting fixes this issue:
export CUDA_VISIBLE_DEVICES=0
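To check whether you are affected, compare what the driver reports with what the NVIDIA runtime sees, for example via nvaccelinfo, which ships with the nvhpc modules:
# driver-level view: lists the physical cards
nvidia-smi -L
# compiler-runtime view: should report the same devices
nvaccelinfo
# if no devices are reported, apply the workaround and check again
export CUDA_VISIBLE_DEVICES=0
nvaccelinfo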