We currently make a few of our NVIDIA GPUs available for development, testing and benchmarking:

GPU | Description | System and node | Notes
A100 | AMD Genoa server with A100 GPU acceleration. Similar to LUMI supercomputer setup. | COSMA – mad04 | Integrated into COSMA. Partition cosma8-shm.
A100 | AMD Genoa server with A100 GPU acceleration. Similar to LUMI supercomputer setup. | COSMA – mad06 | Integrated into COSMA. Partition. Interactive access only.
V100 | One Intel node (32 cores, 768GB) with 10 V100 GPUs. Designed to study multi-GPU offloading. | COSMA – gn001 | Integrated into COSMA. Interactive access only.
A100 (plus older cards) | 12 nodes, not tightly connected. Host nodes not to be used (massively), as they serve a Jupyter notebook cluster. | NCC – gpu10, gpu11 or gpu12 | Part of NCC. Each card partitioned into 7 logical GPUs. System is used for production AI runs, but can be used for development, too.

COSMA – gn001

Access is interactive only. Log into a COSMA login node and ssh directly into the GPU node that you want to experiment with.

COSMA – mad04 and mad06

Interactive access

It is not possible to SSH straight into these machines. Access needs to be pre-booked using SLURM. Request a time allocation using the salloc command:

salloc -p cosma8-shm -w <NODE_ID> -A <ACCOUNT_GROUP> -t 01:00:00

Once the time is allocated, start a bash session with the srun command:

srun -p cosma8-shm -A <ACCOUNT_GROUP> --pty /bin/bash
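
The two steps above can be wrapped in a small helper so that the node and account only need to be typed once. The following is a minimal sketch, not part of the official COSMA tooling: the function name print_gpu_session_cmds and the account name my_account are hypothetical, and the helper only prints the commands rather than running them.

```shell
#!/bin/bash
# Hypothetical helper: prints the salloc/srun pair for one of the mad nodes.
# It does not submit anything; copy the printed lines into your shell.
print_gpu_session_cmds() {
  local node="$1"                 # e.g. mad04 or mad06
  local account="$2"              # your SLURM account group
  local walltime="${3:-01:00:00}" # defaults to the one hour used above
  echo "salloc -p cosma8-shm -w ${node} -A ${account} -t ${walltime}"
  echo "srun -p cosma8-shm -A ${account} --pty /bin/bash"
}

# Example call with a placeholder account name (an assumption).
print_gpu_session_cmds mad04 my_account
```

Printing instead of executing keeps the sketch safe to try on a login node before any time is requested.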

Batch jobs

The nodes are available within the cosma8-shm partition and have to be selected specifically within your SLURM script:

#SBATCH -p cosma8-shm
#SBATCH -w mad06

Alternatively, you can use the --nodelist (the long form of -w) or --exclude settings to pick the exact node.
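
For orientation, a complete batch script built around these directives might look as follows. This is a sketch under assumptions: the ten-minute time limit and the report_gpus function are illustrative choices, and <ACCOUNT_GROUP> is the same placeholder as in the interactive commands and must be replaced before submitting.

```shell
#!/bin/bash
#SBATCH -p cosma8-shm
#SBATCH -w mad06
#SBATCH -A <ACCOUNT_GROUP>
#SBATCH -t 00:10:00

# Illustrative payload: list the GPUs visible to the job.
# Falls back to a message if nvidia-smi is missing or fails.
report_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi failed on $(hostname)"
  else
    echo "nvidia-smi not found on $(hostname)"
  fi
}

report_gpus
```

Save the script under a name of your choice and submit it with sbatch. Replacing the call to report_gpus with your own executable gives a starting point for real runs.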


NVIDIA toolchain, C++ offloading
COSMA systems:
module load nvhpc/23.11
nvc++ -stdpar …
NCC:
module load nvidia-hpc/23.3
nvc++ -stdpar …

NVIDIA toolchain, OpenMP offloading
NCC:
module load nvidia-hpc/23.3
nvc++ -fopenmp -mp=gpu …

Intel toolchain, OpenMP offloading
(no entries listed)

Intel toolchain, SYCL offloading
COSMA systems:
module load cuda
module load oneapi
CXXFLAGS="-fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80"
LDFLAGS="-fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80"

Codeplay toolchain, SYCL
COSMA systems:
module load sycl/2022.11-codeplay-cuda
module load cuda/11.2
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda

Instead of the modules, try the containers:
module load cuda/12.3
module load singularity/3.9.2
singularity shell --nv /cosma/local/singularity/images/oneapi-nvidia-2024.sif
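
To check that the NVIDIA C++ offloading setup works after loading the module, a small smoke test can help. The sketch below is an assumption-laden example, not a site-provided script: the file name stdpar_check.cpp and the temporary directory are hypothetical, and the compile step is skipped if nvc++ is not on the PATH. It writes a short C++17 program that uses the standard parallel algorithms and builds it with nvc++ -stdpar.

```shell
#!/bin/bash
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# A tiny C++17 program: y = 2*x + y, followed by a parallel reduction.
cat > "$workdir/stdpar_check.cpp" <<'EOF'
#include <algorithm>
#include <execution>
#include <iostream>
#include <numeric>
#include <vector>

int main() {
  std::vector<double> x(1 << 20, 1.0), y(1 << 20, 2.0);
  std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(),
                 y.begin(), [](double a, double b) { return 2.0 * a + b; });
  double sum = std::reduce(std::execution::par_unseq, y.begin(), y.end(), 0.0);
  // Every entry of y should now be 4.0.
  std::cout << (sum == 4.0 * x.size() ? "ok" : "mismatch") << "\n";
  return 0;
}
EOF

# Compile and run only if the NVIDIA compiler is available.
if command -v nvc++ >/dev/null 2>&1; then
  nvc++ -stdpar -o "$workdir/stdpar_check" "$workdir/stdpar_check.cpp" \
    && "$workdir/stdpar_check"
else
  echo "nvc++ not found: load the NVIDIA module first"
fi
```

Run it on a GPU node after the module load line that matches your system.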

Please note that the NVIDIA tools support neither all of OpenMP nor some of the newest C++ features. However, they are built on top of LLVM, so you can compile most of your code with LLVM, compile only the GPU-relevant parts with the NVIDIA tools, and let LLVM link everything together.

There is a well-known bug in older versions of the NVIDIA toolchain: the CUDA devices are not detected. Changing a single environment variable fixes the issue: