NVIDIA GPU nodes
We currently make a few of our NVIDIA GPUs available for development, testing and benchmarking:
| Type | Spec | Node | Remarks |
| --- | --- | --- | --- |
| A100 | AMD Genoa server with A100 GPU acceleration. Similar to the LUMI supercomputer setup. | COSMA – mad04 | Integrated into COSMA. Partition cosma8-shm. |
| A100 | AMD Genoa server with A100 GPU acceleration. Similar to the LUMI supercomputer setup. | COSMA – mad06 | Integrated into COSMA. Partition cosma8-shm. Interactive access only. |
| V100 | One Intel node (32 cores, 768 GB) with 10 V100 GPUs. Designed to study multi-GPU offloading. | COSMA – gn001 | Integrated into COSMA. Interactive access only. |
| A100 (plus older cards) | 12 nodes, not tightly connected. The host nodes must not be used heavily, as they also serve a Jupyter notebook cluster. | NCC – gpu10, gpu11 or gpu12 | Part of NCC. Each card is partitioned into 7 logical GPUs. The system is used for production AI runs, but can be used for development, too. |
COSMA – gn001
These nodes are available for interactive access only. Log into a COSMA login node and ssh directly into the GPU node that you want to experiment with.
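As a minimal sketch, the full hop could look as follows (the user and login host names are placeholders to adapt to your own account):
# log into a COSMA login node first
ssh <USER>@<COSMA_LOGIN_NODE>
# then hop directly onto the GPU node
ssh gn001
# optional: confirm that the V100 cards are visible
nvidia-smi -L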
COSMA – mad04 and mad06
Interactive access
It is not possible to SSH straight into these machines. Access needs to be pre-booked using SLURM. Request a time allocation using the salloc command:
salloc -p cosma8-shm -w <NODE_ID> -A <ACCOUNT_GROUP> -t 01:00:00
Once the allocation is granted, open a bash session on the node by executing the srun command:
srun -p cosma8-shm -A <ACCOUNT_GROUP> --pty /bin/bash
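Put together, a typical interactive session on mad06 could look like the following sketch (the account group remains a placeholder; nvidia-smi is only used to confirm that the GPU is visible):
salloc -p cosma8-shm -w mad06 -A <ACCOUNT_GROUP> -t 01:00:00
srun -p cosma8-shm -A <ACCOUNT_GROUP> --pty /bin/bash
# now on the GPU node: list the visible devices
nvidia-smi -L
# type exit to leave the node and release the allocation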
Batch jobs
The nodes are available within the cosma8-shm partition and have to be selected explicitly within your SLURM script:
#SBATCH -p cosma8-shm
#SBATCH -w mad06
Alternatively, you can use the --nodelist or --exclude settings to pick the exact node. A complete batch script is sketched below.
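The sketch below shows one possible batch script; the account group, module version and executable name are placeholders to adapt to your own setup (the module is the one listed in the Environment table further down):
#!/bin/bash
#SBATCH -p cosma8-shm
#SBATCH -w mad06
#SBATCH -A <ACCOUNT_GROUP>
#SBATCH -t 01:00:00
#SBATCH -J gpu-test

# load the NVIDIA toolchain (see the Environment section)
module load nvhpc/23.11

# run your (hypothetical) GPU executable
./my_gpu_app
Submit the script with sbatch as usual.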
Environment
| | COSMA systems | NCC |
| --- | --- | --- |
| NVIDIA toolchain, C++ offloading | module load nvhpc/23.11<br>nvc++ -stdpar … | module load nvidia-hpc/23.3<br>nvc++ -stdpar … |
| NVIDIA toolchain, OpenMP offloading | | module load nvidia-hpc/23.3<br>nvc++ -fopenmp -mp=gpu … |
| Intel toolchain, OpenMP offloading | | |
| Intel toolchain, SYCL offloading | module load cuda<br>module load oneapi<br>CXX=icpx<br>CXXFLAGS=-fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80<br>LDFLAGS=-fsycl-targets=nvptx64-nvidia-cuda,spir64 -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 | |
| Codeplay toolchain, SYCL | module load sycl/2022.11-codeplay-cuda<br>module load cuda/11.2<br>clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda<br>Instead of the modules, try the containers:<br>module load cuda/12.3<br>module load singularity/3.9.2<br>singularity shell --nv /cosma/local/singularity/images/oneapi-nvidia-2024.sif | |
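As an illustration, a quick smoke test of the NVIDIA C++ offloading toolchain on the COSMA mad nodes could look like this; the source file name is purely illustrative:
# load the NVIDIA HPC toolchain listed in the table above
module load nvhpc/23.11
# compile a C++17 parallel-algorithms test with GPU offloading enabled
nvc++ -std=c++17 -stdpar=gpu -o stdpar_test stdpar_test.cpp
# run it on the allocated GPU node
./stdpar_test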
Please note that the NVIDIA tools do not support the full OpenMP standard and lack some of the newest C++ features. However, they are built on top of LLVM, i.e. you can compile most of your code with LLVM, compile the GPU-relevant parts with the NVIDIA tools, and let LLVM link everything together.
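A minimal sketch of such a split build, assuming the code is organised into a host part (main.cpp) and a GPU part (kernels.cpp); the file names and the link line are illustrative and depend on your nvhpc installation:
# host-side translation unit with plain LLVM/clang
clang++ -std=c++17 -c main.cpp -o main.o
# GPU-relevant translation unit with the NVIDIA compiler
nvc++ -std=c++17 -stdpar=gpu -c kernels.cpp -o kernels.o
# link with clang++; the NVIDIA runtime libraries have to be added here
# (nvc++ -dryrun kernels.cpp shows which library paths your installation uses;
#  <NVHPC_RUNTIME_LIBS> is a placeholder for them)
clang++ main.o kernels.o <NVHPC_RUNTIME_LIBS> -o my_gpu_app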
There is a well-known bug in older versions of the NVIDIA toolchain: the CUDA devices are not visible to the compiled program. A simple environment variable setting fixes this issue:
export CUDA_VISIBLE_DEVICES=0
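To check whether you are affected, compare what the driver reports with what the NVIDIA runtime sees, for example via nvaccelinfo, which ships with the nvhpc modules:
# driver-level view: lists the physical cards
nvidia-smi -L
# compiler-runtime view: should report the same devices
nvaccelinfo
# if no devices are reported, apply the workaround and check again
export CUDA_VISIBLE_DEVICES=0
nvaccelinfo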