Intel CPU nodes
We currently host a set of newer Intel CPU nodes for benchmarking purposes.
| Type | Spec | Node | Remarks |
| --- | --- | --- | --- |
| Ice Lake | Barlow Pass memory (4 TB), with 1 TB DDR4 used as an L4 cache | mad07 | direct access only |
| Sky Lake | 112 cores (4×28), 1.5 TB RAM, Intel Platinum 8180 | mad02 | cosma7-shm partition |
| Sky Lake | 48 cores, 6 TB RAM, 2× Platinum 8260L, Optane memory | mad03 | cosma7-shm2 partition |
Interactive access (“direct access only”)
The servers marked as “direct access only” can only be reached by a direct SSH login, for which you have to request special permission. Please check that no one else is using the node before you start your jobs, and restrict usage to the minimum time required (no production runs). These servers were purchased through the SKA telescope project; any project aligned with that work therefore has priority access.
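Before starting interactive work on a direct-access node, a quick sanity check like the following (a sketch using standard Linux tools, run after SSHing into the node) shows whether anyone else is active:

```shell
# Check that the node is idle before starting interactive benchmarks.
who                          # anyone else logged in?
uptime                       # load average should be close to zero when idle
top -b -n 1 | head -n 15     # snapshot of the busiest processes
```

If the load average is high or another user has processes running, coordinate with them before starting your own runs.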
Batch access
All servers not marked as “direct access only” should be used through SLURM. For cosma7-shm2, add

#SBATCH -A durham

to your job script, while for cosma7-shm use

#SBATCH -A do009
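Putting this together, a minimal job script for the cosma7-shm partition might look like the sketch below. The partition and account names come from the text above; the time limit, task counts, and executable name are placeholders you should adapt to your own job.

```shell
#!/bin/bash
# Minimal example job script for cosma7-shm (batch-access nodes).
#SBATCH -p cosma7-shm          # partition, as listed in the table above
#SBATCH -A do009               # account for cosma7-shm
#SBATCH -t 01:00:00            # wall-time limit (placeholder)
#SBATCH --ntasks=1             # placeholder resource request
#SBATCH --cpus-per-task=28     # e.g. one full socket

./my_benchmark                 # placeholder for your executable
```

Submit it with `sbatch jobscript.sh` as usual.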
Environment
COSMA’s login nodes use AMD CPUs. We therefore strongly recommend that you ssh into the Intel server of your choice and recompile your code there from scratch.
Intel toolchain
The Intel toolchain (oneAPI) is our recommended tool of choice:
module load intel_comp/2023.2.0 compiler mpi
module load gnu_comp/13.1.0
The second line loads a reasonably new GNU toolchain/STL into the Intel setup. By default, the Intel compilers use a rather old GNU STL, which might not provide all the features your code needs. Ensure you create platform-specific code by adding the compile flags
-Ofast -xhost
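As an illustration, a compile line on one of the Intel nodes might look as follows. This is a sketch assuming the modules above are loaded; the source file and output names are placeholders, and `mpiicc`/`icx` are the usual oneAPI driver names, so substitute whichever compiler wrapper your build actually uses.

```shell
# Example builds with the platform-specific flags (run on the Intel node itself,
# after loading the modules listed above; file names are placeholders).
mpiicc -Ofast -xhost -o my_benchmark my_benchmark.c   # MPI code
icx    -Ofast -xhost -o my_benchmark my_benchmark.c   # plain C code
```

Because `-xhost` targets the instruction set of the machine doing the compiling, binaries built this way on an Intel node may not run on the AMD login nodes.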
Technical specifications
Below are some technical details of the listed machines. While these details were accurate at the time of writing, they may be subject to change in the future. If these details are important for your work, always double-check them using a suitable tool such as likwid.
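One way to double-check the figures below is the sketch here: it uses `likwid-topology` and `numactl` where available (both may be absent on a given node, hence the guards) and falls back on `lscpu`, which ships with util-linux on essentially every Linux system.

```shell
# Inspect the actual hardware rather than trusting the table.
command -v likwid-topology >/dev/null && likwid-topology -c || true   # sockets, cores, caches
command -v numactl >/dev/null && numactl --hardware || true           # NUMA domains and memory
lscpu                                                                 # core counts and CPU flags (look for avx512f)
```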
mad07
| Vendor/model | 2× Intel Xeon Gold 6330 (dual-socket system) |
| --- | --- |
| Topology | 2 sockets, 28 cores per socket, 2 threads per core (HT enabled) |
| Vector extensions | AVX, AVX2, AVX512 |
| Cache | 48 KiB L1 (per core), 1.25 MiB L2 (per core), 42 MiB L3 (per CPU) |
| RAM | 4 TB, 2 TB per NUMA domain |
| NUMA configuration | 1 NUMA domain per CPU, 28 cores per NUMA domain |
| peakflops bench | per core: 4.60 Gflops (scalar), 39.3 Gflops (AVX512); per CPU: 144 Gflops (scalar), 1113 Gflops (AVX512) |
| copy_mem bench | per core: 19.6 GB/s; per CPU: 87.8 GB/s |
Funding and acknowledgements
These Intel test nodes have been installed in collaboration with, and as an addendum to, the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). DiRAC equipment was funded by BEIS capital funding via STFC capital grants ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure.