Intel CPU nodes

We currently host a set of newer Intel CPU nodes for benchmarking purposes.

| Type | Spec | Node | Remarks |
|------|------|------|---------|
| Ice Lake | Barlow Pass memory (4TB), with 1TB DDR4 used as an L4 cache | mad07 | direct access only |
| Sky Lake | 112 cores (4×28), 1.5TB RAM, Intel Platinum 8180 | mad02 | cosma7-shm partition |
| Sky Lake | 48 cores, 6TB RAM, 2x Platinum 8260L, Optane memory | mad03 | cosma7-shm2 partition |

Interactive access (“direct access only”)

The servers marked as “direct access only” can only be reached via a direct SSH login, for which you have to request special permission. Please check that no one else is using the node before you start your jobs, and restrict your usage to the minimum time required (no production runs). These servers were purchased through the SKA telescope project, so any project aligned with the telescope work has priority access.
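
Once permission has been granted, the login itself is an ordinary SSH session. As a minimal sketch, assuming the node is reachable by its short hostname from a COSMA login node (confirm the exact hostname when your access is granted):

# direct login to the Ice Lake benchmarking node (hostname is an assumption)
ssh your_username@mad07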

Batch access

All servers which are not marked as “direct access only” should be used through SLURM. Jobs for cosma7-shm2 need the account

#SBATCH -A durham

while jobs for cosma7-shm need

#SBATCH -A do009
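
Putting this together, a minimal batch script for cosma7-shm2 might look as follows; the job name, time limit and executable are placeholders, and the --exclusive request is only a suggestion for benchmarking runs:

#!/bin/bash
#SBATCH -J shm2-bench           # placeholder job name
#SBATCH -p cosma7-shm2          # partition (see table above)
#SBATCH -A durham               # account for cosma7-shm2
#SBATCH -t 01:00:00             # placeholder wall-clock limit
#SBATCH --exclusive             # avoid sharing the node during benchmark runs

./my_benchmark                  # placeholder executable

Submit the script with sbatch; for cosma7-shm, swap in -p cosma7-shm and -A do009.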

Environment

COSMA’s login nodes use AMD CPUs. We therefore strongly recommend that you ssh into the Intel server of your choice and recompile your code there from scratch.

Intel toolchain

The Intel (oneAPI) toolchain is our recommended choice:

module load intel_comp/2023.2.0 compiler mpi 
module load gnu_comp/13.1.0

The second line loads a reasonably new GNU toolchain/STL into the Intel setup. By default, the Intel compilers use a rather old GNU STL, which might not provide all the features your code needs. Ensure you create platform-specific code by adding the compile flags

-Ofast -xhost
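
For illustration, a native rebuild on one of the Intel nodes could then look like the sketch below; the compiler driver (icx from the oneAPI module) and the source file name are assumptions, so substitute whatever your build system actually invokes:

# recompile natively on the Intel node with the recommended flags
icx -Ofast -xhost -o my_benchmark my_benchmark.c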

Technical specifications

Below are some technical details of the listed machines. While these details were accurate at the time of writing, they may be subject to change in the future. If these details are important for your work, always double check them using a suitable tool like likwid.
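
For example, assuming a likwid installation or module is available on the node (the exact module name is not fixed here), the topology and the benchmark figures quoted below can be reproduced roughly like this:

likwid-topology -g                      # print sockets, cores, caches and NUMA layout
likwid-bench -t peakflops -w S0:100kB:1 # scalar flop rate of a single core
likwid-bench -t copy_mem -w S0:2GB      # rough memory-bandwidth check on socket 0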

mad07

| Property | Value |
|----------|-------|
| Vendor/model | 2 x Intel Xeon Gold 6330 (dual-socket system) |
| Topology | 2 sockets, 28 cores per socket, 2 threads per core (HT enabled) |
| Vector extensions | AVX, AVX2, AVX512 |
| Cache | 48 KiB L1 (per core), 1.25 MiB L2 (per core), 42 MiB L3 (per CPU) |
| RAM | 4 TB, 2 TB per NUMA domain |
| NUMA configuration | 1 NUMA domain per CPU, 28 cores per NUMA domain |
| peakflops bench | per core: 4.60 Gflops (scalar), 39.3 Gflops (AVX512); per CPU: 144 Gflops (scalar), 1113 Gflops (AVX512) |
| copy_mem bench | per core: 19.6 GB/s; per CPU: 87.8 GB/s |

Funding and acknowledgements

These Intel test nodes have been installed in collaboration with, and as an addendum to, the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). DiRAC equipment was funded by BEIS capital funding via STFC capital grants ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure.