Intel CPU nodes
We currently host a set of newer Intel CPU nodes for benchmarking purposes.
| Type | Spec | Node | Remarks |
| --- | --- | --- | --- |
| Ice Lake | Barlow Pass memory (4 TB), with 1 TB DDR4 used as an L4 cache | mad07 | direct access only |
| Sky Lake | 112 cores (4×28), 1.5 TB RAM, Intel Platinum 8180 | mad02 | cosma7-shm partition |
| Sky Lake | 48 cores, 6 TB RAM, 2× Platinum 8260L, Optane memory | mad03 | cosma7-shm2 partition |
Interactive access (“direct access only”)
The servers marked as “direct access only” can only be reached by a direct SSH login, for which you have to request special permission. Please check that no one else is using the node before you start your jobs, and restrict usage to the minimum time required (no production runs). These servers were purchased through the SKA telescope project; any project aligned with that work therefore has priority access.
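Before starting interactive work on a direct-access node, a quick sanity check like the following (a sketch using standard Linux tools, run after SSHing into the node) shows whether anyone else is active:

```shell
# Check that the node is idle before starting interactive benchmarks.
who                          # anyone else logged in?
uptime                       # load average should be close to zero when idle
top -b -n 1 | head -n 15     # snapshot of the busiest processes
```

If the load average is high or another user has processes running, coordinate with them before starting your own runs.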
Batch access
All servers not marked as “direct access only” should be used through SLURM. For cosma7-shm2, add

#SBATCH -A durham

to your job script, while for cosma7-shm use

#SBATCH -A do009
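Putting this together, a minimal job script for the cosma7-shm partition might look like the sketch below. The partition and account names come from the text above; the time limit, task counts, and executable name are placeholders you should adapt to your own job.

```shell
#!/bin/bash
# Minimal example job script for cosma7-shm (batch-access nodes).
#SBATCH -p cosma7-shm          # partition, as listed in the table above
#SBATCH -A do009               # account for cosma7-shm
#SBATCH -t 01:00:00            # wall-time limit (placeholder)
#SBATCH --ntasks=1             # placeholder resource request
#SBATCH --cpus-per-task=28     # e.g. one full socket

./my_benchmark                 # placeholder for your executable
```

Submit it with `sbatch jobscript.sh` as usual.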
Environment
COSMA’s login nodes use AMD CPUs. We therefore strongly recommend that you ssh into the Intel server of your choice and recompile your code there from scratch.
Intel toolchain
The Intel toolchain (oneAPI) is our recommended tool of choice:
module load intel_comp/2023.2.0 compiler mpi
module load gnu_comp/13.1.0
The second line loads a reasonably new GNU toolchain/STL into the Intel setup. By default, the Intel compilers use a rather old GNU STL, which might not provide all the features your code needs. Ensure you create platform-specific code by adding the compile flags
-Ofast -xhost
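As an illustration, a compile line on one of the Intel nodes might look as follows. This is a sketch assuming the modules above are loaded; the source file and output names are placeholders, and `mpiicc`/`icx` are the usual oneAPI driver names, so substitute whichever compiler wrapper your build actually uses.

```shell
# Example builds with the platform-specific flags (run on the Intel node itself,
# after loading the modules listed above; file names are placeholders).
mpiicc -Ofast -xhost -o my_benchmark my_benchmark.c   # MPI code
icx    -Ofast -xhost -o my_benchmark my_benchmark.c   # plain C code
```

Because `-xhost` targets the instruction set of the machine doing the compiling, binaries built this way on an Intel node may not run on the AMD login nodes.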
Technical specifications
Below are some technical details of the listed machines. While these details were accurate at the time of writing, they may be subject to change in the future. If these details are important for your work, always double-check them using a suitable tool such as likwid.
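One way to double-check the figures below is the sketch here: it uses `likwid-topology` and `numactl` where available (both may be absent on a given node, hence the guards) and falls back on `lscpu`, which ships with util-linux on essentially every Linux system.

```shell
# Inspect the actual hardware rather than trusting the table.
command -v likwid-topology >/dev/null && likwid-topology -c || true   # sockets, cores, caches
command -v numactl >/dev/null && numactl --hardware || true           # NUMA domains and memory
lscpu                                                                 # core counts and CPU flags (look for avx512f)
```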
mad07
| Vendor/model | 2× Intel Xeon Gold 6330 (dual-socket system) |
| --- | --- |
| Topology | 2 sockets, 28 cores per socket, 2 threads per core (HT enabled) |
| Vector extensions | AVX, AVX2, AVX512 |
| Cache | 48 KiB L1 (per core), 1.25 MiB L2 (per core), 42 MiB L3 (per CPU) |
| RAM | 4 TB, 2 TB per NUMA domain |
| NUMA configuration | 1 NUMA domain per CPU, 28 cores per NUMA domain |
| peakflops bench | per core: 4.60 Gflops (scalar), 39.3 Gflops (AVX512); per CPU: 144 Gflops (scalar), 1113 Gflops (AVX512) |
| copy_mem bench | per core: 19.6 GB/s; per CPU: 87.8 GB/s |
Funding and acknowledgements
These Intel test nodes have been installed in collaboration with, and as an addendum to, the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). DiRAC equipment was funded by BEIS capital funding via STFC capital grants ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure.