Complementary Systems Specifications

Below are the technical specifications of the individual complementary systems.

Partition 0 - ARM (Cortex-A72)

The partition is based on the ARMv8-A 64-bit architecture.

Cortex-A72
ARMv8-A 64-bit
2x 32 cores @ 2 GHz
255 GB memory
disk capacity 3.7 TB
1x Infiniband FDR 56 Gb/s

Partition 1 - ARM (A64FX)

The partition is based on the Armv8.2-A architecture with the SVE instruction set extension and consists of 8 compute nodes with the following per-node parameters:

1x Fujitsu A64FX CPU
Armv8.2-A ISA CPU with Scalable Vector Extension (SVE)
48 cores at 2.0 GHz
32 GB of HBM2 memory
400 GB SSD (M.2 form factor) – mixed use type
1x Infiniband HDR100 interface
connected via 16x PCI-e Gen3 slot to the CPU

Partition 2 - Intel (Ice Lake, NVDIMMs)

The partition is based on the Intel Ice Lake x86 architecture. It contains two servers with Intel NVDIMM memories.

Each server has the following parameters:

2x 3rd Gen Intel Xeon Scalable processor (Intel Xeon Gold 6338 CPU)
32 cores @ 2.00 GHz
16x 16GB RAM with ECC
DDR4-3200
1x Infiniband HDR100 interface
connected to CPU via 8x PCI-e Gen4 interface
3.2 TB NVMe local storage – mixed use type

In addition, the servers have the following parameters:

Intel server 1 – low NVDIMM memory server with 2304 GB NVDIMM memory
16x 128GB NVDIMM persistent memory modules
Intel server 2 – high NVDIMM memory server with 8448 GB NVDIMM memory
16x 512GB NVDIMM persistent memory modules
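
The module counts and the quoted totals can be cross-checked with simple arithmetic. The sketch below is only one reading of the figures above: it suggests that the quoted 2304 GB and 8448 GB correspond to the NVDIMM modules plus the 16x 16GB of RAM, not to the NVDIMM modules alone. The function and variable names are illustrative and not part of any installed tool.

```python
# Cross-check of the memory figures quoted for the two Intel servers.
# All sizes are in GB and are taken from the lists above.

def capacity_gb(modules: int, module_size_gb: int) -> int:
    """Total capacity of a set of identical memory modules."""
    return modules * module_size_gb

# 16x 16GB RAM with ECC (the same in both servers)
dram_gb = capacity_gb(16, 16)

# name -> (module count, module size in GB, quoted total in GB)
servers = {
    "Intel server 1": (16, 128, 2304),
    "Intel server 2": (16, 512, 8448),
}

report = {}
for name, (count, size_gb, quoted_gb) in servers.items():
    nvdimm_gb = capacity_gb(count, size_gb)
    report[name] = {
        "nvdimm_gb": nvdimm_gb,
        "nvdimm_plus_dram_gb": nvdimm_gb + dram_gb,
        "quoted_gb": quoted_gb,
    }
    print(f"{name}: modules {nvdimm_gb} GB, "
          f"with DRAM {nvdimm_gb + dram_gb} GB, quoted {quoted_gb} GB")
```

Under this reading, the NVDIMM modules alone provide 2048 GB and 8192 GB respectively.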

Software installed on the partition:

FPGA boards support application development using the following design flows:

OpenCL
High-Level Synthesis (C/C++), including support for oneAPI
Verilog and VHDL

Partition 3 - AMD (Milan, MI100 GPUs + Xilinx FPGAs)

The partition consists of two servers equipped with AMD Milan x86 CPUs, AMD GPUs, and Xilinx FPGAs. It represents an alternative to the ecosystem of the Intel-based partition.

Each server has the following parameters:

2x AMD Milan 7513 CPU
32 cores @ 2.6 GHz
16x 16GB RAM with ECC
DDR4-3200
4x AMD MI100 GPU accelerators
Interconnected with AMD Infinity Fabric™ Link for fast GPU to GPU communication
1x 100 Gb/s Infiniband HDR100
connected to CPU via 8x PCI-e Gen4 interface
3.2 TB NVMe local storage – mixed use

In addition:

AMD server 1 has 2x FPGA Xilinx Alveo U250 Data Center Accelerator Card
AMD server 2 has 2x FPGA Xilinx Alveo U280 Data Center Accelerator Card

Software installed on the partition:

FPGA boards support application development using the following design flows:

OpenCL
High-Level Synthesis (C/C++)
Verilog and VHDL
Developer tools and libraries for AMD GPUs are also installed.

Partition 4 - Edge Server

The partition provides an overview of the so-called edge computing class of resources: solutions powerful enough to provide data analytics capabilities (both CPU and GPU) in a form factor that does not require a data center to operate.

The partition consists of one edge computing server with the following parameters:

1x x86_64 CPU Intel Xeon D-1587
TDP 65 W
16 cores
435 GFlop/s theoretical max performance in double precision
1x CUDA programmable GPU NVIDIA Tesla T4
TDP 70W
theoretical performance 8.1 TFlop/s in FP32
128 GB RAM
1.92TB SSD storage
connectivity:
2x 10 Gbps Ethernet
WiFi 802.11 ac
LTE connectivity
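
The theoretical peak figures quoted above can be approximated as units x clock x floating-point operations per cycle. The sketch below is a rough reconstruction, not the vendor's method. The clock rates, the CUDA core count, and the per-cycle throughput are assumptions that do not appear in the specification above, and the function name is illustrative.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFlop/s: units * clock (GHz) * FLOP per cycle per unit."""
    return units * clock_ghz * flops_per_cycle

# Intel Xeon D-1587: 16 cores (from the list above).
# ASSUMPTIONS: 1.7 GHz base clock and 16 double-precision FLOP per cycle
# per core (two 256-bit FMA units).
cpu_fp64 = peak_gflops(units=16, clock_ghz=1.7, flops_per_cycle=16)

# NVIDIA Tesla T4.
# ASSUMPTIONS: 2560 CUDA cores, 1.59 GHz boost clock, and 2 FP32 FLOP per
# cycle per core (one fused multiply-add).
gpu_fp32 = peak_gflops(units=2560, clock_ghz=1.59, flops_per_cycle=2)

print(f"CPU FP64 peak: {cpu_fp64:.0f} GFlop/s")
print(f"GPU FP32 peak: {gpu_fp32 / 1000:.1f} TFlop/s")
```

With these assumed inputs the results come to roughly 435 GFlop/s and 8.1 TFlop/s, in line with the quoted values. Sustained performance of real applications is typically lower.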

Partition 5 - FPGA Synthesis Server

FPGA design tools typically run for several hours to one day to generate the final bitstream (logic design) for large FPGA chips. These tools are mostly sequential, so the system includes a dedicated server for this task.

This server runs the development tools needed for the FPGA boards installed in compute partitions 2 and 3.

AMD EPYC 72F3, 8 cores @ 3.7 GHz nominal frequency
8 memory channels with ECC
128 GB of DDR4-3200 memory with ECC
memory is fully populated to maximize memory subsystem performance
1x 10Gb Ethernet port used for connection to LAN
NVMe local storage
2x NVMe disks 3.2TB, configured as RAID 1

Partition 6 - ARM + CUDA GPGPU (Ampere) + DPU

This partition is based on the ARM architecture and is equipped with CUDA-programmable GPGPU accelerators based on the Ampere architecture and with DPU network processing units. The partition consists of two nodes with the following per-node parameters:

Server Gigabyte G242-P36, Ampere Altra Q80-30 (80c, 3.0GHz)
512GB DIMM DDR4, 3200MHz, ECC, CL22
2x Micron 7400 PRO 1920GB NVMe M.2 Non-SED Enterprise SSD
2x NVIDIA A30 GPU Accelerator
2x NVIDIA BlueField-2 E-Series DPU 25GbE Dual-Port SFP56, PCIe Gen4 x16, 16GB DDR + 64, 200Gb Ethernet
Mellanox ConnectX-5 EN network interface card, 10/25GbE dual-port SFP28, PCIe3.0 x8
Mellanox ConnectX-6 VPI adapter card, 100Gb/s (HDR100, EDR IB and 100GbE), single-port QSFP56

Partition 7 - IBM

The IBM Power10 server is a single-node partition with the following parameters:

Server IBM POWER S1022
2x Power10 12-CORE TYPICAL 2.90 TO 4.0 GHZ (MAX) PO
512GB DDIMMS, 3200 MHZ, 8GBIT DDR4
2x ENTERPRISE 1.6 TB SSD PCIE4 NVME U.2 MOD
2x ENTERPRISE 6.4 TB SSD PCIE4 NVME U.2 MOD
PCIE3 LP 2-PORT 25/10GB NIC&ROCE SR/CU A

Partition 8 - HPE Proliant

This partition provides a modern CPU with a very large L3 cache. The goal is to enable users to develop algorithms and libraries that will efficiently utilize this technology. The processor is very efficient, for example, for linear algebra on relatively small matrices. This is a single-node partition with the following parameters:

Server HPE Proliant DL 385 Gen10 Plus v2 CTO
2x AMD EPYC 7773X Milan-X, 64 cores, 2.2GHz, 768 MB L3 cache
16x HPE 16GB (1x16GB) x4 DDR4-3200 Registered Smart Memory Kit
2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
BCM 57412 10GbE 2p SFP+ OCP3 Adptr
HPE IB HDR100/EN 100Gb 1p QSFP56 Adptr
HPE Cray Programming Environment for x86 Systems 2 Seats

Partition 9 - Virtual GPU Accelerated Workstation

This partition provides users with a remote/virtual workstation running MS Windows OS. It offers a rich graphical environment with a focus on 3D OpenGL or ray-tracing-based applications with the smallest possible degradation of user experience. The partition consists of two nodes with the following per-node parameters:

Server HPE Proliant DL 385 Gen10 Plus v2 CTO
2x AMD EPYC 7413, 24 cores, 2.55GHz
16x HPE 32GB 2Rx4 PC4-3200AA-R Smart Kit
2x 3.84TB NVMe RI SFF BC U.3ST MV SSD
BCM 57412 10GbE 2p SFP+ OCP3 Adptr
2x NVIDIA A40 48GB GPU Accelerator

Available Software

The following is the list of software available on partition 9:

Academic VMware Horizon 8 Enterprise Term Edition: 10 Concurrent User Pack for 4 year term license; includes SnS
8x NVIDIA RTX Virtual Workstation, per concurrent user, EDU, perpetual license
32x NVIDIA RTX Virtual Workstation, per concurrent user, EDU SUMS per year
7x Windows Server 2022 Standard - 16 Core License Pack
10x Windows Server 2022 - 1 User CAL
40x Windows 10/11 Enterprise E3 VDA (Microsoft) per year
Hardware VMware Horizon management

Partition 10 - Sapphire Rapids-HBM Server

The primary purpose of this server is to evaluate how HBM memory attached to an x86 processor affects the performance of user applications. HBM was previously available only on GPGPU accelerators, where it provides a significant boost to memory-bound applications. Users can also compare the impact of the HBM memory with that of the large L3 cache of the AMD Milan-X processor, which is also available in the complementary systems. The server is also equipped with DDR5 memory, which enables comparative studies against DDR4-based systems.

2x Intel® Xeon® CPU Max 9468, 48 cores, base 2.1 GHz, max 3.5 GHz
16x 16GB DDR5 4800 MHz
2x Intel D3 S4520 960GB SATA 6Gb/s
1x Supermicro Standard LP 2-port 10GbE RJ45, Broadcom BCM57416
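
For the DDR5 versus DDR4 comparison mentioned above, a theoretical peak bandwidth can be estimated from the DIMM count and transfer rate. The sketch below assumes one DIMM per memory channel and 64 data bits (8 bytes) per transfer; neither assumption is stated in the specification. It compares this server with a 16x DDR4-3200 configuration such as the one listed for partition 8. HBM bandwidth is left out because the specification gives no figures for it. All names are illustrative.

```python
BYTES_PER_TRANSFER = 8  # ASSUMPTION: 64 data bits per DIMM, ECC bits excluded

def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000

# ASSUMPTION: the 16 DIMMs occupy 16 independent channels (8 per socket).
ddr5_gbs = peak_bandwidth_gbs(channels=16, mega_transfers_per_s=4800)
ddr4_gbs = peak_bandwidth_gbs(channels=16, mega_transfers_per_s=3200)

print(f"16x DDR5-4800: {ddr5_gbs:.1f} GB/s")
print(f"16x DDR4-3200: {ddr4_gbs:.1f} GB/s")
print(f"DDR5/DDR4 ratio: {ddr5_gbs / ddr4_gbs:.2f}")
```

These are upper bounds for the whole node. Measured bandwidth, for example from a STREAM-style benchmark, is usually noticeably lower and depends on NUMA placement.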

Partition 11 - NVIDIA Grace CPU Superchip

The NVIDIA Grace CPU Superchip uses the NVIDIA® NVLink®-C2C technology to deliver 144 Arm® Neoverse V2 cores and 1TB/s of memory bandwidth. It runs all NVIDIA software stacks and platforms, including NVIDIA RTX™, NVIDIA HPC SDK, NVIDIA AI, and NVIDIA Omniverse™.

Superchip design with up to 144 Arm Neoverse V2 CPU cores with Scalable Vector Extensions (SVE2)
World’s first LPDDR5X with error-correcting code (ECC) memory, 1TB/s total bandwidth
900GB/s coherent interface, 7X faster than PCIe Gen 5
NVIDIA Scalable Coherency Fabric with 3.2TB/s of aggregate bisection bandwidth
2X the packaging density of DIMM-based solutions
2X the performance per watt of today’s leading CPU
FP64 Peak of 7.1TFLOPS
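
The "7X faster than PCIe Gen 5" figure can be put into context with a back-of-the-envelope calculation. The sketch below is one plausible derivation, not the vendor's stated baseline. It assumes that the comparison is against a PCIe Gen5 x16 link at 32 GT/s per lane with 128b/130b encoding, and that both the 900GB/s figure and the PCIe figure count both directions. The function and variable names are illustrative.

```python
def pcie_gbs_per_direction(gt_per_s: float, lanes: int, encoding: float) -> float:
    """Usable PCIe bandwidth in GB/s for one direction, before protocol overhead."""
    return gt_per_s * lanes * encoding / 8  # 8 bits per byte

# ASSUMPTIONS: Gen5 signalling at 32 GT/s per lane, x16 link, 128b/130b encoding.
gen5_x16_one_way = pcie_gbs_per_direction(32.0, 16, 128 / 130)
gen5_x16_both_ways = 2 * gen5_x16_one_way

c2c_gbs = 900  # coherent interface bandwidth from the list above
ratio = c2c_gbs / gen5_x16_both_ways

print(f"PCIe Gen5 x16, both directions: {gen5_x16_both_ways:.0f} GB/s")
print(f"NVLink-C2C / PCIe Gen5 x16: {ratio:.1f}x")
```

Under these assumptions the ratio comes out at about 7, which is consistent with the quoted factor.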