The Cray CS-Storm is built to meet the most demanding compute requirements for production scalability, while also delivering a lower total-cost-of-ownership for researchers with accelerator workload environments.
The XStream GPU cluster consists of:
Each 2626X8N compute node consists of:
The XStream GPU cluster comprises 65 compute nodes, for a total of 520 NVIDIA Tesla K80 GPU cards (1,040 logical graphical processing units).
The table below summarizes the Tesla K80 GPU card specifications.
| Feature | Tesla K80 |
|---|---|
| GPU | 2x Kepler GK210 |
| Peak double precision FLOPS | 2.91 Tflops (GPU Boost clocks), 1.87 Tflops (base clocks) |
| Peak single precision FLOPS | 8.74 Tflops (GPU Boost clocks), 5.6 Tflops (base clocks) |
| Memory bandwidth (ECC off) | 480 GB/s (240 GB/s per GPU) |
| Memory size (GDDR5) | 24 GB (12 GB per GPU) |
| CUDA cores | 4992 (2496 per GPU) |
Within each compute node, each CPU socket (a PCI root) is connected through PLX PCIe switches to 4 K80 cards (8 GPUs). Please take a look at the following diagrams:
There are therefore two PCI domains, one per CPU socket. You can run the lstopo command on a compute node (e.g. xs-0001) for full details of the PCI bus.
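This split into two PCI domains can be sketched in a few lines of Python. The sketch assumes logical GPUs are enumerated sequentially across the sockets (GPUs 0-7 on socket 0, GPUs 8-15 on socket 1); the helper names are hypothetical, and the actual enumeration order should be verified against `lstopo` output.

```python
# Sketch: map logical GPU IDs to CPU sockets on a CS-Storm compute node.
# Assumption (verify with lstopo): 16 logical GPUs enumerated sequentially,
# 8 per socket (4 K80 cards x 2 GPUs each behind each PCI root).

GPUS_PER_SOCKET = 8

def socket_of(gpu_id: int) -> int:
    """Return the CPU socket (PCI domain) a logical GPU belongs to."""
    if not 0 <= gpu_id < 2 * GPUS_PER_SOCKET:
        raise ValueError(f"GPU id {gpu_id} out of range for this node")
    return gpu_id // GPUS_PER_SOCKET

def same_domain(a: int, b: int) -> bool:
    """True if both GPUs hang off the same PCI root (cheaper peer-to-peer)."""
    return socket_of(a) == socket_of(b)

if __name__ == "__main__":
    print(socket_of(3))        # socket 0
    print(socket_of(12))       # socket 1
    print(same_domain(0, 8))   # False: crosses the socket boundary
```

Keeping communicating GPUs within one domain avoids traffic crossing the inter-socket link.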
If you plan on doing GPU peer-to-peer communication, the nvidia-smi command on a compute node will show you the GPUDirect topology matrix for the system:
$ nvidia-smi topo -m
PIX (GPUs behind a single PCIe switch) gives you the best latency; SOC (traversal across the CPU socket interconnect) is the worst.
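To pick GPU pairs with the best link programmatically, the `nvidia-smi topo -m` matrix can be parsed. A minimal sketch follows; the sample matrix is illustrative only, not actual XStream output (in practice, capture the real matrix with `subprocess.run(["nvidia-smi", "topo", "-m"], ...)`):

```python
# Sketch: extract GPU pairs connected via a single PCIe switch (PIX)
# from a `nvidia-smi topo -m` style matrix. SAMPLE_TOPO is illustrative,
# not real output from an XStream node.

SAMPLE_TOPO = """\
     GPU0 GPU1 GPU2 GPU3
GPU0  X   PIX  PHB  SOC
GPU1 PIX   X   PHB  SOC
GPU2 PHB  PHB   X   SOC
GPU3 SOC  SOC  SOC   X
"""

def pix_pairs(topo_text: str):
    """Return (i, j) GPU index pairs whose link type is PIX."""
    lines = topo_text.strip().splitlines()
    pairs = []
    for i, row in enumerate(lines[1:]):   # skip the header row
        cells = row.split()[1:]           # skip the row label (GPUn)
        for j, link in enumerate(cells):
            if link == "PIX" and i < j:   # record each pair once
                pairs.append((i, j))
    return pairs

print(pix_pairs(SAMPLE_TOPO))  # [(0, 1)]
```

A job script could use this to bind cooperating processes to PIX-connected GPU pairs.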
XStream compute nodes are connected to a large private Lustre storage system with fast I/O. This parallel filesystem, provided by a Cray Sonexion appliance, is accessible through the $GROUP_WORK environment variable.
This innovative and power-efficient HPC storage system is made of:
This system is capable of providing more than 22 GB/s of sustained Lustre bandwidth over the InfiniBand interconnect and about 1.4 PB of usable space. Note that this storage space is neither replicated nor backed up.
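A common pattern is to place per-job working directories under $GROUP_WORK. The sketch below shows one way to build such a path; the `/tmp/group_work` fallback and the directory layout are hypothetical, for illustration only.

```python
import os
from pathlib import Path

# Sketch: compose a per-user job directory on the Lustre filesystem
# exposed via $GROUP_WORK. The "/tmp/group_work" fallback is a
# hypothetical default for local testing, not a real XStream path.

def job_workdir(job_name: str) -> Path:
    """Return (without creating) a per-user job directory on $GROUP_WORK."""
    root = Path(os.environ.get("GROUP_WORK", "/tmp/group_work"))
    user = os.environ.get("USER", "unknown")
    return root / user / job_name
```

Since this space is neither replicated nor backed up, keep only reproducible intermediate data here and copy final results elsewhere.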
Have questions? Feel free to contact Research Computing Support.