LUMI is officially here!
The second pilot phase of LUMI, which used the system’s GPU capacity, is complete, the system’s performance has been verified with accepted benchmarks, and the contract work has been completed. LUMI is therefore officially ready to serve European scientists at its full capacity!
Setting up a system as large as LUMI has not always been smooth sailing. We have suffered from many “black swans”, such as the global shortage of microelectronics and the global COVID-19 pandemic, which none of us could have predicted when the project started in 2019.
Furthermore, LUMI features a lot of all-new technology: it belongs to the first generation of systems built on Hewlett Packard Enterprise’s Slingshot interconnect and cloud-native Shasta software stack, and the MI250X is the first generation of AMD GPUs for high-performance computing in many years. Cutting edge oftentimes turns into bleeding edge, and LUMI was no exception. In addition, there is the sheer size of the installation: LUMI features tens of thousands of individual technical components (GPUs, processors, memory DIMMs, hard disk drives, data cables, switches and so on), all of which need to be triaged and vetted before the system is ready, which obviously takes time and effort.
Despite the hiccups along the way, LUMI is the EuroHPC Joint Undertaking’s first operational pre-exascale system. It was listed 3rd on both the Top500 and Green500 lists as early as May 2022, making it the fastest and most energy-efficient supercomputer in Europe, and the success continued in the November 2022 listings. We were able to open the system to all customers in mid-December while waiting for the completion of the paperwork needed for the official acceptance.
The 10,240 GPUs in the system have been proven to work at scale and at the expected level of performance. The benchmark applications featured the following scientific software suites: GROMACS (molecular dynamics), CP2K (quantum chemistry) and ICON (climate science). In addition, the MLPerf application benchmark suite was used to test machine learning workloads on top of PyTorch and TensorFlow. The test cases included ResNet image classification, SSD object recognition and XFMR translation. Another proxy application, GridTools, was used to measure the performance of stencil-based applications, especially those in the style of numerical weather prediction.
In addition to those, LUMI’s performance was quantified with numerous synthetic benchmarks. These, together with the early experiences from the pilot use, show that the system is ready and performant for many different workflows. The application portfolio is growing rapidly. For instance, we are working together with our technology partners HPE and AMD on the Application Readiness Program, in which seven scientific applications from the LUMI consortium countries are being ported and optimized for LUMI-G.
The object storage service LUMI-O entered its pilot phase at the same time as the system’s full general availability. It is meant for storing, sharing and staging large amounts of data. The computable storage areas LUMI-P and the flash pool solution LUMI-F have been in production since the beginning of 2022. During the pilot phase, LUMI-O is available for general use, but service availability or performance may be temporarily degraded due to maintenance activities.
The data analytics platform LUMI-D, which features nodes with very large memory or visualization GPUs, will at first be available only as resources in the Slurm batch job system. However, the deployment of the Open OnDemand suite (targeted availability Q1/2023) will make it an interactive partition as envisioned. Open OnDemand will also allow for interactive use of the LUMI-G and LUMI-C nodes using, e.g., Jupyter Notebooks.
Now that the system has been accepted, LUMI will receive some minor hardware facelifts: the LUMI-C and LUMI-G partitions will grow by a non-negligible amount of capacity this spring, LUMI-F will grow by two petabytes, and the interconnect will be improved by adding more bandwidth between cabinets.
We still have some further work to do: it will take more time and effort before we can allow sensitive data to be processed on the system. In addition, the container cloud platform envisioned to support persistent services, such as data mover utilities, web interfaces to LUMI-O datasets and job submission portals, will not be available for a while. Workarounds will be available to cover most of its use cases.
Furthermore, as announced recently, LUMI will be a part of the LUMI-Q solution, which makes quantum computers available to all research communities in Europe. This functionality will become a reality sometime in 2024.
Apply for resources
All eligible European scientists can now apply for capacity on LUMI via different calls. Half of the system’s resources are shared via EuroHPC’s calls, which are targeted at researchers from academia, research institutes, public authorities and industry established or located in an EU Member State or in a country associated with Horizon 2020.
The other half of the system’s resources are shared by the LUMI consortium countries – this half is divided based on the member countries’ contributions to the LUMI funding. More information about different access modes is available in the Get started section.
Author: Pekka Manninen, Director of Science and Technology, CSC – IT Center for Science Ltd., Finland
Watch the video: Queen of the North – LUMI is officially accepted