Results of the sixth Finnish LUMI Extreme Scale call

The accepted projects in the sixth Finnish LUMI Extreme Scale call:

AFMI: Anemoi decoder training

Principal Investigator Marko Laine, Finnish Meteorological Institute, co-PI Olle Räty, Finnish Meteorological Institute

Data-based AI models have initiated a paradigm change in operational weather prediction. Instead of using physics-based numerical weather prediction (NWP) models for forecast production, it is possible, by using existing historical weather analyses, to train neural network models whose predictive power is similar to that of their NWP counterparts. The research group will use the Anemoi framework, developed at ECMWF in collaboration with national weather services, which represents a transformative approach to weather forecasting. They will also incorporate the Bris model, developed by MET Norway, which is built on top of Anemoi. Together, these tools will be used to train specialized decoders for forecasting lightning, total cloudiness, and visibility. These will be trained on in-situ observations and cloud analyses consisting of several satellite data sets, respectively. This work is part of the ECMWF Machine Learning Pilot and is done in close collaboration with MET Norway.

AMASI: Advanced ML for aquatic species identification

PI Amir Khanjari, University of Jyväskylä, co-PI Muhammad Khan, University of Jyväskylä

Aquatic biodiversity faces unprecedented threats from climate change, pollution, and habitat destruction, making accurate species identification critical for effective conservation strategies. This project addresses this urgent scientific challenge through innovative deep learning approaches specifically designed for the unique complexities of aquatic species identification. The research advances the frontiers of machine learning by tackling several fundamental challenges that current AI systems struggle with. The research group’s work will advance trustworthy machine learning in ecological applications while creating accessible tools for taxonomists who lack specialized computational expertise.

DLSCD: Deep Learning for predicting Sudden Cardiac Death

PI Mark van Gils, Tampere University, co-PI Antti Kallonen, Tampere University

Sudden cardiac death (SCD) is a leading cause of mortality, particularly in cardiomyopathy patients. Current SCD prediction methods lack accuracy and predicting SCD remains an unmet clinical challenge. If SCD could be accurately predicted, it would be preventable with an implantable defibrillator.

The outcomes of this project promise enhancements in clinical decision-making and patient outcomes related to cardiovascular health. By identifying previously undiscovered ECG (electrocardiogram) spectral signatures linked to SCD, this research will facilitate earlier detection and intervention, potentially reducing mortality from cardiac events. This research is part of the EU-funded international cardiology research project SMASH-HCM.

ESCARM: Extreme-scale ice rheology modelling

PI Arttu Polojärvi, Aalto University, co-PI Jan Åström, CSC

The polar regions’ sea ice, spanning tens of millions of square kilometers at its maximum extent in both hemispheres, plays a vital role in influencing the global climate. The behavior of sea ice is undergoing rapid changes: Arctic ice coverage, which averaged around 8 million km² in the 1970s and 1980s, has decreased to as low as 4 million km² in the 2020s.
Current large-scale climate models still have a rudimentary approach to sea ice dynamics, and further research is needed to explore the length scale-dependent fracture dynamics that dictate sea ice rheology. This project seeks to utilize the highly refined HiDEM code, crafted particularly for this application. This code can simulate sea ice fractures at a remarkably high resolution covering tens of thousands of square kilometers. Specifically tailored for LUMI, the HiDEM code surpasses rival codes in efficiency by more than three orders of magnitude.

FinSAM: Finnish Spoken Audio Model

PI Mikko Kurimo, Aalto University, co-PI Tamas Grosz, Aalto University

Through the research group’s work at Aalto University, the National Audiovisual Institute in Finland (KAVI) and LUMI, Finnish now has one of the largest monolingual (non-English) open-source speech foundation models, trained with the help of KAVI’s audio and video programmes. When fine-tuned with Lahjoita Puhetta and Finnish Parliament speeches, this foundation model has already significantly boosted automatic speech recognition (ASR) for spontaneous unlimited speech. However, the data was filtered only by simple voice activity detection, and the training used only a fraction of KAVI’s data, which contains over 5 million hours of speech. With a speech data collection of this size and the research group’s tested training scripts, the next logical step is to study the foundation model’s capacity for other speech-based applications and to increase the robustness of ASR for everyday speech in variable conditions.

FINWE: Foreshock Impact on Near-earth space: Waves and Energisation

PI Lucile Turc, University of Helsinki, co-PI Minna Palmroth, University of Helsinki

The impact of the solar wind on Earth’s magnetosphere is the source of space weather disturbances, which have adverse economic and societal impacts, e.g., damaging spacecraft electronics and disrupting GPS signals. Understanding space weather effects and forecasting them accurately is critical to protecting key infrastructures in space and on the ground. This project will significantly advance our knowledge of processes at play in near-Earth space, which will strengthen our space weather forecasting capabilities.

Shock waves are ubiquitous in space and astrophysical environments. They are formed by explosions of dying stars or when fast streams of charged particles, solar or stellar winds, encounter obstacles such as planets. Shocks are major acceleration sites of charged particles, but many issues remain unresolved regarding how particles are accelerated to the highest energies. Recent works indicate that the foreshock, a region extending ahead of the shock and filled with intense wave activity, plays an important role in particle acceleration. This project makes use of the global simulation model Vlasiator to investigate how foreshock dynamics impact the shock and how particles are energised. Vlasiator’s unique capabilities to simulate ion-scale processes in their global context will enable us to obtain a comprehensive picture of the physics at play and unravel the complex cross-scale dynamics of particle acceleration. The research group is aiming at a breakthrough in our understanding of shocks in general.

FLRG: Feed-forward large-scale 3D foundation model for both reconstruction and generation

PI Juho Kannala, Aalto University & University of Oulu

This project aims to revolutionize 3D scene reconstruction and generation by developing a feed-forward 3D foundation model built on a novel single feed-forward architecture. This technology reshapes computer vision with real-time 3D capabilities, democratizing high-fidelity content creation for educators, artists, and developers. Its applications span healthcare (enabling AR-guided surgery and assistive navigation tools) and sustainability (accelerating digital twins for urban planning). The research group’s unified architecture combines Vision Transformers (ViTs), Implicit Neural Representations, and dynamic memory bank mechanisms. These methods enable real-time, scalable inference while maintaining state-of-the-art accuracy. Trained on large-scale, diverse datasets, the model could achieve robust generalization.

GALPMSSD: Small scale galactic dynamo at high Prandtl number

PI Frederick Gent, Aalto University, co-PI Mordecai Mac Low, American Museum of Natural History

To understand galaxies, the research group will use numerical models that span vast scales in time and space. Cutting-edge simulations therefore require extreme-scale computations. The research group will study the growth of magnetic fields in galaxies and address some common bottlenecks faced in frontier computing. Magnetic fields in galaxies are ubiquitous. These magnetic fields affect galactic structure, star formation, and the processing of the cosmic dust that forms the building blocks for planets. These fields are grown by a turbulent dynamo driven by supernova explosions and galactic rotation. Numerical models of dynamos to date have failed to predict the observed strength of turbulent fields. The research group hypothesizes that the reason is unphysically enhanced resistivity from insufficient numerical resolution, and the group will apply the highest-resolution simulations so far achieved to resistive and viscous interstellar dynamos to clarify the role the turbulent dynamo has in structuring galactic magnetism.

GR-Nucleation: Machine-learning enabled molecular dynamics study of graphite nucleation from a paramagnetic iron melt

PI Jaakko Akola, Tampere University, co-PI Adam Götz, Norwegian University of Science and Technology

Controlling the microstructure of structural materials is vital to achieve desired properties. A crucial step in microstructure formation is nucleation, which is the process initiating solidification. In most metals of practical interest, nucleation occurs heterogeneously on solid inclusions (oxides, sulfides, nitrides, etc.) pre-existing in the melt. The role of these particles in the nucleation of the new phase can be diverse: they can provide solid support for the new phase, act as a structural template for the formation of a phase with a similar crystal structure, or support nucleation through chemical affinity by attracting atoms from the liquid to adhere. Yet, despite much empirical effort, this process remains poorly understood. In this project the group will study the role of inclusions in graphite nucleation from a ferrous melt on an atomistic scale. They will train machine-learning force fields for relevant systems and study graphite nucleation through molecular dynamics simulations.

INSPIRE: Insights into Nanocluster-Protein Interactions and Electronic properties

PI Hannu Häkkinen, University of Jyväskylä

In this project, the research group investigates hybrid nanostructures formed by monolayer-protected gold nanoclusters and proteins using machine learning, molecular dynamics simulations and electronic structure calculations. Functionalization of the protective ligand layer enhances the biocompatibility of gold nanoclusters, enabling potential applications in bioimaging, biosensing, and nanomedicine. However, the interaction between the gold nanocluster and the protein can significantly influence the structure and function of both components, for example through the formation of a protein corona around the nanocluster. To understand these effects, the group employs machine learning methods to predict the structures of such hybrids, and advanced molecular dynamics simulation methods to examine their structural characteristics and dynamics at atomic resolution. Electronic structure calculations further provide insights into spectral properties and related phenomena of the hybrid structures.

IrRePoT: Irradiation Response of Polycrystalline Tungsten

PI Fredric Granberg, University of Helsinki, co-PI Aslak Fellman, University of Helsinki

Tungsten is the chosen material for some of the most critical parts of fusion reactors, where it will face a very demanding environment with extreme irradiation and heat fluxes. Understanding how tungsten behaves and evolves under such conditions is critical to being able to use fusion power as a CO2-free energy source. With extreme-scale atomistic computer simulations, the group will be able to study this evolution at scales that are directly relevant for experimental validation while still retaining the atomistic resolution needed for a deep understanding of the underlying mechanisms. In this project the researchers will simulate system sizes never studied before, using state-of-the-art ML interaction models, to understand the evolution of tungsten under harsh conditions in future fusion power plants.

MaMuLaM: Massively Multilingual Language Models

PI Jörg Tiedemann, University of Helsinki, co-PI Filip Ginter, University of Turku

Multilingual natural language processing (NLP) is essential for global AI applications, yet training and deploying massive models across diverse languages present computational and efficiency challenges. This project aims to develop and optimize massively multilingual language models with a focus on low-resource languages. The group will work on continual pretraining of large language models on large-scale multilingual datasets and will design modular neural networks for efficient language adaptation. The group will explore flexible modularity and knowledge distillation techniques to enhance performance and efficiency for smaller language models in massively multilingual settings. The project will serve the EU project HPLT, the ERC proof-of-concept project MARMoT and GreenNLP funded by the Research Council of Finland. The project is also a direct continuation of the MaLA-LM initiative in collaboration with TU Darmstadt.

OpenEuroLLM Design: OpenEuroLLM Design

PI Sampo Pyysalo, University of Turku, co-PI Jonathan Burdge, AMD Silo AI

OpenEuroLLM is an unprecedented collaboration where Europe’s leading AI companies, research institutions, and HPC centres combine forces to develop fully open next-generation large language models (LLMs) for all European languages. The OpenEuroLLM Design project will address critical questions in the design of the OpenEuroLLM models. It will explore key architecture decisions for efficient training of large-scale multilingual models and identify the optimal composition and preprocessing of training data, addressing considerations such as data filtering and cleaning as well as optimal data mixtures that take into account factors such as language, register, and topics. The project will train thousands of models of various sizes that will be comprehensively evaluated to identify the most effective design for the largest OpenEuroLLM models and establish scaling laws to predict their performance.

PepMemML: Design of optimal peptide sequences for membrane-related applications using high-throughput molecular simulations and machine learning

PI Matti Javanainen, Tampere University

Peptides are building blocks of proteins. This project combines simulations and machine learning to design optimal peptide sequences for biophysical applications related to peptide adsorption onto, insertion into, and penetration through biomembranes, and the partitioning of transmembrane peptides between different membrane environments. The researchers will sample tens of thousands of sequences using high-throughput molecular dynamics simulations. The results will be used to train machine learning models to predict sequences for peptides that would localize in the desired manner in a lipid bilayer. This research will help researchers understand the key factors that determine peptide localization. It will also be extended to more realistic mammalian or bacterial membranes, after which the predictions will provide good candidates for antimicrobial or antiviral peptides, or cell-penetrating peptides for cargo delivery.

PTAUHelium: Phase Transition Dynamics in A-Universe of Helium Droplet

PI Kuang Zhang, University of Helsinki, co-PI Mark Hindmarsh, University of Helsinki

The first-order transition between the superfluid A- and B-phases of liquid Helium-3 (He3) is a close analogue of the one in extensions of the Standard Model of particle physics. According to homogeneous nucleation theory, a first-order phase transition in the early universe would have generated gravitational waves, which are an important target of the 2030s’ LISA gravitational wave observatory. However, experiments in superfluid He3 have shown that this transition is much more rapid than theory predicts. As investigators of the QUEST-DMC collaboration, which aims to understand this transition through new experiments and large-scale simulations, the researchers seek the resources to carry out the numerical simulations. With extreme-scale access to LUMI-G, the group can gather data on a wide range of parameters and initial conditions, allowing them to predict more accurately the lifetime of metastable states in both superfluid He3 and the early universe.

TDP-FM: Transforming Digital Pathology with Foundational Models

PI Mario Parreño Centeno, University of Oulu

In this project the researchers aim to develop a novel multimodal foundation model for computational pathology, addressing critical challenges in cancer diagnostics. Current approaches often require extensive annotated data and lack interpretability, limiting their clinical applicability. This project’s approach leverages self-supervised learning to harness vast amounts of unlabeled data, significantly reducing the need for costly annotations.

By integrating diverse data modalities, including whole-slide images and genomic profiles, the model aims to provide a comprehensive understanding of cancer pathology and improve model interpretability. The computational power of LUMI is essential in this project for developing a robust foundation model that can transform cancer diagnostics.