Luxembourg’s new supercomputer, MeluXina, was launched on 7 June as part of the country’s data-driven innovation strategy, addressing the needs of companies, start-ups as well as public and research institutions.
Several research projects have been granted early access to MeluXina to perform large-scale experiments and test their software on the system.
MeluXina to assess the scalability of models studied in research projects
For one month, several research projects will run large-scale experiments and test their software on the system before MeluXina’s operations actually begin. The projects result from collaborations within research groups and between research and industry. They were selected based on their potential impact on society, the economy and science, as well as their ability to exploit the resources of the supercomputer.
Early access to MeluXina provides an excellent opportunity for researchers to evaluate the scalability of the models studied in the research projects and to run their prototypes in a production environment. The initial results are expected to encourage other researchers to run their projects on MeluXina, helping to establish public-private partnerships for advanced academic and industrial research.
Selected projects to support Luxembourg research excellence
At the national level, one of the short- and medium-term needs is for the skills that industry requires to exploit the capacity and benefits offered by the new supercomputer. Against this background, the selected projects support the existing pillars of research excellence in the priority areas of materials science, physics and biology, materials and simulation, computer science and ICT.
Simone Zorzan, head of the Blast-Comet project, outlined what it is all about. “It revolves around two quite well-known pieces of software in bioinformatics. One is a standard for the analysis of genes and genomes, while the other is used for the analysis of proteins,” he said before elaborating. “During the past few years I developed software that was able to run these two tools on the LIST HPC; it distributes the calculation across several nodes and speeds up the processing of the biological information”.
This software currently implements the two biological tools, Blast and Comet. Both tools can now make use of multiple nodes. Blast is the most widely used software for analysing biological sequences, while Comet is a well-known tool for identifying proteins in large inputs resulting from mass spectrometry measurements. In the future the underlying architecture could be adapted to other bioinformatics tools.
The idea behind this submission is to explore possible interest among biologists in the MeluXina infrastructure, to evaluate performance on larger inputs and, ideally, to provide a user-friendly interface for users without a computer science background.
Project leader Malte Herold explained CazymeClust. “The project goes in a similar direction to Blast-Comet, since at its core we want to compare sequences of genes and proteins,” he began. “But we want to do this on a larger scale. When you want to compare millions of sequences there are other approaches that are more efficient. A recent paper showcased a clustering approach to identify and characterise the unknown fraction of genes in microbial communities. We want to test their approach and apply it to a different dataset of samples enriched in carbohydrate-active enzymes”.
Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40%-60% of the predicted genes are unknown. Systematically addressing this problem is challenging. With clustering methods it is sometimes possible to directly identify gene functions or determine sets of unknown genes that are important in a particular environment.
“We hope that with the access to MeluXina, we can build a reference gene catalogue of carbohydrate active enzymes as a screening tool for future projects and potentially identify new important enzymes,” Malte stated.
How will MeluXina help? “Even with efficient clustering approaches, datasets consisting of hundreds of millions of genes cannot be processed on normal computers, so we need an HPC system for this. The new infrastructure allows us to test this method on a large dataset,” Malte said.
The approach of CazymeClust is to explore a wide combination of in-house and publicly available datasets enriched in carbohydrate-active enzymes, such as samples from anaerobic digesters or termite guts.
Exploring MeluXina Capabilities to Perform High-resolution Terrestrial Systems Simulations
“This early access proposal is about exploring MeluXina capabilities to perform high resolution terrestrial system simulations. So what terrestrial system means here is a complete system from the land to the atmosphere,” explained project leader Mauro Sulis.
This proposal aims to explore the opportunities offered by MeluXina’s petascale HPC infrastructure (a machine capable of around a quadrillion floating-point operations per second) to perform a suite of numerical experiments on regional-scale terrestrial systems at increased spatial resolution. In particular, it will showcase the impact of using a convection-permitting atmospheric model setup for simulating extreme weather events.
Mauro continued, “When we talk about HPC-enabled terrestrial system simulations, something that is important to consider is that ‘behind’ this kind of scientific activity there is always a laborious technical procedure consisting of ‘porting’ the adopted numerical code (or suite of codes) to a new machine. Porting means finding the optimal configuration of your model in terms of compilers and optimisation flags and exploiting in the best way the complex hardware architecture and software stack of the machine”.
For this early access proposal, the aim is to use the granted computing time to find the most suitable configuration of the numerical terrestrial system model used in some of Mauro’s scientific projects. From a scientific perspective, this will mean faster simulations, as well as a lower energy bill and a reduced environmental impact.
But what does this project hope to achieve in one month? Mauro explained that he wants to gain additional insights into some of the technical (e.g., code performance diagnostics) and technological (e.g., storage/compute imbalance) challenges faced by his scientific activities carried out across the ENVISION Unit. “And if I want to look ahead, while remaining far from a fully-fledged digital twin of the Earth, the numerical experiments conducted in this proposal could be deemed as preliminary steps in this direction”.
Echoing the sentiments of all three projects, Simone used a driving analogy to paint a picture of where LIST currently stands with MeluXina. “We still don’t have the keys to access and we don’t know exactly what kind of ‘engine’ is under the hood, or how to get the most power while driving, what kind of reaction it will have to the different and extreme driving situations, with which accessories it is shipped and which ones we can add… So we need to become skilled drivers and engineers, to achieve the best performance on the avenues LIST will run in the future. We have done important training on the LIST HPC, but we could have some learning to do to get things up to speed with MeluXina”.
Digital Twin of a Biomass Furnace
The proposed digital twin concept has a significant impact on the processing of biomass and is considered a crucial step along the processing chain towards a high-quality biomass furnace that combines functionality and durability. Creating a digital twin helps to unveil the underlying physics of biomass conversion and thus to gain a deeper understanding of it. This understanding enables engineers to design improved reactors and operate them under more favourable conditions, with a higher output at reduced costs, contributing to a resource-efficient Europe. The digital twin can also predict responses of the biomass furnace to safety-critical events and uncover previously unknown issues before they become critical, and thus also addresses the societal aspect of safe and reliable processes.
Exploring the chemical space of drug-like molecules
This project will contribute to the generation of one of the first databases of quantum mechanical properties for large drug-like molecules. Such a database could be used to develop ML-assisted tools for exploring chemical space and discovering chemicals with a desired combination of properties for a given application. Hence, this data will be the basis of future academic and industrial investigations into the rational design of chemical compounds.
Extended Discrete Element Method (XDEM)
On the scientific side, this study will demonstrate the large-scale performance of the XDEM software, with a special focus on its original load-balancing policies and on dynamic load balancing specially designed for particles. XDEM has multi-physics applications, such as biomass furnaces, blast furnaces and additive manufacturing, which are ubiquitous in Luxembourg’s industry. Efficient parallel numerical methods and analysis tools provide more detailed results, permit faster technological development and innovation and constitute an economic advantage.
The FEniCS Project finite element software is a computing platform for quickly translating scientific models into efficient finite element simulations. It has been used to develop robust and scalable finite element solvers for challenging problems in diverse application areas including Physics, Mathematics, Engineering and Biology.
The project brings together experts at the University of Luxembourg, University of Cambridge and Rafinex Sarl. It proposes to assess the scalability of four state-of-the-art finite element solvers implemented in the FEniCS Project with strong relevance to real problems in Science and Engineering. Strong and weak scaling tests will be carried out on MeluXina using up to 25,600 processes.
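As a rough illustration of what such tests measure: in a strong scaling test the total problem size is fixed while the number of processes grows, so the ideal runtime shrinks in proportion; in a weak scaling test the problem size grows with the number of processes, so the ideal runtime stays flat. The Python sketch below computes the usual efficiency figures from measured runtimes. The function names and all timings are hypothetical and are not taken from the FEniCS Project or from MeluXina measurements.

```python
def strong_scaling_efficiency(t_base, p_base, t_p, p):
    """Efficiency for a fixed total problem size.

    t_base: runtime on the baseline run with p_base processes
    t_p:    runtime on p processes for the same problem
    A value of 1.0 means the runtime fell exactly in proportion to p.
    """
    speedup = t_base / t_p
    ideal_speedup = p / p_base
    return speedup / ideal_speedup


def weak_scaling_efficiency(t_base, t_p):
    """Efficiency when the problem size grows in proportion to the process count.

    A value of 1.0 means the runtime stayed constant as the work
    and the number of processes were scaled up together.
    """
    return t_base / t_p


# Hypothetical timings in seconds, keyed by process count (illustration only).
strong_runs = {1: 100.0, 8: 14.0, 64: 2.5}
weak_runs = {1: 100.0, 8: 105.0, 64: 125.0}

for p, t in strong_runs.items():
    eff = strong_scaling_efficiency(strong_runs[1], 1, t, p)
    print(f"strong scaling, {p:>3} processes: efficiency {eff:.2f}")

for p, t in weak_runs.items():
    eff = weak_scaling_efficiency(weak_runs[1], t)
    print(f"weak scaling,   {p:>3} processes: efficiency {eff:.2f}")
```

Efficiency that stays close to 1.0 as the process count rises indicates that a solver scales well; a steady drop usually points to communication or load-balancing overheads.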
GigaSOM.jl is a Julia package for clustering data from flow and mass cytometry. It was developed because existing methods could not easily handle the size of the datasets now being produced. The motivation for the whole project came from the Luxembourg Institute of Health (LIH), which now uses the software regularly.
Should GigaSOM users wish to run it on the new supercomputer, they will have some assurance that the application is “compatible”, and they will not need to solve various installation and portability problems.
GigaSOM is routinely used for diagnosing and evaluating human samples, which (in turn) partially contributes to answering many complicated research questions in immunology and oncology.
Pure GPU Constraint Solver
The GPU Constraint Solver project aims to design a novel software architecture for solving constraint problems, a general method for many optimisation problems. Because constraint solvers are very compute-intensive, parallel machines offer a great opportunity to improve the solvers’ performance. The novel architecture exploits mathematical properties to guarantee correct results on parallel machines.
The essential outcome is to test the validity of the chosen theoretical approach to parallel constraint solvers. This work is set in an overall research line, planned in an FNR CORE proposal. The research line addresses the suitability of lattice theory for other parallel programming models (beyond constraint solvers), and even for defining a new general-purpose parallel model. Results from this scalability study will be very useful in the course of this CORE project.
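To give a flavour of the kind of mathematical property involved, the Python sketch below shows a textbook idea from constraint propagation, not the project's actual architecture. Each variable has a domain of candidate values, and each constraint is a "propagator" that can only remove values. Because the propagators only ever shrink domains, applying them in any order ends at the same result, which is what makes such a scheme attractive for parallel hardware. The names `less_than` and `fixpoint` and the example problem are hypothetical.

```python
import itertools


def less_than(x, y):
    """Build a propagator for the constraint x < y.

    It removes values of x that are not below the largest value of y,
    and values of y that are not above the smallest value of x.
    """
    def propagate(domains):
        dx, dy = domains[x], domains[y]
        new = dict(domains)
        new[x] = frozenset(v for v in dx if dy and v < max(dy))
        new[y] = frozenset(v for v in dy if dx and v > min(dx))
        return new
    return propagate


def fixpoint(domains, propagators):
    """Apply the propagators in the given order until nothing changes."""
    while True:
        new = domains
        for propagate in propagators:
            new = propagate(new)
        if new == domains:
            return domains
        domains = new


# Hypothetical problem: x < y and y < z, each variable starting in 1..5.
start = {name: frozenset(range(1, 6)) for name in ("x", "y", "z")}
propagators = [less_than("x", "y"), less_than("y", "z")]

# Run every possible ordering of the propagators and collect the results.
results = {
    tuple(sorted((k, tuple(sorted(v))) for k, v in fixpoint(start, order).items()))
    for order in itertools.permutations(propagators)
}

# Every ordering reaches the same final domains.
print(len(results) == 1)
print(results)
```

In this small example both orderings prune the domains to the same sets, so no coordination between the propagators is needed to get a consistent answer.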
The University of Luxembourg and the European Investment Bank (EIB), through the STAREBEI programme, are working together to encourage private equity partners to invest in innovative and sustainable technologies. The funded research project “Sustainable and Trustworthy Artificial Intelligence Recommitment System (STAIRS)” proposes an innovative approach to generating efficient recommitment strategies to guide institutional investors with the aid of AI-based algorithms.
The “Visualising simulation ensembles of ferroelectric/dielectric superlattices for energy applications (VISIBLE ENERGY)” project brings together LIST’s Materials Research and Technology and IT for Innovative Services units. Together they will be producing and processing large amounts of data, with LIST’s giant Visualisation Wall also involved.
The project endeavours to optimise artificial ferroic materials for several applications, potentially including energy storage and low-power electronics. The idea of this project is to use some of the new infrastructure LIST has, such as the Visualisation Wall, which is very well suited to large-scale data.
Find out more details about the University of Luxembourg’s projects: Ten University projects granted early access to MeluXina