Streaming scientific software with EESSI

From HPC Wiki

Revision as of 14:17, 12 April 2024


The European Environment for Scientific Software Installations (EESSI, pronounced "easy"; see also [1]) provides a "streaming service" for scientific software. It is based on the CernVM File System (CVMFS), a file system for software distribution that is read-only from the user's perspective. CVMFS was originally developed for worldwide distributed grid computing and has been widely used in production for many years.

The general idea is to install a CVMFS client on all user-facing machines in an HPC cluster. The client is easily configured to connect to the EESSI servers to download and cache files via HTTP.
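
As a rough sketch of what such a client configuration can look like, the following snippet writes an example configuration to a scratch file for review. CVMFS_HTTP_PROXY and CVMFS_QUOTA_LIMIT are regular CVMFS client parameters, but the proxy host, the cache size and the scratch file are assumptions made for this example; an administrator would adapt the values and install the result as /etc/cvmfs/default.local.

```shell
#!/usr/bin/env bash
# Sketch only: write an example CVMFS client configuration to a scratch file.
# The proxy host and the cache limit below are hypothetical placeholder values.
conf_file="$(mktemp)"

cat > "$conf_file" <<'EOF'
# Site-local HTTP proxy (hypothetical host) through which clients download.
CVMFS_HTTP_PROXY="http://squid.example.org:3128"
# Upper limit of the local client cache in megabytes (assumed value).
CVMFS_QUOTA_LIMIT=10000
EOF

echo "Example configuration written to $conf_file:"
cat "$conf_file"

# After installing the reviewed file as /etc/cvmfs/default.local, an
# administrator would typically run (as root):
#   cvmfs_config setup
#   cvmfs_config probe software.eessi.io
```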

Motivation

Researchers migrating from their laptops and workstations to HPC systems may need to rethink their software environment before they can start their calculations.

The same is true when researchers migrate between HPC systems, as many systems have their own conventions and technical solutions for providing the required research software, e.g. in the form of Lmod modules.

A further complication is the heterogeneous hardware environment, which may require special software installations and configurations, e.g. optimized builds for x86 CPUs, ARM CPUs, FPGAs or GPUs, and libraries for fast interconnect technologies.

EESSI addresses all of these issues by providing the same workflow and mechanism on many platforms across all scales.

Loading Software Modules

On an HPC cluster, you can check whether EESSI is available via CVMFS, e.g. by running:

less /cvmfs/software.eessi.io/README.eessi

If the file can be read, EESSI is available, and its init script has to be sourced before use:

$ source /cvmfs/software.eessi.io/versions/2023.06/init/bash 
Found EESSI repo @ /cvmfs/software.eessi.io/versions/2023.06!
archdetect says x86_64/intel/haswell
Using x86_64/intel/haswell as software subdirectory.
Found Lmod configuration file at /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/lmodrc.lua
Using /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/modules/all as the directory to be added to MODULEPATH.
Found Lmod configuration file at /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/lmodrc.lua
Found Lmod SitePackage.lua file at /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/.lmod/SitePackage.lua
Initializing Lmod...
Prepending /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/intel/haswell/modules/all to $MODULEPATH...
Environment set up to use EESSI (2023.06), have fun!

{EESSI 2023.06} $ module avail # run the usual module commands

On a laptop, you would have to install CVMFS or use an EESSI container.

Adding Missing Software

EESSI is a community-driven project that builds on EasyBuild to manage reproducible build recipes. If an important software module is missing, or if you would like to distribute scientific software that you are developing, you can contribute by opening a pull request.

After the pull request has been reviewed and approved, an automated process builds the software and ingests it into the EESSI stack.

Performance Considerations

The first access to any software package may be slower than subsequent accesses if the package is not yet present in a local or site-local cache. In that case, the download is limited by the available network bandwidth.

In practice, access to software distributed with EESSI via CVMFS is seamless for the user. Before taking performance measurements, it is a good idea to ensure that the software is available in local caches by running a small dummy workload (cache warming). Another approach is to limit performance measurements to later stages of the workflow, so that non-deterministic download times on a potential first access do not affect the measurements.
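
The cache-warming idea can be sketched as a small bash helper that runs a cheap command first and only times the second, real run. The function name warm_then_time and the way commands are passed as strings are choices made for this example, not part of EESSI or CVMFS.

```shell
#!/usr/bin/env bash
# Sketch: run a cheap command once so that the files it touches end up in the
# local CVMFS cache, then time the run that is actually being measured.
# warm_then_time is a helper name made up for this example.
warm_then_time() {
    local warmup_cmd="$1"   # small dummy workload, e.g. printing a help text
    local real_cmd="$2"     # workload whose run time is of interest
    local rc=0

    # First access: may trigger downloads, so its duration is ignored.
    bash -c "$warmup_cmd" > /dev/null 2>&1 || echo "warm-up failed" >&2

    # Measured run: SECONDS is a bash variable counting elapsed whole seconds.
    SECONDS=0
    bash -c "$real_cmd" || rc=$?
    echo "measured run took ${SECONDS}s (exit status ${rc})" >&2
    return "$rc"
}
```

A call could look like warm_then_time "simpleFoam -help" "mpirun -np 4 simpleFoam -parallel", where both commands are only examples of a dummy and a real workload. The timing message goes to standard error, so the output of the measured command stays unchanged.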

EESSI provides optimised builds for many architectures and the correct version is automatically loaded. This can result in even better performance when the locally provided alternatives have not been built with architecture optimisations in mind. With community contributions from multiple people, you can benefit from synergy effects in the maintenance of a common software stack.

Proof of Concept at the University of Wuppertal

EESSI is distributed with the default configurations of CVMFS clients, so any site already offering CVMFS on worker nodes can evaluate software streaming through EESSI without any changes.

In a simple proof of concept, we compared the execution of a trivial computational fluid dynamics simulation using the same version of OpenFOAM (11), once provided by EESSI and once by the local software stack. The test runs 4 MPI processes on two separate nodes in order to also test the Open MPI version shipped with EESSI in combination with the local InfiniBand hardware.

The following table shows three measurements for each setup:

# EESSI modules via CVMFS:
     JobID    State       Elapsed  CPUEff   MemEff 
  26269014  COMPLETED    00:04:25   44.2%     0.2%  
  26269015  COMPLETED    00:04:17   47.2%     0.2%  
  26269016  COMPLETED    00:04:05   46.0%     0.2%  

# Local modules on parallel filesystem (BeeGFS):
     JobID    State       Elapsed  CPUEff   MemEff 
  26269017  COMPLETED    00:05:18   40.6%     0.2%  
  26269018  COMPLETED    00:04:25   41.8%     0.2%  
  26269019  COMPLETED    00:04:31   41.7%     0.2%

Both approaches result in comparable elapsed job times, with the EESSI version being slightly faster. This is likely due to the lower performance of the local parallel filesystem (BeeGFS), which is not optimized for software distribution and can be affected by other jobs running on the HPC system.
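
To put a number on "slightly faster", the elapsed times from the table can be averaged. The following is a minimal sketch; mean_elapsed is a helper name made up for this example, and the input values are copied from the table.

```shell
#!/usr/bin/env bash
# Sketch: average elapsed times given as HH:MM:SS and print the mean in seconds.
# mean_elapsed is a helper name made up for this example.
mean_elapsed() {
    printf '%s\n' "$@" | LC_ALL=C awk -F: '
        { total += $1 * 3600 + $2 * 60 + $3; n++ }
        END { printf "%.1f\n", total / n }'
}

echo "EESSI via CVMFS: $(mean_elapsed 00:04:25 00:04:17 00:04:05) s"
echo "Local (BeeGFS):  $(mean_elapsed 00:05:18 00:04:25 00:04:31) s"
```

This gives mean elapsed times of 255.7 s for EESSI and 284.7 s for the local modules, i.e. the EESSI runs were roughly 10% faster on average. With only three runs per setup and one slow local run (00:05:18), this should be read as an indication rather than a firm result.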

Using software distributed by EESSI can therefore lead to equal or better performance than using a local software stack.

Changes to the job scripts are minimal. Users just have to make sure to source the EESSI init script and use the correct module naming scheme.
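
A job script following this pattern might look like the sketch below. It assumes a Slurm batch system; the scheduler directives, the module name OpenFOAM/11-foss-2023a and the solver call in the comment are assumptions for illustration and not the job script used for the measurements above.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=eessi-openfoam   # hypothetical job name
#SBATCH --nodes=2                   # two nodes, as in the test above
#SBATCH --ntasks=4                  # four MPI processes in total
#SBATCH --ntasks-per-node=2

# Sketch only: the scheduler directives, the module name and the solver call
# are assumptions for illustration, not the job script used for the table.
eessi_init="/cvmfs/software.eessi.io/versions/2023.06/init/bash"
module_name="OpenFOAM/11-foss-2023a"   # assumed EasyBuild-style module name

if [ -r "$eessi_init" ]; then
    # The two EESSI-specific steps: source the init script, then load the
    # module under the name used in the EESSI stack.
    source "$eessi_init"
    module load "$module_name"
    job_stack="eessi"
    # Placeholder for the actual workload, e.g. an MPI-parallel solver run:
    # mpirun -np 4 simpleFoam -parallel
else
    echo "EESSI init script not readable: $eessi_init" >&2
    job_stack="none"
fi
```

The check for the init script keeps the job from failing with an obscure error on nodes where CVMFS is not mounted; everything after the module load is the same as in a job script that uses local modules.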


Additional Resources