Admin Guide Easybuild
Latest revision as of 11:55, 5 April 2022
Introduction
EasyBuild (EB) is a Python framework that automates software installation and the creation of environment modules in an HPC environment. Most installations are compiled directly from source to produce architecture-optimized, performant builds. This article briefly describes the first steps of setting up EasyBuild. Its main focus, however, is a sustainable directory structure and configuration that helps to keep an overview of the installed software while still being able to provide frequent updates to new versions. A more complete guide on how to set up EasyBuild can be found in the official documentation.
Setup
Assumptions
- /sw/easybuild/ is the main folder and accessible from all compute nodes
- /sw/easybuild/default will be the default path for your EB installation
- /sw/easybuild/stacks/skylake/2020a is a path for a specific CPU arch and software stack
- CentOS 7 (otherwise the list of dependencies for EB can vary)
- LMOD >= v8.3.4 (the current versions from the OpenHPC repos are not sufficient)
- Python 3 (You can also use Python 2, but at this point you really shouldn’t)
Initial installation and configuration
# INSTALL SYSTEM DEPS
yum install epel-release
yum install python3 git gcc gcc-c++ libibverbs-devel patch openssl-devel
# DOWNLOAD THE BOOTSTRAP EB SCRIPT AND BOOTSTRAP EB
curl -O https://raw.githubusercontent.com/easybuilders/easybuild-framework/develop/easybuild/scripts/bootstrap_eb.py
python3 bootstrap_eb.py /sw/easybuild/default
# UPDATE $MODULEPATH, AND LOAD THE EasyBuild MODULE
module use /sw/easybuild/default/modules/all # add this to your bashrc
module load EasyBuild
# CREATE DIRS AND A FIRST CONFIGFILE FOR YOUR CPU ARCHITECTURE
mkdir -p /sw/easybuild/configfiles # directory for your configs
mkdir -p /sw/easybuild/sources
mkdir -p /sw/easybuild/stacks/skylake/2020a
vim /sw/easybuild/configfiles/skylake-2020a.cfg # Enter the values from the example below
# CREATE AN ALIAS IN YOUR BASHRC FOR EVERY SPECIFIC CONFIGFILE, E.G.
alias ebsky-2020a='eb --configfile=/sw/easybuild/configfiles/skylake-2020a.cfg'
# YOU CAN LIST ALL AVAILABLE EASYCONFIG PARAMETERS WITH
eb -a
# OR REDIRECT THE OUTPUT OF eb --confighelp TO GET AN ANNOTATED CONFIGFILE
eb --confighelp >> myconfig.cfg
# TO SHOW YOUR CURRENT EB CONFIGURATION USE
eb --show-config
ebsky-2020a --show-config
#NOTE: If you install software with EB and just use 'eb' you will install it into your default folder!
Example usage of EasyBuild
# Basic usage
eb -h # short help
eb -H # complete list of all options
# Dry-run installation of the foss-2020a toolchain
ebsky-2020a foss-2020a.eb -r -D # -D == dry-run / -r == install deps
ebsky-2020a foss-2020a.eb -r -M # -M == show only missing dependencies
# Real installation with all deps
ebsky-2020a foss-2020a.eb -r
Folder structure
Inside of /sw/easybuild/stacks/, a folder is created for every supported CPU architecture (broadwell, skylake, etc.). Each of these contains subfolders for the individual EB toolchain releases (e.g. 2018b, 2019a), where one finds the modules and the installed software. In addition, the folders configfiles, hooks, sources and easybuild_repo can be found here. The easybuild_repo folder is synced via a local git repository and contains the folders archive and custom_easyconfigs.
/sw
└── easybuild
    ├── configfiles
    ├── custom_easyconfigs
    ├── hooks      # --> Python script which can alter the build process
    ├── sources    # --> place to store all downloaded source-files; prevents downloading a source multiple times
    └── stacks
        ├── broadwell
        └── skylake
            ├── 2018b
            └── 2019a
                ├── modules
                └── software
                    ├── ...
                    ├── ...
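The layout above can also be created by a small script instead of individual mkdir calls. The following is a minimal sketch in Python; the function name create_layout and its parameters are hypothetical, and a temporary directory stands in for /sw/easybuild so that the example can be run without touching a real installation.

```python
from pathlib import Path


def create_layout(root, architectures, releases):
    """Create the shared folders plus one stack folder per (architecture, release)."""
    root = Path(root)
    created = []
    # Shared folders that sit next to the per-architecture stacks
    for name in ("configfiles", "custom_easyconfigs", "hooks", "sources"):
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # One folder per CPU architecture and toolchain release
    for arch in architectures:
        for release in releases:
            path = root / "stacks" / arch / release
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    import tempfile

    # A temporary directory stands in for /sw/easybuild here (assumption)
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_layout(tmp, ["broadwell", "skylake"], ["2018b", "2019a"]):
            print(path.relative_to(tmp))
```

The modules and software subfolders are not created here, since EasyBuild populates them itself below the configured prefix.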
Further Configuration of Easybuild
Config files
For every toolchain release there exists a base configuration file inside the folder configfiles. Personal settings of different admins can be made in local config.cfg files in ~/.config/easybuild/. Configfiles can be used via
eb --configfile=/PATH/TO/CONFIG.CFG
Files which are listed within --configfile are treated first.
Example of a basic skylake-2020a.cfg:
[config]
prefix = /sw/easybuild/stacks/skylake/2020a
module-naming-scheme = HierarchicalMNS
sourcepath=/sw/easybuild/sources/
robot-paths=/sw/easybuild/custom_easyconfigs:%(DEFAULT_ROBOT_PATHS)s
group-writable-installdir=true
Using different config files for different architectures and toolchain releases helps to handle heterogeneous systems and to keep an overview of installed software.
Create a bash alias for every config file for ease of use.
alias ebsky-2020a='eb --configfile=/sw/easybuild/configfiles/skylake-2020a.cfg'
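Since every architecture and toolchain release gets its own config file and alias, both can be generated rather than written by hand. The sketch below uses Python's standard configparser module to render a file with the same keys as the skylake-2020a.cfg example; the helper names render_config and alias_line are hypothetical, and the default root /sw/easybuild is taken from the assumptions of this guide.

```python
import configparser
import io


def render_config(arch, release, root="/sw/easybuild"):
    """Return the text of a base config file for one architecture and release."""
    # interpolation=None keeps the %(DEFAULT_ROBOT_PATHS)s template literal,
    # so that EasyBuild (not configparser) resolves it later
    parser = configparser.ConfigParser(interpolation=None)
    parser["config"] = {
        "prefix": f"{root}/stacks/{arch}/{release}",
        "module-naming-scheme": "HierarchicalMNS",
        "sourcepath": f"{root}/sources/",
        "robot-paths": f"{root}/custom_easyconfigs:%(DEFAULT_ROBOT_PATHS)s",
        "group-writable-installdir": "true",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def alias_line(alias_name, config_path):
    """Return the bash alias line that binds eb to one config file."""
    return f"alias {alias_name}='eb --configfile={config_path}'"


if __name__ == "__main__":
    print(render_config("skylake", "2020a"))
    print(alias_line("ebsky-2020a", "/sw/easybuild/configfiles/skylake-2020a.cfg"))
```

The rendered text can be written to the configfiles folder, and the alias line appended to the bashrc of each admin.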
Archive and self written Easyconfig files
After every successful build, the Easyconfig file that was used is archived in stacks/$ARCH/$RELEASE/easybuild_repo/. Self-written Easyconfig files can be stored in the folder custom_easyconfigs and will be considered when searching for software. A guide on how to write your own easyconfig file can be found here.
SLURM integration
SLURM can be used as a job backend to compile multiple programs at the same time. Dependencies are resolved automatically, and the jobs are submitted so that they run in the correct order. Use the flag --job when you run eb. You can add the following to your config files and adjust the values as appropriate.
job-backend=Slurm
job-cores=<NCORES>
job-max-jobs=<NJOBS>
Hooks
Hooks are small Python scripts which can directly influence the build process. They can be used, for example, to make site-specific adjustments to Easyconfig files without the need to create a completely new one each time.
An example hook script is given below. It adds some Slurm-specific configuration, adds flags to every OpenMPI build and points to a license file for Intel software installations:
import os
import sys

from easybuild.tools.build_log import print_msg


def start_hook(*args, **opts):
    # Only adjust the Slurm job settings when eb was called with --job
    if "--job" in sys.argv:
        # Check if env var was set, otherwise fall back to a default partition
        slurm_partition = os.getenv("SBATCH_PARTITION")
        if slurm_partition is not None:
            print_msg("[start-hook] SBATCH_PARTITION ENV VAR set: %s." % slurm_partition)
        else:
            slurm_partition = "normal"

        slurm_mem_per_node = os.getenv("SBATCH_MEM_PER_NODE")
        if slurm_mem_per_node is not None:
            print_msg("[start-hook] SBATCH_MEM_PER_NODE ENV VAR set: %s." % slurm_mem_per_node)
        else:
            slurm_mem_per_node = "36G"

        import easybuild.tools.job.slurm as slurm

        # Replace EasyBuild's SlurmJob class with a subclass that sets site defaults
        class slurm_job(slurm.SlurmJob):
            def __init__(self, *args, **opts):
                super(slurm_job, self).__init__(*args, **opts)
                self.job_specs['partition'] = slurm_partition
                self.job_specs['mem'] = slurm_mem_per_node
                self.job_specs['time'] = '12:00:00'

        slurm.SlurmJob = slurm_job
        print_msg("[start-hook] using partition << %s >> " % slurm_partition)


def pre_prepare_hook(self, *args, **kwargs):
    # SET PATH TO INTEL LICENSE FILE
    if self.name in ["icc", "ifort", "itac", "VTune"]:
        self.cfg['license_file'] = "/sw/licenses/USE_SERVER.lic"
        self.log.info("[pre-prepare hook] Setting path to license file: %s" % self.cfg['license_file'])
        print_msg("Intel license file: %s" % self.cfg['license_file'])


def pre_configure_hook(self, *args, **kwargs):
    if self.name == 'OpenMPI':
        extra_opts = ""
        # Enable slurm and pmi support
        extra_opts += "--with-slurm --with-pmi"
        # Now add the options
        self.log.info("[pre-configure hook] Adding %s" % extra_opts)
        self.cfg.update('configopts', extra_opts)
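EasyBuild is normally pointed at such a file with its --hooks option. Because a hook is a plain Python function, its logic can also be checked without EasyBuild installed. The sketch below repeats the OpenMPI hook from the script above and calls it on stand-in objects. FakeConfig, FakeLog and FakeEasyBlock are hypothetical stubs written for this illustration and are not part of EasyBuild; in particular, the assumption that update() appends to the existing value with a space is a simplification of the real easyconfig behaviour.

```python
class FakeConfig:
    """Hypothetical stand-in for an easyconfig: update() appends to a string value."""

    def __init__(self, **values):
        self.values = dict(values)

    def __getitem__(self, key):
        return self.values[key]

    def update(self, key, value):
        # Simplified assumption: new options are appended, separated by a space
        current = self.values.get(key, "")
        self.values[key] = (current + " " + value).strip()


class FakeLog:
    """Collects log messages instead of writing them anywhere."""

    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


class FakeEasyBlock:
    """Hypothetical stub offering only the attributes the hook touches."""

    def __init__(self, name, configopts=""):
        self.name = name
        self.cfg = FakeConfig(configopts=configopts)
        self.log = FakeLog()


def pre_configure_hook(self, *args, **kwargs):
    # Same logic as the OpenMPI hook in the script above
    if self.name == "OpenMPI":
        extra_opts = "--with-slurm --with-pmi"
        self.log.info("[pre-configure hook] Adding %s" % extra_opts)
        self.cfg.update("configopts", extra_opts)


if __name__ == "__main__":
    for name in ("OpenMPI", "GCC"):
        block = FakeEasyBlock(name, configopts="--enable-shared")
        pre_configure_hook(block)
        print(name, "->", block.cfg["configopts"])
```

Only the OpenMPI stub receives the additional configure options; every other software name passes through the hook unchanged.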
Toolchains
About every 6 months EasyBuild releases new toolchains which combine a set of specific modules for compilers, MPI and numerical libraries (cf. common toolchains). The two most common are:
- intel + intelcuda –> icc/ifort, iMPI, MKL
- foss + fosscuda –> gcc, OpenMPI, OpenBLAS, LAPACK, ScaLAPACK, FFTW
Modules
Modules in this example are automatically created using the hierarchical module naming scheme. There are other options (the default being the EasyBuildModuleNamingScheme) available as well as the option to create your own site-specific module naming scheme. See this link.
Meta-modules
A meta-module in this case is a module which makes a different module path available to the user while taking care of the correct path for the current architecture. An example of such a module is given below. These modules can e.g. reside in the folder /sw/easybuild/meta-modules/palma. The software in this example is called palma, and the versions correspond to the EB toolchain releases, i.e. 2019b.lua, 2020a.lua etc. The CPU_ARCH environment variable used here is exported in a modules.sh script within /etc/profile.d via CPU_ARCH=$(cat /sys/devices/cpu/caps/pmu_name).
help([==[
Description
===========
This is a meta module giving you access to the PALMA 2020a software stack. Software on PALMA is built using the
EasyBuild Python Framework.
Supported CPU Architectures: skylake
More information
================
- PALMA: https://confluence.uni-muenster.de/display/HPC
- EasyBuild: https://easybuild.readthedocs.io
]==])
whatis([==[This is a meta module giving you access to the PALMA 2020a software stack. Software on PALMA is built using
the EasyBuild Python Framework.]==])
whatis([==[PALMA: https://confluence.uni-muenster.de/display/HPC]==])
whatis([==[EasyBuild: https://easybuild.readthedocs.io]==])
local version = "2020a"
local root = "/sw/easybuild/stacks"
local cpu_arch = os.getenv("CPU_ARCH")
local suffix = "/modules/all/Core"
local hostname = subprocess("hostname -s")
-- LmodMessage("cpu_arch = ", cpu_arch)
-- LmodMessage("hostname = ", hostname)
if (string.find(hostname, "^r13n[01-12].*")) then
    cpu_arch = cpu_arch .. "-IB"
end
-- ONLY SKYLAKE HAS THIS TOOLCHAIN AT THE MOMENT
if (cpu_arch ~= "skylake") then
    -- THROWS AN ERROR MESSAGE AND EXITS
    LmodError(version, " IS NOT YET AVAILABLE ON THE ", cpu_arch, " ARCHITECTURE.")
end
conflict("palma")
prepend_path("MODULEPATH", pathJoin(root, cpu_arch, version, suffix))
-- add_property("lmod",)
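The core of the meta-module is the mapping from CPU architecture and toolchain release to a module path, plus an error for unsupported architectures. The following Python sketch mirrors only that decision so it can be tried outside of Lmod. The function name stack_modulepath and the supported parameter are hypothetical, and the hostname-based "-IB" suffix of the Lua module is deliberately left out.

```python
import posixpath


def stack_modulepath(cpu_arch, version, root="/sw/easybuild/stacks",
                     suffix="modules/all/Core", supported=("skylake",)):
    """Return the MODULEPATH entry for one architecture and toolchain release."""
    if cpu_arch is None:
        # Corresponds to CPU_ARCH not being exported by modules.sh
        raise ValueError("CPU_ARCH is not set")
    if cpu_arch not in supported:
        # Plays the role of LmodError in the Lua module
        raise ValueError(
            "%s is not yet available on the %s architecture" % (version, cpu_arch)
        )
    return posixpath.join(root, cpu_arch, version, suffix)


if __name__ == "__main__":
    print(stack_modulepath("skylake", "2020a"))
    try:
        stack_modulepath("broadwell", "2020a")
    except ValueError as err:
        print("error:", err)
```

Keeping the list of supported architectures as a parameter makes it easy to see what has to change when a release becomes available on a further architecture.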
Using the correct modules path
We created a modules.sh file inside of /etc/profile.d/ on the compute and login nodes instead of symlinking directly to lmod/init/profile. This deals with some issues Lmod encounters with slurm. In case of an interactive slurm session, it also has to reset $MODULEPATH before updating/reloading the currently loaded modules. Otherwise updating the modules will not unload the old path since we might be on a node with a different architecture than on the login node.
#!/bin/sh
# NOTE: In a slurm batch job (non-interactive, non-login) this file is not sourced.
#       Only the file where $BASH_ENV points to is sourced in this case.

# CPU ARCH ENV VARIABLE FOR META MODULES
if [ -f "/sys/devices/cpu/caps/pmu_name" ];then
    export CPU_ARCH=`cat /sys/devices/cpu/caps/pmu_name`
fi

# Special treatment if we are in an interactive slurm session
if [ ! -z "$SLURM_NODELIST" ];then
    echo "YOU ARE NOW IN AN INTERACTIVE SLURM JOB"
    # Reset module path
    export MODULEPATH=/sw/easybuild/meta-modules/
    # Update the currently loaded modules to account for architecture specific paths
    module update
    # re-source the bash init to gain module autocomplete functionality in an interactive session
    . /opt/lmod/lmod/init/bash >/dev/null
    return
fi

# Enable bash module support
. /opt/lmod/lmod/init/profile >/dev/null
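The two decisions made by modules.sh, which architecture name to export and whether $MODULEPATH has to be reset, can be expressed as small pure functions, which makes them easy to check on a workstation. The Python sketch below is only an illustration of that logic; the function names detect_cpu_arch and modulepath_reset are hypothetical, and the environment is passed in as a plain dictionary instead of being read from the running shell.

```python
from pathlib import Path


def detect_cpu_arch(pmu_file="/sys/devices/cpu/caps/pmu_name"):
    """Return the architecture name from the pmu_name file, or None if it is missing."""
    path = Path(pmu_file)
    if not path.is_file():
        return None
    return path.read_text().strip()


def modulepath_reset(env, meta_root="/sw/easybuild/meta-modules/"):
    """Return the MODULEPATH to reset to inside a Slurm session, else None."""
    # A non-empty SLURM_NODELIST is what the shell script tests for
    if env.get("SLURM_NODELIST"):
        return meta_root
    return None


if __name__ == "__main__":
    import os

    print("CPU_ARCH:", detect_cpu_arch())
    print("MODULEPATH reset:", modulepath_reset(dict(os.environ)))
```

On a machine without the pmu_name file the first function returns None, which corresponds to CPU_ARCH staying unset in the shell script.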
Improvements
- There are still a lot of (low-level) modules for system libraries exposed to the users –> A possible solution is to use hidden modules
- Instead of using meta-modules, one can use bind mounts on every node, such that the architecture-specific folder points to a general EasyBuild install folder (symlinks are known to cause issues)
- An automated build process for all architectures
- ‘module spider’ output pointing to toolchain instead of compiler plus MPI library