Easybuild (Admin Guide)

From HPC Wiki

Introduction

EasyBuild (EB) is a Python framework that automates software installation and the creation of environment modules in an HPC environment. Most software is compiled directly from source to produce performant, architecture-optimized builds. This article briefly describes the first steps of setting up EasyBuild. Its main focus, however, is a sustainable directory structure and configuration that helps to keep an overview of the installed software while still allowing frequent updates to new versions. A more complete guide on how to set up EasyBuild can be found in the official documentation.

Setup

Assumptions

  • /sw/easybuild/ is the main folder and accessible from all compute nodes
  • /sw/easybuild/default will be the default path for your EB installation
  • /sw/easybuild/stacks/skylake/2020a is a path for a specific CPU arch and software stack
  • CentOS 7 (otherwise the list of dependencies for EB can vary)
  • LMOD >= v8.3.4 (the current versions from the OpenHPC repos are not sufficient)
  • Python 3 (You can also use Python 2, but at this point you really shouldn’t)

Initial installation and configuration

# INSTALL SYSTEM DEPS
yum install epel-release
yum install python3 git gcc gcc-c++ libibverbs-devel patch openssl-devel 

# DOWNLOAD THE BOOTSTRAP EB SCRIPT AND BOOTSTRAP EB
curl -O https://raw.githubusercontent.com/easybuilders/easybuild-framework/develop/easybuild/scripts/bootstrap_eb.py
python3 bootstrap_eb.py /sw/easybuild/default

# UPDATE $MODULEPATH, AND LOAD THE EasyBuild MODULE
module use /sw/easybuild/default/modules/all # add this to your bashrc
module load EasyBuild

# CREATE DIRS AND A FIRST CONFIGFILE FOR YOUR CPU ARCHITECTURE
mkdir -p /sw/easybuild/configfiles # directory for your configs
mkdir -p /sw/easybuild/sources
mkdir -p /sw/easybuild/stacks/skylake/2020a

vim /sw/easybuild/configfiles/skylake-2020a.cfg # Enter the values from the example below

# CREATE AN ALIAS IN YOUR BASHRC FOR EVERY SPECIFIC CONFIGFILE, E.G.
alias ebsky-2020a='eb --configfiles=/sw/easybuild/configfiles/skylake-2020a.cfg'

# YOU CAN LIST ALL AVAILABLE EASYCONFIG PARAMETERS WITH
eb -a
# TO GET AN ANNOTATED CONFIGFILE, REDIRECT THE OUTPUT OF eb --confighelp
eb --confighelp >> myconfig.cfg

# TO SHOW YOUR CURRENT EB CONFIGURATION USE
eb --show-config
ebsky-2020a --show-config

#NOTE: If you install software with EB and just use 'eb' you will install it into your default folder!

Example usage of EasyBuild

# Basic usage
eb -h # short help
eb -H # complete list of all options

# Dry-run installation of the foss-2020a toolchain
ebsky-2020a foss-2020a.eb -r -D # -D == dry-run / -r == install deps
ebsky-2020a foss-2020a.eb -r -M # -M == show only missing dependencies

# Real installation with all deps
ebsky-2020a foss-2020a.eb -r
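
The dry-run-then-install sequence above can also be scripted. The following is a minimal sketch that only assembles the command lines; the helper name eb_command and the CONFIG_DIR constant are assumptions made for illustration, while the flags -r, -D and -M are the ones shown above.

```python
import shlex

CONFIG_DIR = "/sw/easybuild/configfiles"  # path taken from the setup above


def eb_command(arch, release, easyconfig, dry_run=False, missing_only=False):
    """Return the argument list for one eb call against an arch/release config."""
    cmd = [
        "eb",
        f"--configfiles={CONFIG_DIR}/{arch}-{release}.cfg",
        easyconfig,
        "-r",  # resolve and install dependencies
    ]
    if dry_run:
        cmd.append("-D")  # dry run only
    if missing_only:
        cmd.append("-M")  # show only missing dependencies
    return cmd


if __name__ == "__main__":
    # Print the dry-run command first, then the real installation command.
    for kwargs in ({"dry_run": True}, {}):
        print(shlex.join(eb_command("skylake", "2020a", "foss-2020a.eb", **kwargs)))
```

The returned list can be handed to subprocess.run() as is, which avoids shell quoting issues when easyconfig names are generated by a script.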

Folder structure

Inside /sw/easybuild/stacks/, a folder is created for every supported CPU architecture (broadwell, skylake, etc.). Each contains subfolders named after the EB toolchain releases (e.g. 2018b, 2019a), which hold the modules and the installed software. In addition, the folders configfiles, hooks, sources and easybuild_repo can be found under /sw/easybuild. The easybuild_repo folder is synced via a local git repository and contains the folders archive and custom_easyconfigs.

/sw
└── easybuild
    ├── configfiles
    ├── custom_easyconfigs
    ├── hooks   # --> Python script which can alter the build process
    ├── sources # --> place to store all downloaded source files; prevents downloading a source multiple times
    └── stacks
        ├── broadwell
        ├── skylake
        │   ├── 2018b
        │   ├── 2019a
        │   │   ├── modules
        │   │   └── software
        │   └── ...
        └── ...
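
The skeleton of this layout can be created with a few lines of Python. This is a minimal sketch: the function name create_layout and the arch-to-releases mapping are assumptions for illustration, and the modules and software subfolders are left out on the assumption that EasyBuild creates them below the configured prefix during the first installation.

```python
from pathlib import Path

# Top-level folders as shown in the tree above.
TOP_LEVEL = ("configfiles", "custom_easyconfigs", "hooks", "sources", "stacks")


def create_layout(root, stacks):
    """Create the directory skeleton; `stacks` maps arch -> list of releases."""
    root = Path(root)
    created = []
    for name in TOP_LEVEL:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    for arch, releases in stacks.items():
        for release in releases:
            path = root / "stacks" / arch / release
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

A call such as create_layout("/sw/easybuild", {"skylake": ["2018b", "2019a"]}) is idempotent, so it can be re-run whenever a new architecture or toolchain release is added.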

Further Configuration of Easybuild

Config files

Every toolchain release has a base configuration file in the folder configfiles. Individual admins can keep personal settings in a local config.cfg file in ~/.config/easybuild/. A config file is selected via

eb --configfiles=/PATH/TO/CONFIG.CFG

Settings from files passed via --configfiles take precedence over those from the default config files.

Example of a basic skylake-2020a.cfg:

[config]
prefix = /sw/easybuild/stacks/skylake/2020a
module-naming-scheme = HierarchicalMNS
sourcepath=/sw/easybuild/sources/
robot-paths=/sw/easybuild/custom_easyconfigs:%(DEFAULT_ROBOT_PATHS)s
group-writable-installdir=true

Using different config files for different architectures and toolchain releases helps to handle heterogeneous systems and to keep an overview of installed software.

Create a bash alias for every config file for ease of use.

alias ebsky-2020a='eb --configfiles=/sw/easybuild/configfiles/skylake-2020a.cfg'
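
With several architectures and releases, writing each config file and alias by hand becomes repetitive. The sketch below renders both from the values of the example config above; the function names render_config and render_alias and the short architecture name passed to the alias are assumptions for illustration.

```python
import configparser
import io

BASE = "/sw/easybuild"


def render_config(arch, release):
    """Return the text of a base config file for one arch/release stack."""
    # interpolation=None keeps the %(DEFAULT_ROBOT_PATHS)s template untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve option names exactly as written
    parser["config"] = {
        "prefix": f"{BASE}/stacks/{arch}/{release}",
        "module-naming-scheme": "HierarchicalMNS",
        "sourcepath": f"{BASE}/sources/",
        "robot-paths": f"{BASE}/custom_easyconfigs:%(DEFAULT_ROBOT_PATHS)s",
        "group-writable-installdir": "true",
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def render_alias(arch, release, short):
    """Return the bashrc alias line for one config file, e.g. ebsky-2020a."""
    config = f"{BASE}/configfiles/{arch}-{release}.cfg"
    return f"alias eb{short}-{release}='eb --configfiles={config}'"


if __name__ == "__main__":
    print(render_config("skylake", "2020a"))
    print(render_alias("skylake", "2020a", "sky"))
```

Generating the files from one template keeps all stacks consistent; a site-specific option then only has to be changed in one place.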

Archive and self-written Easyconfig files

After every successful build, the Easyconfig file that was used is archived in stacks/$ARCH/$RELEASE/easybuild_repo/. Self-written Easyconfig files can be stored in the folder custom_easyconfigs and are then considered when searching for software. A guide on how to write your own easyconfig file can be found in the EasyBuild documentation.

SLURM integration

SLURM can be used as a job backend to build multiple software packages at the same time. Dependencies are resolved automatically and the jobs are submitted in the correct order. Pass the --job flag when you run eb. You can add the following to your config files; adjust the values as appropriate.

job-backend=Slurm
job-cores=<NCORES>
job-max-jobs=<NJOBS>

Hooks

Hooks are small Python scripts which can directly influence the build process. They can be used, e.g., to make site-specific adjustments to an Easyconfig file without the need to create a completely new one each time.

An example hook script is given below. It adds some Slurm-specific configuration, adds flags to every OpenMPI build and points to a license file for Intel software installations:

import os
import sys

from easybuild.tools.build_log import print_msg

def start_hook(*args, **opts):
    if "--job" in sys.argv:
        # Check if env var was set, otherwise fall back to a site default
        slurm_partition = os.getenv("SBATCH_PARTITION")
        if slurm_partition is not None:
            print_msg("[start-hook] SBATCH_PARTITION ENV VAR set: %s." % slurm_partition)
        else:
            slurm_partition = "normal"

        slurm_mem_per_node = os.getenv("SBATCH_MEM_PER_NODE")
        if slurm_mem_per_node is not None:
            print_msg("[start-hook] SBATCH_MEM_PER_NODE ENV VAR set: %s." % slurm_mem_per_node)
        else:
            slurm_mem_per_node = "36G"


        import easybuild.tools.job.slurm as slurm
        class slurm_job(slurm.SlurmJob):
            def __init__(self, *args, **opts):
                super(slurm_job, self).__init__(*args, **opts)
                self.job_specs['partition'] = slurm_partition
                self.job_specs['mem'] = slurm_mem_per_node
                self.job_specs['time'] = '12:00:00'
        slurm.SlurmJob = slurm_job
        print_msg("[start-hook] using partition << %s >> "%slurm_partition)

def pre_prepare_hook(self, *args, **kwargs):
    # SET PATH TO INTEL LICENSE FILE
    if self.name in ["icc", "ifort", "itac", "VTune"]:
        self.cfg['license_file'] = "/sw/licenses/USE_SERVER.lic"
        self.log.info("[pre-prepare hook] Setting path to license file: %s" % self.cfg['license_file'] )
        print_msg("Intel license file: %s" % self.cfg['license_file'])

def pre_configure_hook(self, *args, **kwargs):
    if self.name == 'OpenMPI':
        extra_opts = ""
        # Enable slurm and pmi support
        extra_opts += "--with-slurm --with-pmi"

        # Now add the options
        self.log.info("[pre-configure hook] Adding %s" % extra_opts)
        self.cfg.update('configopts', extra_opts)
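
Hook logic like this can be checked without starting a real build. The following is a minimal sketch that calls the OpenMPI hook on stand-in objects: FakeConfig and FakeEasyBlock are hypothetical helpers written for this example, not EasyBuild classes, and they only mimic the attributes the hook touches (name, log, and a cfg whose update() appends to a string value). In a real run, the hooks file is passed to eb via its --hooks option.

```python
import logging


class FakeConfig:
    """Hypothetical stand-in for an easyconfig: update() appends to string values."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def update(self, key, value):
        current = self._values.get(key, "")
        self._values[key] = f"{current} {value}".strip()


class FakeEasyBlock:
    """Hypothetical stand-in exposing only what the hooks above access."""

    def __init__(self, name, **cfg):
        self.name = name
        self.cfg = FakeConfig(**cfg)
        self.log = logging.getLogger(name)


def pre_configure_hook(self, *args, **kwargs):
    # Same logic as the OpenMPI hook above
    if self.name == "OpenMPI":
        extra_opts = "--with-slurm --with-pmi"
        self.log.info("[pre-configure hook] Adding %s", extra_opts)
        self.cfg.update("configopts", extra_opts)


openmpi = FakeEasyBlock("OpenMPI", configopts="--enable-shared")
pre_configure_hook(openmpi)
print(openmpi.cfg["configopts"])  # --enable-shared --with-slurm --with-pmi

zlib = FakeEasyBlock("zlib", configopts="")
pre_configure_hook(zlib)
print(repr(zlib.cfg["configopts"]))  # '' (hook leaves other software untouched)
```

Such a check catches typos in software names or option strings before they affect a long-running build job.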

Toolchains

About every 6 months EasyBuild releases new toolchains which combine a set of specific modules for compilers, MPI and numerical libraries (cf. common toolchains). The two most common are:

  • intel + intelcuda –> icc/ifort, iMPI, MKL
  • foss + fosscuda –> gcc, OpenMPI, OpenBLAS, LAPACK, ScaLAPACK, FFTW

Modules

Modules in this example are created automatically using the hierarchical module naming scheme. Other schemes are available (the default being EasyBuildMNS), as is the option to create your own site-specific module naming scheme. See the EasyBuild documentation on module naming schemes.

Meta-modules

A meta-module in this case is a module which makes a different module path available to the user while taking care of the correct path for the current architecture. An example of such a module is given below. These modules can, e.g., reside in the folder /sw/easybuild/meta-modules/palma. The software in this example is called palma and the versions correspond to the EB toolchain releases, e.g. 2019b.lua, 2020a.lua etc. The CPU_ARCH environment variable used here is exported in a modules.sh script within /etc/profile.d via CPU_ARCH=$(cat /sys/devices/cpu/caps/pmu_name).

help([==[

Description
===========
This is a meta module giving you access to the PALMA 2020a software stack. Software on PALMA is built using the
EasyBuild Python Framework.

Supported CPU Architectures: skylake

More information
================
 - PALMA: https://confluence.uni-muenster.de/display/HPC
 - EasyBuild: https://easybuild.readthedocs.io
]==])

whatis([==[This is a meta module giving you access to the PALMA 2020a software stack. Software on PALMA is built using
 the EasyBuild Python Framework.]==])
whatis([==[PALMA: https://confluence.uni-muenster.de/display/HPC]==])
whatis([==[EasyBuild: https://easybuild.readthedocs.io]==])


local version = "2020a"
local root = "/sw/easybuild/stacks"
local cpu_arch = os.getenv("CPU_ARCH")
local suffix = "/modules/all/Core"
local hostname = subprocess("hostname -s")

-- LmodMessage("cpu_arch = ", cpu_arch)
-- LmodMessage("hostname = ", hostname)

-- NOTE: IN A LUA PATTERN, [01-12] IS A CHARACTER CLASS MATCHING ONE OF THE DIGITS 0, 1 OR 2
if (string.find(hostname, "^r13n[01-12].*")) then
    cpu_arch = cpu_arch .. "-IB"
end

-- ONLY SKYLAKE HAS THIS TOOLCHAIN AT THE MOMENT
if (cpu_arch ~= "skylake") then
    -- THROWS AN ERROR MESSAGE AND EXITS
    LmodError(version, " IS NOT YET AVAILABLE ON THE ", cpu_arch, " ARCHITECTURE.")
end

conflict("palma")
prepend_path("MODULEPATH", pathJoin(root, cpu_arch, version, suffix))
-- add_property("lmod",)
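
The path selection performed by the meta-module can be restated in a few lines of Python, which is convenient for checking which directory a given architecture and release should resolve to. This is a sketch only: the function name stack_modulepath and the SUPPORTED table are assumptions for illustration and not part of Lmod or EasyBuild.

```python
import os

ROOT = "/sw/easybuild/stacks"
# No leading slash here: os.path.join would otherwise discard the earlier parts.
SUFFIX = "modules/all/Core"
# Assumption for illustration: which architectures provide which release.
SUPPORTED = {"2020a": {"skylake"}}


def stack_modulepath(version, cpu_arch):
    """Return the MODULEPATH entry for a release, or raise if it is unavailable."""
    if cpu_arch not in SUPPORTED.get(version, set()):
        raise ValueError(
            f"{version} is not yet available on the {cpu_arch} architecture."
        )
    return os.path.join(ROOT, cpu_arch, version, SUFFIX)


if __name__ == "__main__":
    print(stack_modulepath("2020a", "skylake"))
```

Keeping the supported combinations in one table makes it easy to see which architectures still lack a given toolchain release.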

Using the correct modules path

We created a modules.sh file in /etc/profile.d/ on the compute and login nodes instead of symlinking directly to lmod/init/profile. This works around some issues Lmod encounters with Slurm. In an interactive Slurm session, the script also has to reset $MODULEPATH before updating/reloading the currently loaded modules. Otherwise, updating the modules would not unload the old path, since the compute node might have a different architecture than the login node.

#!/bin/sh
 
# NOTE: In a slurm batch job (non-interactive, non-login) this file is not sourced.
#       Only the file where $BASH_ENV points to is sourced in this case.
 
# CPU ARCH ENV VARIABLE FOR META MODULES
if [ -f "/sys/devices/cpu/caps/pmu_name" ];then
    export CPU_ARCH=`cat /sys/devices/cpu/caps/pmu_name`
fi
 
# Special treatment if we are in an interactive slurm session
if [ ! -z "$SLURM_NODELIST" ];then
  echo "YOU ARE NOW IN AN INTERACTIVE SLURM JOB"
  # Reset module path
  export MODULEPATH=/sw/easybuild/meta-modules/
  # Update the currently loaded modules to account for architecture specific paths
  module update
  # re-source the bash init to gain module autocomplete functionality in an interactive session
  . /opt/lmod/lmod/init/bash >/dev/null
  return
fi
 
# Enable bash module support
. /opt/lmod/lmod/init/profile >/dev/null
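
Build or maintenance scripts written in Python may need the same architecture name that modules.sh exports as CPU_ARCH. The following is a minimal sketch that reads the same file; the function name detect_cpu_arch and its default argument are assumptions for illustration.

```python
from pathlib import Path

PMU_NAME = Path("/sys/devices/cpu/caps/pmu_name")  # same file as in modules.sh


def detect_cpu_arch(pmu_file=PMU_NAME, default=None):
    """Return the PMU name (e.g. 'skylake'), or `default` if it cannot be read."""
    try:
        return Path(pmu_file).read_text().strip() or default
    except OSError:
        # File missing or unreadable, e.g. on a VM or a non-x86 node
        return default


if __name__ == "__main__":
    print(detect_cpu_arch(default="unknown"))
```

Returning a default instead of raising mirrors the shell script, which simply leaves CPU_ARCH unset when the file does not exist.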

Improvements

  • There are still a lot of (low-level) modules for system libraries exposed to the users –> A solution can be using hidden modules
  • Instead of using meta-modules, one can use bind mounts on every node, such that the architecture-specific folder points to a general EasyBuild install folder (symlinks are known to cause issues)
  • An automated build process for all architectures
  • ‘module spider’ output pointing to toolchain instead of compiler plus MPI library