Admin Guide Compute Node Deployment
(Article by Roland Pabel)
Latest revision as of 18:50, 9 December 2020
Deployment Software
There are many software tools for installing a node. This article only tries to give a small overview:
Warewulf
If you use OpenHPC, the first deployment is usually done using warewulf. How to use it is explained in great detail in the Installation Recipes. Warewulf works fine for what it promises, but there are a few drawbacks (last I checked):
- Warewulf brings its own version of ipmitool which does not support the lanplus protocol (only lan), which is needed for many Dell servers. This can be remedied by deleting the /usr/libexec/warewulf/ipmitool binary and setting a symlink to the OS /usr/bin/ipmitool.
- The installation environment is pretty limited; for example, it is not possible to create LVM volumes during installation, so one is limited to 7 partitions (which is sometimes not enough).
- Warewulf uses an apache webserver to copy the /etc/{passwd,shadow} files to the compute nodes without any kind of authentication of the client (only the MAC address is used to identify the client). This is a huge security concern.
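The ipmitool workaround from the first item can be sketched as a small shell function. The two default paths are the ones named above; the function name relink_ipmitool and its optional arguments are assumptions made for illustration, and the real call needs root on the provisioning server.

```shell
# Replace Warewulf's bundled ipmitool with a symlink to the OS binary.
# relink_ipmitool is a hypothetical helper name; the defaults are the
# paths mentioned in the text above.
relink_ipmitool() {
    bundled=${1:-/usr/libexec/warewulf/ipmitool}
    system=${2:-/usr/bin/ipmitool}

    # Refuse to touch anything if the OS binary is missing.
    if [ ! -x "$system" ]; then
        echo "no executable at $system" >&2
        return 1
    fi

    rm -f -- "$bundled"            # delete the bundled binary
    ln -s -- "$system" "$bundled"  # point Warewulf at the OS ipmitool
}
```

Called without arguments (relink_ipmitool), the function acts on the default paths; passing two paths explicitly allows a dry run in a scratch directory first.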
xCAT
Another option is xCAT (Extreme Cloud Administration Toolkit, https://xcat-docs.readthedocs.io/en/stable/index.html). There are many talks and websites about it, and while it is very powerful, it is also a complex piece of software. In particular, tuning it correctly to deploy many nodes quickly is said to be quite difficult.
luna
In the Unix tradition, luna is a piece of software that pretty much only does one thing, but does it very well: install nodes. It’s a project of a former ClusterVision employee, hosted on GitHub and licensed under GPLv3. The software itself just binds together a database and some other open source software (mainly dracut) using a few Python and C programs. One can easily look through the whole code in an afternoon. The installation is very quick with low load on the servers because during installation all clients run a small BitTorrent client, which is used to distribute the image data among all nodes.
Notes about Deployment
Some thoughts about deployment:
- Nowadays, images to be deployed should be created on disk (for example using yum --installroot) rather than by installing a “golden” node using kickstart and later reading a complete hard disk image from it.
- System user IDs (UIDs) and group IDs (GIDs), i.e. anything below 500, should not be left to chance. Many UIDs and GIDs are fixed, but there are still many dynamic ones. Before starting to install packages into an image directory, the files /etc/passwd, /etc/group, /etc/shadow, and /etc/gshadow have to be created there with all expected user and group entries.
- We prefer not to have to run configuration software like Ansible over all our compute nodes, but rather to have the nodes “self-configurable”. Our nodes use a naming scheme that contains location data such as rack, chassis, and node numbers. This data is taken from the hostname and used on startup by cron scripts to self-configure the node, i.e. write some configuration files.
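As a minimal sketch of the self-configuration idea, the following shell function extracts location data from a hostname. The naming scheme r<rack>c<chassis>n<node> (for example r03c02n11) and the function name parse_node_name are assumptions made for illustration only; the real scheme will differ per site.

```shell
# Hypothetical naming scheme: r<rack>c<chassis>n<node>, e.g. r03c02n11.
# Prints "rack=03 chassis=02 node=11" for that example and returns
# non-zero for names that do not match the scheme.
parse_node_name() {
    short=${1%%.*}   # drop any domain suffix
    fields=$(printf '%s\n' "$short" |
        sed -n -E 's/^r([0-9]+)c([0-9]+)n([0-9]+)$/rack=\1 chassis=\2 node=\3/p')
    [ -n "$fields" ] || return 1
    printf '%s\n' "$fields"
}
```

A startup script could pass its own hostname to this function and write the resulting values into the node's configuration files. Hosts whose names do not follow the scheme (login or service nodes, for instance) are easy to skip because the function returns a non-zero status for them.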