Parameters and Limits (Admin Guide)
This article describes how to set some Linux kernel parameters (sysctl values) and user limits that matter specifically on clusters.
Linux normally has very sensible default values for sysctl parameters (exposed as files under /proc/sys; many other kernel tunables live under /sys these days). Some parameters should nevertheless be adapted for an HPC cluster, for example the size of the ARP cache and how long MAC address entries are retained in it, or the maximum backlog of pending TCP/IP connections. Since lots of software, e.g. monitoring, needs an open connection to all N nodes, these values should be set to at least double the number of nodes (2N) to be safe.
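Each dotted sysctl name corresponds to a file below /proc/sys (the dots become directory separators), so current values can be read directly. The following is a minimal Python sketch that does this and applies the 2N rule of thumb from above; the function names are hypothetical, and it assumes that no single component of a parameter name contains a dot itself.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name to its file, e.g. net.core.somaxconn.

    Assumes no component of the name contains a dot, which holds for the
    parameters discussed in this article.
    """
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> int:
    """Return the (first) integer value stored in a sysctl file."""
    return int(sysctl_path(name, root).read_text().split()[0])


def minimum_for_nodes(n_nodes: int) -> int:
    """Rule of thumb from the text: twice the number of nodes."""
    return 2 * n_nodes
```

On a node, `read_sysctl("net.core.somaxconn") >= minimum_for_nodes(1024)` then tells whether the current value follows the rule for a 1024-node cluster.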
For example, for a cluster with N=1024 nodes, we use:
# Increase the arp cache size of the kernel
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 8192
# keep entries longer in arp cache
net.ipv4.neigh.default.base_reachable_time = 86400
net.ipv4.neigh.default.gc_stale_time = 86400
# The maximum number of "backlogged sockets". Value of 2*N recommended for GPFS
net.core.somaxconn = 2000
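The file format above is simple: one `name = value` assignment per line, with lines starting with `#` or `;` treated as comments. A minimal Python sketch of a parser for it, useful for reviewing a file before deploying it; `parse_sysctl_conf` is a hypothetical helper, and it handles only plain assignments and comments, not any further syntax the sysctl tools may accept.

```python
def parse_sysctl_conf(text: str) -> dict:
    """Parse sysctl.conf / sysctl.d syntax into {name: value}.

    Blank lines and lines starting with '#' or ';' are skipped; every other
    line must be 'name = value'. Later assignments override earlier ones.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'name = value' line: {raw!r}")
        settings[name.strip()] = value.strip()
    return settings
```

Values are kept as strings because some sysctls hold several numbers in one entry.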
Settings such as these should be saved to a file in /etc/sysctl.d/ or, historically, to
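Deploying such a file can be scripted. The sketch below is a minimal Python example, assuming root privileges and a hypothetical file name 99-cluster.conf (files in /etc/sysctl.d/ need a .conf suffix to be read). It writes the settings and then loads that single file with `sysctl -p FILE`; running `sysctl --system` instead reloads all configuration files.

```python
import subprocess
from pathlib import Path


def render_sysctl(settings: dict) -> str:
    """Render {name: value} as sysctl.d lines of the form 'name = value'."""
    return "".join(f"{name} = {value}\n" for name, value in settings.items())


def install_sysctl(settings: dict, directory: str = "/etc/sysctl.d",
                   filename: str = "99-cluster.conf") -> Path:
    """Write the settings file and load it immediately (requires root)."""
    path = Path(directory) / filename  # hypothetical file name
    path.write_text(render_sysctl(settings))
    # 'sysctl -p FILE' loads one file; check=True raises if sysctl fails.
    subprocess.run(["sysctl", "-p", str(path)], check=True)
    return path
```

For example, `install_sysctl({"net.core.somaxconn": 2000})` writes and applies a one-line file.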
Some user limits (number of processes, open files, etc.) should be tweaked according to the rules of the cluster. For example, should core files be generated on compute nodes? The first place to check is the /etc/security/limits.conf file. To override its values, a new file /etc/security/limits.d/99-cluster.conf can be created with contents like the following:
*  soft  memlock  4086160    # Allow more Memory Locks for MPI
*  hard  memlock  4086160    # Allow more Memory Locks for MPI
*  soft  nofile   1048576    # Increase the Number of File Descriptors
*  hard  nofile   1048576    # Increase the Number of File Descriptors
*  soft  nproc    1024       # Limit Number of Processes
*  hard  nproc    1024       # Limit Number of Processes
*  soft  stack    unlimited  # Set soft to hard limit
*  soft  core     4194304    # Allow Core Files
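Since pam_limits applies these values when a session is opened, one way to confirm they took effect is to compare the file with what a fresh login shell on a compute node actually gets. A minimal Python sketch using the standard `resource` module; the helper names are hypothetical, it ignores the domain column (treating every entry as if it were the `*` wildcard), and it relies on limits.conf(5) giving memlock, stack and core in KiB while `getrlimit()` reports bytes.

```python
import resource

# limits.conf item -> (resource constant, factor to convert to getrlimit units).
# memlock, stack and core are given in KiB; nofile and nproc are plain counts.
LIMIT_ITEMS = {
    "memlock": (resource.RLIMIT_MEMLOCK, 1024),
    "nofile": (resource.RLIMIT_NOFILE, 1),
    "nproc": (resource.RLIMIT_NPROC, 1),
    "stack": (resource.RLIMIT_STACK, 1024),
    "core": (resource.RLIMIT_CORE, 1024),
}

UNLIMITED = ("unlimited", "infinity", "-1")


def parse_limits(text):
    """Return (domain, type, item, value) tuples; '#' starts a comment."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields: {raw!r}")
        entries.append(tuple(fields))
    return entries


def to_rlimit(item, value):
    """Convert a limits.conf value into the units reported by getrlimit()."""
    if value in UNLIMITED:
        return resource.RLIM_INFINITY
    return int(value) * LIMIT_ITEMS[item][1]


def compare_with_current(entries):
    """Yield (item, type, expected, actual) for the current process.

    Skips items not listed in LIMIT_ITEMS. A type of '-' sets both the
    soft and the hard limit, so both are checked.
    """
    for _domain, kind, item, value in entries:
        if item not in LIMIT_ITEMS:
            continue
        soft, hard = resource.getrlimit(LIMIT_ITEMS[item][0])
        for k in (("soft", "hard") if kind == "-" else (kind,)):
            actual = soft if k == "soft" else hard
            yield item, k, to_rlimit(item, value), actual
```

Calling `compare_with_current(parse_limits(open("/etc/security/limits.d/99-cluster.conf").read()))` from a new login shell lists each expected value next to the one actually in effect.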