Difference between revisions of "Admin Guide Parameters and Limits"
(Author Roland Pabel) |
Latest revision as of 19:14, 9 December 2020
This article covers Linux kernel parameters (sysctl values) and user limits that should be set specifically for clusters.
Sysctl-Parameters
Linux normally has very sensible default values for sysctl parameters (historically found in subdirectories of /proc, but also in /sys these days). Some parameters should be adapted for an HPC cluster, though, for example anything to do with retaining MAC addresses (the ARP cache) or the maximum number of open TCP/IP connections. Since lots of software, e.g. monitoring, needs an open connection to all N nodes, these values should be set to at least double that number to be safe.
For N=1000 (or N=1024) we use:
# Increase the arp cache size of the kernel
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 8192
# keep entries longer in arp cache
net.ipv4.neigh.default.base_reachable_time = 86400
net.ipv4.neigh.default.gc_stale_time = 86400
# The maximum number of "backlogged sockets". Value of 2*N recommended for GPFS
net.core.somaxconn = 2000
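The "at least double the node count" rule of thumb can be checked mechanically. The following is only a minimal sketch: the helper name undersized, the factor argument and the choice of which keys count as size-like are assumptions made for illustration, not part of any tool.

```python
# Rule-of-thumb check: size-like values should be at least factor * N.
# Which keys are treated as "size-like" here is an assumption.
SIZE_KEYS = (
    "net.ipv4.neigh.default.gc_thresh1",
    "net.ipv4.neigh.default.gc_thresh2",
    "net.ipv4.neigh.default.gc_thresh3",
    "net.core.somaxconn",
)


def undersized(settings, n_nodes, factor=2):
    """Return the keys from SIZE_KEYS whose value is below factor * n_nodes."""
    minimum = factor * n_nodes
    return [k for k in SIZE_KEYS if k in settings and settings[k] < minimum]


# The values listed above, for a cluster of N=1000 nodes.
settings = {
    "net.ipv4.neigh.default.gc_thresh1": 4096,
    "net.ipv4.neigh.default.gc_thresh2": 8192,
    "net.ipv4.neigh.default.gc_thresh3": 8192,
    "net.core.somaxconn": 2000,
}

print(undersized(settings, n_nodes=1000))  # -> []
```

Re-running the check with a larger node count shows which values would need to grow along with the cluster.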
Settings such as these should be saved to a file in /etc/sysctl.d/ or, historically, to /etc/sysctl.conf.
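To see whether a node is actually running with the saved values, the file can be compared against the live kernel settings, which are exposed under /proc/sys with the dots of each key turned into path separators. The sketch below assumes simple single-valued "key = value" lines; the function names and the file name 90-cluster.conf are hypothetical.

```python
from pathlib import Path


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def live_value(key, root="/proc/sys"):
    """Read the running kernel's value for a key, or None if it is absent."""
    path = Path(root, *key.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None


def differences(settings, root="/proc/sys"):
    """Return {key: (wanted, live)} for every key whose live value differs."""
    diffs = {}
    for key, wanted in settings.items():
        live = live_value(key, root)
        if live != wanted:
            diffs[key] = (wanted, live)
    return diffs


if __name__ == "__main__":
    # Hypothetical file name; use whatever name the fragment was saved under.
    fragment = Path("/etc/sysctl.d/90-cluster.conf")
    if fragment.exists():
        wanted = parse_sysctl(fragment.read_text())
        for key, (want, live) in differences(wanted).items():
            print(f"{key}: file says {want}, kernel has {live}")
```

The comparison is a plain string match, so multi-valued parameters or keys whose interface names contain dots would need extra handling.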
User Limits
Some user limits (number of processes, number of open files, etc.) should be tweaked according to the rules of the cluster. For example, should core files be generated on compute nodes? The first place to check is the /etc/security/limits.conf file. To override values, a new file such as /etc/security/limits.d/99-cluster.conf can be created, for example:
* soft memlock 4086160   #Allow more Memory Locks for MPI
* hard memlock 4086160   #Allow more Memory Locks for MPI
* soft nofile  1048576   #Increase the Number of File Descriptors
* hard nofile  1048576   #Increase the Number of File Descriptors
* soft nproc   1024      #Limit Number of Processes
* hard nproc   1024      #Limit Number of Processes
* soft stack   unlimited #Set soft to hard limit
* soft core    4194304   #Allow Core Files
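Whether such limits have taken effect can be checked from inside a session or job with Python's standard resource module. The sketch below is illustrative only: the function names and the ITEMS table are assumptions, it handles only the wildcard domain and the soft/hard types, and it relies on limits.conf counting memlock, stack and core in KB while getrlimit() reports bytes.

```python
import resource

# limits.conf item -> (resource constant name, multiplier to getrlimit units).
# limits.conf counts memlock, stack and core in KB; getrlimit() uses bytes.
ITEMS = {
    "memlock": ("RLIMIT_MEMLOCK", 1024),
    "nofile": ("RLIMIT_NOFILE", 1),
    "nproc": ("RLIMIT_NPROC", 1),
    "stack": ("RLIMIT_STACK", 1024),
    "core": ("RLIMIT_CORE", 1024),
}


def parse_limits(text):
    """Parse '<domain> <type> <item> <value>' lines; '#' starts a comment."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) == 4:
            entries.append(tuple(fields))
    return entries


def expected(item, value):
    """Convert a limits.conf value to the number getrlimit() would report."""
    if value in ("unlimited", "infinity", "-1"):
        return resource.RLIM_INFINITY
    return int(value) * ITEMS[item][1]


def mismatches(entries):
    """Compare entries for domain '*' with the limits of this process."""
    bad = []
    for domain, kind, item, value in entries:
        if domain != "*" or item not in ITEMS or kind not in ("soft", "hard"):
            continue
        soft, hard = resource.getrlimit(getattr(resource, ITEMS[item][0]))
        current = soft if kind == "soft" else hard
        if current != expected(item, value):
            bad.append((kind, item, value, current))
    return bad
```

Calling mismatches(parse_limits(...)) on the contents of the file from a fresh login or from within a batch job lists the entries that differ from the limits of the running process. Limits are applied when a session starts, and a batch system may set its own, so a mismatch is a hint to investigate rather than proof of a misconfiguration.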