File System Separation (Admin Guide)
At most HPC sites/clusters there are two separate filesystems available on both the gateway servers and all compute nodes:

/home
/work

In addition, every compute node provides a separate /scratch filesystem for local, temporary data storage. On both /home and /work, disk quotas per user are enabled. The available disk space quota and the current quota usage are automatically shown when logging in. At this point, no file (inode) quotas beyond the filesystem's own limits exist.
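As an illustration (not part of the original text), a minimal sketch of how the quota report at login could be produced, assuming the quotas are managed with the standard Linux quota tools and that a small script is dropped into /etc/profile.d:

# /etc/profile.d/show-quota.sh -- hypothetical example, assuming quotas on
# /home and /work are managed with the standard Linux quota tools
if [ -n "$PS1" ]; then                  # only run in interactive login shells
    quota -s -u "$USER" 2>/dev/null     # -s prints human-readable sizes
fi

The actual mechanism is site-specific; some sites generate this report in a message-of-the-day script instead.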
/home

/home could have, for example, a quota of 32 GiB for user data. Its content is usually backed up to tape, so that in case of a filesystem problem the /home filesystem and its data can be restored. The tape backup is usually done on a daily basis; deleted files are kept for a maximum of 60 days or a similar duration.

/home is usually provided by a redundant infrastructure, e.g. two redundant NFS servers, and is hence a network filesystem, but not a parallel filesystem. To avoid excessive load on the /home filesystem, it can be mounted write-protected (read-only) on all compute nodes.
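A minimal sketch of what such a read-only mount could look like in /etc/fstab on a compute node; the server name and export path are assumptions, not the site's actual configuration:

# /etc/fstab on a compute node (illustrative; nfs01 and /export/home are made-up names)
nfs01:/export/home   /home   nfs   ro,hard,_netdev   0 0

On the gateway servers the same export would be mounted with rw instead of ro.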
/work

/work has different characteristics: it usually has a much larger quota for user data, but its files are not backed up externally. It is provided by several redundant file servers and uses some kind of parallel filesystem. /work can be read from and written to on both the gateway servers and all compute nodes.
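The exact way /work is mounted depends on the parallel filesystem in use. If it is BeeGFS (as in the alternative setup described below), the client mounts are typically listed in /etc/beegfs/beegfs-mounts.conf; a sketch, with the mount point as an assumption:

# /etc/beegfs/beegfs-mounts.conf (illustrative)
# <mount point>   <client config file>
/work   /etc/beegfs/beegfs-client.conf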
Extras

To mimic the filesystem layout of some other faculties, a link /home/$USER/nobackup -> /work/$USER is added in each home directory.
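A sketch of how such a link could be created for all existing users, e.g. from a small admin script or a user-creation hook; the loop itself is an assumption, only the paths come from the text above:

# create /home/$USER/nobackup -> /work/$USER for every existing home directory
for home in /home/*; do
    user=$(basename "$home")
    [ -d "/work/$user" ] || continue          # skip users without a /work directory
    ln -sfn "/work/$user" "$home/nobackup"    # -n: replace an existing link instead of descending into it
done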
Alternative Setup
This section describes an alternative setup that uses only one file system underneath. The sizes given here are examples for a small cluster system: the system can store a total of 2 petabytes of (HDD) data, spread over 4 JBODs. The cluster's filesystem is managed by 4 storage servers with 10 terabytes of SSD metadata each. Two metadata servers each form one buddy mirror group; together, the two buddy groups contain the entire namespace. Conversely, all members of a buddy group hold the same part of the namespace, which makes the system more fail-safe. Every user has a storage limit (quota), which can be queried via the BeeGFS command-line interface:
beegfs-ctl --getquota --uid `id -u $USER`
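Quotas themselves can also be set per user with beegfs-ctl; a sketch with made-up limits and a hypothetical user name (the exact set of options may depend on the BeeGFS version and storage pool layout):

# set a 1 TiB / 1 million inode quota for one user (example values, "someuser" is a placeholder)
beegfs-ctl --setquota --uid $(id -u someuser) --sizelimit=1T --inodelimit=1M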
There are four different partitions set up for different purposes:
path         | backup | intended usage
-------------|--------|-------------------------------------
/work/home/  | daily  | scripts, executables, etc.
/work/TEMP   | no     | large (temporary) files
/work/CONF   | no     | large files
/work/DATA   | daily  | many small files (e.g. output data)
There is no speed difference between the partitions.
ZFS is used and is well known for its data protection, not only because of its copy-on-write semantics but also due to its snapshot system. This snapshot system is used in the backup process. As shown in the table above, backups are done on a daily basis for the /work/DATA and /work/home/ directories.
This backup is organized in the following way (a sketch of the corresponding commands is shown below):

- HOME, DATA, CONF, and TEMP are mirrored to the backup server using rsync; the rsync --delete flag is used, so files removed on the cluster disappear from this mirror after one day.
- A ZFS snapshot of the DATA and HOME backups is taken on the backup server every day.
- The DATA and HOME data is then copied to the TSM (Tivoli Storage Manager) tape robot.

This way of making a backup is quite fast, since the snapshot currently completes in about 20 seconds and uses far less space than a full copy. The HOME directory backup on TSM is kept for 6 months.
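A minimal sketch of what such a nightly backup job on the backup server could look like; hostnames, pool and dataset names are assumptions, not the site's actual configuration:

#!/bin/bash
# nightly backup job on the backup server (illustrative sketch)

# 1) mirror the partitions from the cluster; -a preserves permissions and
#    timestamps, --delete removes files that have disappeared on the source
for dir in home DATA CONF TEMP; do
    rsync -a --delete "cluster:/work/${dir}/" "/backup/work/${dir}/"
done

# 2) take a daily ZFS snapshot of the DATA and HOME backups
today=$(date +%Y-%m-%d)
zfs snapshot "backuppool/work/home@${today}"
zfs snapshot "backuppool/work/DATA@${today}"

# 3) copy the DATA and HOME backups to the TSM tape library
#    (site-specific TSM client invocation, not shown here)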
If a user has very large amounts of data that do not necessarily need to be backed up, they are asked to store it in /work/TEMP, as that reduces the time it takes for the backup to complete. Furthermore, users are asked to keep the number of files small (e.g. by combining many small text files into a single big one, or by combining different files into one tarball for storage), as this not only increases the speed of scripts working on this data (HDDs have very limited IOPS) but also improves the general performance of the filesystem.
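As an example of the second point, many small result files can be packed into a single archive before being stored on /work/DATA; the file and directory names below are made up:

# pack a directory with many small output files into one archive on /work/DATA
tar -czf /work/DATA/run042_results.tar.gz -C /work/TEMP run042/
# list the archive contents later without unpacking everything
tar -tzf /work/DATA/run042_results.tar.gz | head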