Admin Guide Lustre Tuning

This article describes tuning options for the Lustre file system, using the Noctua cluster at PC² as an example.

Lustre File System

The high-performance parallel file system of Noctua is a Lustre File System.

This Lustre file system has three major functional units:

one Metadata Server (MDS) with two metadata targets (MDTs)
- stores namespace metadata, such as filenames, directories, access permissions, and file layout
- stores small files on SSDs to accelerate data access
two Object Storage Servers (OSSs), each with two Object Storage Targets (OSTs)
- each OST manages a single local disk file system
clients (the Noctua nodes) that access the data (read/write)
- Lustre presents all clients with a unified namespace for all files and data in the file system
- allows concurrent and coherent read and write access to the files in the file system

Key points

Lustre achieves high performance through parallelism
- best performance from multiple clients writing to multiple OSTs
Lustre is designed to achieve high bandwidth to/from a small number of files
- used as a scratch file system
- good match for scientific datasets and/or checkpoint data
Lustre is not designed to handle large numbers of small files
- potential bottlenecks at the MDS when files are opened
- data will not be spread over multiple OSTs
- not a good choice for program compilation
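
Why a small file is not spread over multiple OSTs can be seen in a simplified model of round-robin (RAID-0 style) striping. The sketch below is only an illustration: the function name, the 1 MiB stripe size, and the stripe count of 4 are assumptions, not settings taken from Noctua.

```python
def stripes_used(file_size, stripe_size=1 << 20, stripe_count=4):
    """Return the set of stripe indices (0..stripe_count-1) that hold data.

    Simplified round-robin model: chunk k of the file
    (k = offset // stripe_size) is stored on stripe index k % stripe_count.
    The default values are illustrative assumptions only.
    """
    if file_size <= 0:
        return set()
    n_chunks = (file_size + stripe_size - 1) // stripe_size  # ceiling division
    # After stripe_count chunks every stripe index has been used once.
    return {k % stripe_count for k in range(min(n_chunks, stripe_count))}


if __name__ == "__main__":
    for size in (4 * 1024, 1 << 20, 3 << 20, 1 << 30):
        print(f"{size:>12} bytes -> stripe indices {sorted(stripes_used(size))}")
```

In this model, any file no larger than one stripe size lands on a single OST object, regardless of the stripe count requested.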

A powerful Lustre utility is lfs. The tool has a built-in help system.

> lfs help
Available commands are:
        setstripe
        getstripe
        setdirstripe
        getdirstripe
        mkdir
        rm_entry
        pool_list
        find
...

Metadata operations are expensive

the stat operation returns information on file ownership, permissions, size, update times, etc.
obtaining the file size requires a lookup on the MDS and a size query on each OST that holds a stripe of the file
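
As a rough illustration of this cost, the following sketch counts server requests for stat-ing many files under a deliberately simple model: one MDS lookup per file plus one size query per OST holding a stripe. It ignores client caching and any server-side optimizations, and the function name and the numbers are hypothetical.

```python
def stat_requests(n_files, stripe_count):
    """Estimate the server requests needed to stat n_files files.

    Simplified model (an assumption; caches and optimizations are ignored):
    each file costs 1 MDS lookup plus stripe_count OST size queries.
    Returns a tuple (mds_requests, ost_requests).
    """
    mds_requests = n_files
    ost_requests = n_files * stripe_count
    return mds_requests, ost_requests


if __name__ == "__main__":
    for stripes in (1, 4):
        mds, ost = stat_requests(10_000, stripes)
        print(f"stripe_count={stripes}: {mds} MDS lookups, {ost} OST queries")
```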

therefore

avoid ls -l and similar commands (such as ls with color output), which stat every file
avoid file name completion in shells
open a file and handle the failure instead of checking for it first with stat (or INQUIRE in Fortran)
don't stripe small files; Lustre checks every OST that might own a part of the file
open a file read-only if you only intend to read it
use tools optimized for (aware of) Lustre
- lfs find, lfs df, ...
- stripe-aware tar (star)
avoid reading the same files from many processes; it is better to read on one process and use MPI communication to move the data to the other processes
avoid large directories; organize the directory structure by processes/clients
use open() and seek() if you know the size; otherwise try to organize applications to write from only one process
use the Lustre API in your application (see man lustreapi)
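
Two of the tips above, "open and fail" and listing names without stat, can be sketched at application level as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical and nothing here is specific to Lustre.

```python
import os


def read_if_present(path):
    """Open directly and handle failure, instead of stat-ing first.

    A check such as os.path.exists() before open() adds an extra
    metadata request; a single open() attempt is enough.
    """
    try:
        with open(path, "rb") as f:  # "rb": read-only, binary
            return f.read()
    except FileNotFoundError:
        return None


def list_names(directory):
    """Return entry names without requesting per-file attributes.

    os.listdir() only reads the directory itself; unlike 'ls -l',
    it does not stat each entry.
    """
    return sorted(os.listdir(directory))
```

A caller would test the return value of read_if_present() for None rather than probing for the file beforehand.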

More tuning tips for Lustre are in Noctua Tuning.
