Lustre Tuning (Admin Guide)

From HPC Wiki

This article describes tuning options for the Lustre file system, using the Noctua cluster at PC² as an example.

Lustre File System

Noctua's high-performance parallel file system is a Lustre file system.

This Lustre file system has three major functional units:

  • one Metadata Server (MDS) with two Metadata Targets (MDTs)
    • stores namespace metadata such as filenames, directories, access permissions, and file layout
    • stores small files on SSDs to accelerate data access
  • two Object Storage Servers (OSSs), each with two Object Storage Targets (OSTs)
    • each OST manages a single local disk file system
  • clients (the Noctua nodes) that access the data (read/write)
    • Lustre presents all clients with a unified namespace for all files and data in the file system
    • Lustre allows concurrent and coherent read and write access to the files in the file system

Key points

  • Lustre achieves high performance through parallelism
    • the best performance comes from multiple clients writing to multiple OSTs
  • Lustre is designed to achieve high bandwidth to/from a small number of files
    • well suited as a scratch file system
    • a good match for scientific datasets and/or checkpoint data
  • Lustre is not designed to handle large numbers of small files
    • potential bottlenecks at the MDS when files are opened
    • the data of a small file is not spread over multiple OSTs, so it gains nothing from parallel access
    • not a good choice for program compilation
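The remark that a small file's data is not spread over multiple OSTs follows from how striping works. In a plain RAID-0 layout, consecutive chunks of stripe_size bytes are dealt out round-robin to the stripe_count OST objects of the file. The following is a minimal model of that mapping, not Lustre code: the function names are hypothetical, and real layouts (for example progressive file layouts, PFL) are more involved. Note that the index refers to the position of the object within the file's layout, not to an OST number.

```c
#include <stdint.h>

/* Position (0 .. stripe_count-1) of the OST object, within the file's
 * layout, that holds byte `offset` under a plain RAID-0 layout:
 * chunks of stripe_size bytes are assigned round-robin. */
unsigned stripe_index(uint64_t offset, uint64_t stripe_size,
                      unsigned stripe_count)
{
    return (unsigned)((offset / stripe_size) % stripe_count);
}

/* Number of OST objects that actually receive data for a file of
 * file_size bytes: a file smaller than one stripe touches only one. */
unsigned stripes_with_data(uint64_t file_size, uint64_t stripe_size,
                           unsigned stripe_count)
{
    uint64_t chunks = (file_size + stripe_size - 1) / stripe_size; /* ceil */
    return chunks < stripe_count ? (unsigned)chunks : stripe_count;
}
```

With a 1 MiB stripe size and a stripe count of 4, a 4 KiB file has data in only one OST object, while a 10 MiB file has data in all four, which is where parallel access pays off.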

A powerful Lustre utility is lfs, which has a built-in help system:

> lfs help
Available commands are:
        setstripe
        getstripe
        setdirstripe
        getdirstripe
        mkdir
        rm_entry
        pool_list
        find
...

Metadata operations are expensive

  • stat operations return information on file ownership, permissions, size, timestamps, etc.
  • obtaining the file size requires a lookup on the MDS plus a size query to each OST that owns a stripe of the file

Therefore:

  • avoid ls -l and colored ls (e.g. ls --color), since both stat files to get their attributes
  • avoid file completion in shells
  • open a file and handle the failure instead of checking first with stat() or Fortran INQUIRE
  • don't stripe small files: Lustre checks every OST that might own a part of the file
  • open a file read-only if you only read it
  • use tools optimized for (aware of) Lustre
    • lfs find, lfs df, ...
    • stripe-aware tar (star)
  • avoid reading the same file from many processes; read it on one process and use MPI communication to distribute the data to the other processes
  • avoid large directories; organize the directory structure by process/client
  • when several processes write one shared file and the size of each part is known, let each process open() the file and seek() to its own offset; otherwise, organize the application so that only one process writes
  • use the Lustre API in your application (see man lustreapi)
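The advice to avoid ls -l comes down to not calling stat() on every directory entry. Directory entries live on the MDT, so a listing that only reads names does not need to ask any OST for file sizes. A minimal sketch in POSIX C (the function name list_names is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>

/* List the entries of the directory `path` with readdir() only, without
 * calling stat() on each entry (which is what `ls -l` does).
 * Prints each name if `print` is non-zero.
 * Returns the number of entries other than "." and "..", or -1 on error. */
long list_names(const char *path, int print)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        const char *name = ent->d_name;
        /* Skip the "." and ".." entries. */
        if (name[0] == '.' &&
            (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')))
            continue;
        if (print)
            puts(name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Any attribute beyond the name (size, owner, permissions) would require a stat() per entry again, so such information should only be fetched for the files that really need it.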
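"Open and fail" and "open read-only" can be combined in one call. Instead of checking for the file with stat() (or access(), or INQUIRE in Fortran) and then opening it, which costs an extra metadata operation and, for stat(), possibly size queries to the OSTs, the sketch below simply tries open() with O_RDONLY and treats ENOENT as "file not present". The function name open_input_if_present is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Open an input file that may or may not exist.
 * No stat()/access() check beforehand: just try open() and handle
 * the failure. O_RDONLY because the file is only read.
 * Returns the descriptor, or -1 if the file does not exist; other
 * errors are reported on stderr and also yield -1. */
int open_input_if_present(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0 && errno != ENOENT)
        perror(path);
    return fd;
}
```

The caller checks for -1 and falls back (e.g. to default input) only in that case, so the common path needs a single metadata operation.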
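One way to avoid large directories is to give each process its own subdirectory, so that no single directory collects the output files of all processes. The sketch below builds such a path and creates the per-process directory on first use. The naming scheme base/rank_NNNN/out_NNNNNN.dat and the function name rank_output_path are illustrative assumptions, not a convention used on Noctua.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Build the output path for file `file_id` written by process `rank`,
 * placing each process's files in its own subdirectory base/rank_NNNN/
 * instead of one flat directory shared by all processes.
 * Returns 0 on success, -1 on error. */
int rank_output_path(char *buf, size_t len, const char *base,
                     int rank, int file_id)
{
    char dir[4096];
    int n = snprintf(dir, sizeof dir, "%s/rank_%04d", base, rank);
    if (n < 0 || (size_t)n >= sizeof dir)
        return -1;
    /* Create the per-rank directory; it already existing is fine. */
    if (mkdir(dir, 0755) != 0 && errno != EEXIST)
        return -1;
    n = snprintf(buf, len, "%s/out_%06d.dat", dir, file_id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Each process should create its own directory once at startup rather than per file; the EEXIST check merely makes repeated calls harmless.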
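When every process writes a part of known size to one shared file, each process can compute its own offset (here rank * len) and write there independently. The sketch uses pwrite(), a positioned write that is equivalent to lseek() followed by write() but does not move the file offset. The function name write_my_block is hypothetical, and in the usage check the "processes" are simulated by sequential calls; in a real MPI program each rank would call it with its own rank number.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write this process's `len`-byte block into the shared file `path`
 * at offset rank * len. Since all blocks have the same known size,
 * no coordinating process is needed.
 * Returns 0 on success, -1 on error. */
int write_my_block(const char *path, int rank, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    off_t offset = (off_t)rank * (off_t)len;
    ssize_t done = 0;
    /* pwrite() may write fewer bytes than requested; loop until done. */
    while ((size_t)done < len) {
        ssize_t w = pwrite(fd, data + done, len - (size_t)done, offset + done);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        done += w;
    }
    return close(fd);
}
```

The blocks can be written in any order; the file content depends only on the offsets, not on which process finishes first.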


More tuning tips for Lustre can be found in Noctua Tuning.