Lustre Tuning (Admin Guide)

From HPC Wiki

This article describes tuning options for the Lustre file system, using the Noctua cluster at PC² as an example.

Lustre File System

The high-performance parallel file system of Noctua is a Lustre file system.

This Lustre file system has three major functional units:

  • one Metadata Server (MDS) with two metadata targets (MDTs)
    • stores namespace metadata, such as filenames, directories, access permissions, and file layout
    • stores small files on SSDs to accelerate data access
  • two Object Storage Servers (OSS), each with two Object Storage Targets (OSTs)
    • each OST manages a single local disk file system
  • Clients (the Noctua nodes) that access the data (read/write)
    • Lustre presents all clients with a unified namespace for all the files and data in the file system
    • allows concurrent and coherent read and write access to the files in the file system

Key points

  • Lustre achieves high performance through parallelism
    • best performance comes from multiple clients writing to multiple OSTs
  • Lustre is designed to achieve high bandwidth to/from a small number of large files
    • used as a scratch file system
    • good match for scientific datasets and/or checkpoint data
  • Lustre is not designed to handle large numbers of small files
    • potential bottlenecks at the MDS when files are opened
    • the data of a small file is not spread over multiple OSTs, so it gains nothing from parallelism
    • not a good choice for program compilation
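To make the striping points above concrete, the following Python sketch computes which stripe object holds a given byte and how many OSTs actually carry data for a file. It assumes a plain round-robin (RAID-0 style) layout with a fixed stripe size and stripe count; the 1 MiB stripe size and the stripe count of 4 (matching Noctua's four OSTs) are illustrative defaults, and the function names are hypothetical. The real layout of a file can be inspected with lfs getstripe.

```python
# Sketch of Lustre-style round-robin (RAID-0) striping.
# Assumption: a plain layout with a fixed stripe_size and stripe_count;
# the defaults below are illustrative, check real files with lfs getstripe.

MiB = 1024 * 1024


def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Index (0 .. stripe_count-1) of the stripe object holding byte `offset`."""
    return (offset // stripe_size) % stripe_count


def osts_with_data(file_size: int, stripe_size: int = MiB, stripe_count: int = 4) -> int:
    """Number of OST objects that actually contain data for a file of `file_size` bytes."""
    if file_size <= 0:
        return 0
    stripes_used = -(-file_size // stripe_size)  # ceiling division
    return min(stripes_used, stripe_count)


if __name__ == "__main__":
    print(osts_with_data(64 * 1024))      # 64 KiB file: 1 OST
    print(osts_with_data(100 * MiB))      # 100 MiB file: all 4 OSTs
    print(stripe_index(5 * MiB, MiB, 4))  # byte at 5 MiB: stripe object 1
```

A file smaller than one stripe always lives on a single OST, which is why small files cannot benefit from Lustre's parallelism.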

lfs is a powerful Lustre utility with a built-in help system:

> lfs help
Available commands are:
        setstripe
        getstripe
        setdirstripe
        getdirstripe
        mkdir
        rm_entry
        pool_list
        find
...

Metadata operations are expensive

  • stat operations return information on file ownership, permissions, size, update times, etc.
  • obtaining the file size requires a lookup on the MDS and a size query to each OST that holds a stripe of the file

Therefore:
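The cost described above can be turned into a rough back-of-the-envelope model. This is a simplified sketch, not a description of Lustre internals: it assumes every stat needs one MDS lookup plus one size query per OST object of the file (its stripe count), and it ignores client caching, statahead, and features such as Data-on-MDT that can avoid OST queries. The function names are hypothetical.

```python
# Simplified stat() cost model following the description above:
# one MDS lookup plus one size query per OST object in the file's layout.
# Assumption: no caching, statahead or Data-on-MDT, which can reduce OST queries.


def stat_requests_per_file(stripe_count: int) -> int:
    """Requests needed to stat one file striped over `stripe_count` OSTs."""
    return 1 + stripe_count


def stat_requests(num_files: int, stripe_count: int) -> int:
    """Requests needed to stat `num_files` files with the same layout (e.g. ls -l)."""
    return num_files * stat_requests_per_file(stripe_count)


if __name__ == "__main__":
    # ls -l on a directory of 10,000 files striped over 4 OSTs:
    print(stat_requests(10_000, 4))  # 50000
    # The same directory with unstriped (stripe_count=1) files:
    print(stat_requests(10_000, 1))  # 20000
```

Under this model, striping small files multiplies the metadata cost of every ls -l without any bandwidth benefit.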

  • avoid ls -l and colorized ls (e.g. ls --color), which need to stat files
  • avoid file completion in shells
  • to check whether a file exists, try to open it and handle the failure instead of calling stat() (or Fortran INQUIRE)
  • don't stripe small files; Lustre checks every OST that might own a part of the file
  • open a file read-only if you only read it
  • use tools optimized for (aware of) Lustre
    • lfs find, lfs df, ...
    • stripe-aware tar (star)
  • avoid reading the same files on many processes; it is better to read them on one process and use MPI communication to move the data to the other processes
  • avoid large directories; organize the directory structure by processes/clients
  • if every process knows the size (and therefore the offset) of its part, open() the shared file and seek() to that offset; otherwise try to organize the application so that only one process writes
  • use the Lustre API in your application (see man lustreapi)
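The advice to open and fail instead of checking first, and to open files read-only, can be sketched in Python as follows. The helper read_if_present is hypothetical; it uses os.open with os.O_RDONLY and treats FileNotFoundError as "file does not exist", so no separate os.stat or os.path.exists call precedes the open.

```python
import os
from typing import Optional


def read_if_present(path: str, chunk_size: int = 1 << 20) -> Optional[bytes]:
    """Return the contents of `path`, or None if the file does not exist."""
    try:
        # Open read-only, since we only read; a missing file makes open() fail.
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return None
    try:
        chunks = []
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty read means end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "input.dat")
        print(read_if_present(path))  # None: no separate stat() needed
        with open(path, "wb") as f:
            f.write(b"data")
        print(read_if_present(path))  # b'data'
```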
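For the open()/seek() tip, here is a minimal sketch of several writers sharing one output file, assuming every writer produces a block of the same, known size so that its offset is simply rank * block_size. The function write_block and the rank loop are illustrative: the loop stands in for separate MPI processes, each of which would call write_block with its own rank (MPI-IO's MPI_File_write_at is a common alternative).

```python
import os


def write_block(path: str, rank: int, block: bytes, block_size: int) -> None:
    """Write one writer's fixed-size block into a shared file at offset rank * block_size."""
    if len(block) != block_size:
        raise ValueError("every writer must write exactly block_size bytes")
    # No O_TRUNC: the other writers' data in the same file must survive.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, rank * block_size, os.SEEK_SET)
        view = memoryview(block)
        written = 0
        while written < block_size:  # os.write may write fewer bytes than asked
            written += os.write(fd, view[written:])
    finally:
        os.close(fd)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shared.dat")
        # Stand-in for 4 MPI ranks; finishing order does not matter.
        for rank in reversed(range(4)):
            write_block(path, rank, bytes([ord("A") + rank]) * 8, 8)
        with open(path, "rb") as f:
            print(f.read())  # b'AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD'
```

Because each offset is fixed in advance, the writers need no coordination, and the order in which they finish does not matter.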
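One way to follow the advice on large directories is to derive each process's output path from its rank, so that no single directory holds a file from every process. The helper below and its naming scheme are hypothetical, and ranks_per_dir=1000 is an arbitrary illustrative value, not a Lustre limit.

```python
import os


def rank_output_path(base: str, rank: int, ranks_per_dir: int = 1000) -> str:
    """Return an output path for `rank`, grouping at most ranks_per_dir ranks per subdirectory."""
    group = rank // ranks_per_dir
    return os.path.join(base, f"group_{group:04d}", f"rank_{rank:06d}.out")


if __name__ == "__main__":
    print(rank_output_path("run42", 7))      # run42/group_0000/rank_000007.out
    print(rank_output_path("run42", 12345))  # run42/group_0012/rank_012345.out
```

The subdirectories can be created once, e.g. by a single process, before the writers start.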


More tuning tips for Lustre are in Noctua Tuning.