This is a set of scripts supplementing the btrfs filesystem that aims to automate a few maintenance tasks: scrub, balance, trim and defragmentation.
Each of the tasks can be turned on/off and configured independently. The default config values were selected to fit the default installation profile with btrfs on the root filesystem.
Overall, the default values should give a good balance between the effects of the tasks and a low impact on other work on the system. If this does not fit your needs, please adjust the settings.
The following sections describe the tasks in detail. There's one config option that affects task concurrency, `BTRFS_ALLOW_CONCURRENCY`. To avoid excessive resource consumption or unexpected interaction among the tasks, it serializes them in the order they're started by their timers.
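For illustration, a minimal snippet of how this could look in the config (the value shown is an assumption; check the shipped template for the actual default):

```sh
# /etc/sysconfig/btrfsmaintenance (illustrative excerpt)
# "false" serializes the timer-started tasks in their start order,
# "true" allows them to run concurrently.
BTRFS_ALLOW_CONCURRENCY="false"
```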
Description: The scrub operation reads all data and metadata from the devices and verifies the checksums. It's not mandatory, but may point out problems with faulty hardware early, as it touches data that might otherwise not be in use and could silently bit rot.
If there's redundancy of data/metadata, i.e. the DUP or RAID1/5/6 profiles, scrub is able to repair the data automatically if a good copy is available.
Impact when active: Intense read operations take place and may slow down or block other filesystem activities, possibly only for short periods.
Tuning:
- `BTRFS_SCRUB_READ_ONLY`: run scrub in read-only mode, without repairing the errors found
- `BTRFS_SCRUB_PRIORITY`: IO priority of the scrub process

Related commands:
- `btrfs scrub status /path`: show the status of a running scrub
- `btrfs scrub cancel /path`: cancel a running scrub; the progress state is saved every 5 seconds and the next scrub will start from that point
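As a sketch, the corresponding manual workflow might look like this (the mount point `/` is an assumption):

```sh
# Start a scrub of the mounted filesystem in the background
btrfs scrub start /
# Check the progress
btrfs scrub status /
# Stop it early if needed; the saved state lets a later scrub continue
btrfs scrub cancel /
```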
Description: The balance command can do a lot of things; in general, it moves data around in big chunks. Here we use it to reclaim the space of underused chunks so it can be allocated again according to current needs.
The point is to prevent corner cases where it's not possible to e.g. allocate new metadata chunks because the whole device space is reserved by existing chunks, although the total space occupied is smaller and the allocation should succeed.
The balance operation needs enough workspace so it can shuffle data around. By workspace we mean device space that has no filesystem chunks on it, not to be confused with free space as reported e.g. by `df`.
Impact when active: Possibly big. There's a mix of read and write operations, and it is seek-heavy on rotational devices. This can interfere with other work if the same set of blocks is affected.
The balance command uses filters to do the work in smaller batches.
Before kernel version 5.2, the impact with quota groups enabled can be extreme: the balance operation performs quota group accounting for every extent being relocated, which can stall the filesystem for an extended period of time.
Expected result: If possible, all the underused chunks are removed; the value of 'total' in the output of `btrfs fi df /path` should be lower than before. Check the logs.
The balance command may fail with a 'no space' error, but this is considered a minor fault, as the internal filesystem layout may prevent the command from finding enough workspace. This might be a good time for manual inspection of the space.
Tuning: the usage filters can be adjusted via `BTRFS_BALANCE_DUSAGE` and `BTRFS_BALANCE_MUSAGE`. A higher value means a bigger impact on your system and becomes very noticeable.

Changed defaults since 0.5:
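For illustration, a manual one-shot run with the same default filter values might look like this (mount point assumed):

```sh
# Reclaim data chunks used at most 10% and metadata chunks at most 5%
btrfs balance start -dusage=10 -musage=5 /
```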
Versions up to 0.4.2 had the usage filter set to up to 50% for data and up to 30% for metadata. Based on user feedback, the numbers have been reduced to 10% (data) and 5% (metadata). The system load during the balance service will be smaller and the result of space compaction is still reasonable.

Multiple data chunks filled to less than 10% can be merged into fewer chunks. The file data can change in large volumes, e.g. deleting a big file can free a lot of space. If the space is left unused for the given period, it's desirable to make it more compact.

Metadata consumption follows a different pattern, and reclaiming only the almost-unused chunks makes more sense; otherwise there's enough reserved metadata space for operations like reflink or snapshotting.
A convenience script is provided to update the unchanged defaults: `/usr/share/btrfsmaintenance/update-balance-usage-defaults.sh`.
Description: The TRIM operation (also known as discard) can instruct the underlying device to optimize blocks that are not used by the filesystem. This task is performed on demand by the `fstrim` utility.
This makes sense for SSD devices or other types of storage that can translate the TRIM action into something useful (e.g. thin-provisioned storage).
Impact when active: Should be low, but depends on the amount of blocks being trimmed.
Tuning: the period and the list of mount points to trim can be configured (`BTRFS_TRIM_PERIOD` and `BTRFS_TRIM_MOUNTPOINTS`).
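For illustration, a manual one-shot trim as performed by the task (mount point assumed):

```sh
# Trim unused blocks on a mounted filesystem; -v prints how much was trimmed
fstrim -v /
```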
Description: Run defragmentation on configured directories. This is for convenience and is not necessary, as defragmentation needs usually differ for various types of data.
Please note that the defragmentation process does not descend to other mount points and nested subvolumes or snapshots. All nested paths would need to be enumerated in the respective config variable. The command utilizes `find -xdev`; you can use that to verify in advance which paths the defragmentation will affect.
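A sketch of such a verification, assuming `/var` were one of the configured paths:

```sh
# List the files a defrag pass would consider; -xdev prevents find from
# descending into other mount points, nested subvolumes or snapshots
find /var -xdev -type f
```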
Special case:
There's a separate defragmentation task that happens automatically and defragments only the RPM database files. This is done via a zypper plugin and the defrag pass triggers at the end of the installation.
This improves reading the RPM databases later, but the installation process fragments the files very quickly so it's not likely to bring a significant speedup here.
There are now two ways to schedule and run the periodic tasks: cron and systemd timers. Only one can be active on a system, and this should be decided at installation time.
Cron takes care of periodic execution of the scripts, but they can be run any time directly from `/usr/share/btrfsmaintenance/`, respecting the configured values in `/etc/sysconfig/btrfsmaintenance`.
Changes to the configuration file need to be reflected in the `/etc/cron` directories where the scripts are linked for the given period.
If the period is changed, the cron symlinks have to be refreshed:
- `systemctl restart btrfsmaintenance-refresh` (or the `rcbtrfsmaintenance-refresh` shortcut)
- start or enable `btrfsmaintenance-refresh.path`, which will utilize the file monitor to detect changes and run the refresh

There's a set of timer units that run the respective task scripts. The periods are configured in the `/etc/sysconfig/btrfsmaintenance` file as well. The timers have to be installed in a similar way as cron. Please note that both the '.timer' and the respective '.service' files have to be installed for the timers to work properly.
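For illustration, enabling one of the timers by hand might look like this (the `btrfs-scrub.timer` unit name is an assumption based on the task names):

```sh
# Enable and start the scrub timer, then verify it is scheduled
systemctl enable --now btrfs-scrub.timer
systemctl list-timers 'btrfs-*'
```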
The tasks' periods and other parameters should fit most use cases and do not need to be touched. Review the mount points (variables ending with `_MOUNTPOINTS`) to decide whether you want to run the tasks there or not.
Currently, support for widely used distros is present; more distros can be added. This section describes how the pieces are put together and should give some overview.
For Debian-based systems, run `dist-install.sh` as root. For non-Debian-based systems, check for a distro-provided package or do a manual installation of the files as described below.
- `btrfs-*.sh` task scripts are expected at `/usr/share/btrfsmaintenance`
- the `sysconfig.btrfsmaintenance` configuration template is put to:
  - `/etc/sysconfig/btrfsmaintenance` on SUSE- and RedHat-based systems or derivatives
  - `/etc/default/btrfsmaintenance` on Debian and derivatives
- `/usr/lib/zypp/plugins/commit/btrfs-defrag-plugin.sh` or `/usr/lib/zypp/plugins/commit/btrfs-defrag-plugin.py` is the post-update script for zypper (the package manager); applies to SUSE-based distros for now

The defrag plugin has a shell and a python implementation; choose what suits the installation better.
The periodic execution of the tasks is done by the 'cron' service. Symlinks to the task scripts are located in the respective directories in `/etc/cron.<PERIOD>`.
The script `btrfsmaintenance-refresh-cron.sh` will synchronize the symlinks according to the configuration files. This can be called automatically by a GUI configuration tool if it's capable of running post-change scripts or services; in that case there's the `btrfsmaintenance-refresh.service` systemd service. This service can also be started automatically upon any modification of the configuration file in `/etc/sysconfig/btrfsmaintenance` by installing the `btrfsmaintenance-refresh.path` systemd watcher.
The package database files tend to be updated in a random way and get fragmented, which particularly hurts on btrfs. For rpm-based distros this means the files in `/var/lib/rpm`. The script or plugin simply runs a defragmentation on the affected files. See `btrfs-defrag-plugin.sh` or `btrfs-defrag-plugin.py` for more details.
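For illustration, a rough manual equivalent of the plugin's work (the path applies to rpm-based distros):

```sh
# Defragment the RPM database files; -v lists the files as they're processed
btrfs filesystem defragment -v /var/lib/rpm/*
```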
At the moment the 'zypper' package manager plugin exists. As the package managers differ significantly, there's no single plugin/script to do that.
The settings are copied to the expected system location from the template (`sysconfig.btrfsmaintenance`). This is a shell script and can be sourced to obtain the values of the variables.
The template contains descriptions of the variables, default and possible values and can be deployed without changes (expecting the root filesystem to be btrfs).
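Since the file is plain shell, reading a value from another script is a one-liner; a minimal sketch (the variable name is taken from the template):

```sh
# Source the deployed config and read one of the variables
. /etc/sysconfig/btrfsmaintenance
echo "balance mountpoints: $BTRFS_BALANCE_MOUNTPOINTS"
```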
There are various tools and handwritten scripts to manage periodic snapshots and cleaning. The common problem is tuning the retention policy constrained by the filesystem size and not running out of space.
This section will describe factors that affect that, using snapper as an example, but adapting to other tools should be straightforward.
Snapper is a tool to manage snapshots of btrfs subvolumes. It can create snapshots of a given subvolume manually, periodically, or in a pre/post fashion for a given command. It can be configured to retain existing snapshots according to time-based settings. As the retention policy can be very different for various use cases, we need to be able to find matching settings.
The settings should satisfy the user's expectations about storing previous copies of the subvolume without taking too much space; in the extreme case, the snapshots could consume the whole filesystem space and prevent some operations from finishing.
In order to avoid such situations, the snapper settings should be tuned according to the expected use case and filesystem size.
The default settings of snapper on the default root partition size can easily lead to no-space conditions (all TIMELINE values set to 10). Frequent system updates make this happen earlier, but it also affects long-term use.

The main factors are the frequency of snapshots, the amount of data changes, the snapshot retention and the filesystem size. Each will be explained below.
The way the files are changed affects the space consumption. When new data overwrites existing data, the new data will be pinned by the following snapshot, while the original data will belong to the previous snapshot. This means that the allocated file blocks are freed only after the last snapshot pointing to them is gone.
The administrator/user is supposed to know the approximate use of the partition with snapshots enabled.
The decision criterion for tuning is space consumption, and we're optimizing to maximize retention without running out of space.
All the factors are intertwined and we cannot give definite answers but rather describe the tendencies.
- automatic: if turned on with the `TIMELINE` config option, the periodic snapshots are taken hourly. The daily/weekly/monthly/yearly periods will keep the first hourly snapshot in the given period.
- at package update: a package manager with snapper support will create pre/post snapshots before/after an update happens.
- manual: the user can create a snapshot manually with `snapper create`, with a given snapshot type (i.e. single, pre, post).
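For illustration, a manual single snapshot might look like this (the config name `root` is an assumption):

```sh
# Create a single snapshot with a description under the 'root' config
snapper -c root create -d "before config change"
```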
This is a parameter that is hard to predict and calculate. We work with rough estimates, e.g. in megabytes or gigabytes.
The user is supposed to know possible needs of recovery or examination of previous file copies stored in snapshots.
It's not recommended to keep very old snapshots, e.g. monthly or even yearly, if there's no apparent need for them. The yearly snapshots should not substitute for backups, as they reside on the same partition and cannot be used for recovery.
A bigger filesystem allows for longer retention, a higher update frequency and a larger amount of data changes.
As an example of a system root partition, the recommended size is 30 GiB, but 50 GiB is selected by the installer if the snapshots are turned on.
For a non-system partition, it is recommended to watch the remaining free space. Although getting an accurate value on btrfs is tricky due to shared extents and snapshots, the output of `df` gives a rough idea. Low space, like under a few gigabytes, is more likely to lead to no-space conditions, so it's a good time to delete old snapshots or review the snapper settings.
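For illustration, a quick way to get that rough idea (the mount point is an assumption):

```sh
# df gives an approximate figure; 'btrfs filesystem usage' shows the
# allocated vs. used breakdown, which is more informative with snapshots
df -h /data
btrfs filesystem usage /data
```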
Suggested values:

```
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="5"
TIMELINE_LIMIT_WEEKLY="2"
TIMELINE_LIMIT_MONTHLY="1"
TIMELINE_LIMIT_YEARLY="0"
```
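These limits belong in the snapper config of the given subvolume; a sketch of applying them (the config name `root` is an assumption):

```sh
# Apply the suggested limits above to an existing snapper config
snapper -c root set-config \
    TIMELINE_LIMIT_HOURLY=12 TIMELINE_LIMIT_DAILY=5 \
    TIMELINE_LIMIT_WEEKLY=2 TIMELINE_LIMIT_MONTHLY=1 TIMELINE_LIMIT_YEARLY=0
```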
The size of the root partition should be at least 30 GiB, but more is better. Most data changes probably come from package updates, in the range of hundreds of megabytes per update.
Suggested values:

```
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="6"
TIMELINE_LIMIT_YEARLY="1"
```
Suggested values:

```
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="4"
TIMELINE_LIMIT_MONTHLY="6"
TIMELINE_LIMIT_YEARLY="0"
```
Note that deleting a big file that has been snapshotted will not free the space until all relevant snapshots are deleted.
It's not possible to suggest config numbers, as this really depends on user expectations. Keeping a few hourly snapshots should not consume too much space and provides a copy of files, e.g. to restore after accidental deletion.
Starting point:

```
TIMELINE_LIMIT_HOURLY="12"
TIMELINE_LIMIT_DAILY="7"
TIMELINE_LIMIT_WEEKLY="1"
TIMELINE_LIMIT_MONTHLY="0"
TIMELINE_LIMIT_YEARLY="0"
```
A summary of the suggested values:

| Type | Hourly | Daily | Weekly | Monthly | Yearly |
|---|---|---|---|---|---|
| Rolling | 12 | 5 | 2 | 1 | 0 |
| Regular | 12 | 7 | 4 | 6 | 1 |
| Big files | 12 | 7 | 4 | 6 | 0 |
| Mixed | 12 | 7 | 1 | 0 | 0 |
The goal of this project is to help with administering btrfs filesystems. It is not supposed to be distribution-specific: common scripts/configs are preferred, but per-distro exceptions will be added when necessary.
License: GPL 2