Concepts
Sample size
One sample is one logical offset chosen at random by btdu. Because we
know the total size of the filesystem, we can divide this size by the
total number of samples to obtain the approximate size of how much data
one sample represents. (This size is also shown at the bottom as
"Resolution".)
Confidence
For the represented and exclusive size, btdu displays a confidence
range, e.g.:
- Represented size: ~763.0 GiB (6006 samples), ±16.9 GiB
This should be interpreted as: given the data btdu collected so far,
it is confident with
95% certainty that the object size is within 16.9 GiB of 763.0
GiB.
Logical vs. physical space
Quoting On-disk
format:
Btrfs makes a distinction between logical and physical addresses.
Logical addresses are used in the filesystem structures, while physical
addresses are simply byte offsets on a disk. One logical address may
correspond to physical addresses on any number of disks, depending on
RAID settings.
In this regard, btdu has two modes of operation:
- In logical space mode, btdu samples the logical offset space. As
such, a 1GB file (containing unique uncompressed unshared data) will
show up with a size of 1GB, regardless of whether it is stored in a
SINGLE, DUP, or RAID1 profile block group.
- In physical space mode, btdu samples offsets from the underlying
block devices, translating each to a logical offset first. The file in
the example above will thus show up with a size of 2GB if it is stored
on a block group using the RAID1 or DUP profiles.
In physical space mode, btdu will also show unallocated space
(represented as an <UNALLOCATED> node in the
hierarchy root) and any device slack (represented as a
<SLACK> node).
Logical space mode is the default. To use physical space mode, run
btdu with --physical (-p).
Representative location
After picking a logical offset to sample, btdu asks btrfs what is
located at that offset. btrfs replies with zero or more locations. Out
of these locations, btdu picks one location where it should place the
sample within its tree, to represent the space occupied by this
data. We call this location the representative location.
The way in which btdu selects the representative location aims to
prefer better visualization of what the data is used for, i.e., the
simplest explanation for what is using this disk space. For instance, if
one location's filesystem path is longer than the other, then the
shorter is chosen, as the longer is more likely to point at a snapshot
or other redundant clone of the shorter one.
Examples:
- For data which is used exactly once, the representative location
will be the path to the file which references that data.
- For data which is used in
/@root/file.txt and
/@root-20210203/file.txt, the representative location will
be /@root/file.txt, because it is shorter.
- For data which is used in
/@root/file1.txt and
/@root/file2.txt, the representative location will be
/@root/file1.txt, because it is lexicographically
smaller.
Size metrics
In --expert mode, btdu shows four size metrics for tree
nodes:
Represented size
- The represented size of a node is the amount of disk space that this
path is representing.
- For every logical offset, btdu picks one representative location out of all
locations that reference that logical offset, and assigns the sample's
respective disk space usage to that location.
- This location is thus chosen to represent this disk space.
So, if a directory's represented size is 1MiB, we can say that this
directory is the simplest explanation for what is using that 1MiB of
space.
- This metric is most useful in understanding what is using up disk
space on a btrfs filesystem, and is what's shown in the btdu directory
listings.
- The represented size of a directory is the sum of represented sizes
of its children.
- Adding up the represented size for all filesystem objects (btdu tree
leaves) adds up to the total size of the filesystem.
Distributed size
- To calculate the distributed size, btdu evenly distributes
a sample's respective disk space usage across all locations which
reference data from that logical offset.
- Thus, two 1MiB files which share the same 1MiB of data will each
have a distributed size of 512KiB.
- The distributed size of a directory is the sum of distributed sizes
of its children.
- Adding up the distributed size for all filesystem objects (btdu tree
leaves) also adds up to the total size of the filesystem.
Exclusive size
- The exclusive size represents the samples which are used
only by this file or directory.
- Specifically, btdu awards exclusive size to the common
prefix of all paths which reference data from a given logical
offset.
- Two files which are perfect clones of each other will thus both have
an exclusive size of zero. The same applies to two identical
snapshots.
- However, if the two clones are in the same directory, and the data
is not used anywhere else, then that data will be represented in the
directory's exclusive size.
- The exclusive size can also be described as the amount of space
which would be freed if the corresponding object were to be
deleted.
- Unlike other size metrics, adding up the exclusive size of all items
in a directory may not necessarily add up to the exclusive size of the
directory.
Shared size
- The shared size is the total size including all references of a
single logical offset at this location.
- This size generally correlates with the "visible" size, i.e. the
size reported by classic space usage analysis tools, such as
du. (However, if compression is used, the shown size will
still be after compression.)
- The shared size of a directory is the sum of shared sizes of its
children.
- The total shared size will likely exceed the total size of the
filesystem, if snapshots or reflinking is used.
As an illustration, consider a file consisting of unique data
(dd if=/dev/urandom of=a bs=1M count=1):

Here is what happens if we clone the file
(cp --reflink=always a b):

Finally, here is what the sizes would look like for two 2M files
which share 1M. Note how the represented size adds up to 3M, the total
size of the underlying data.
