Garbage Collector


The little space of a writer, tinkerer, and a coffee addict

Linux Explained part 3 : The Filesystems

Linux Explained part 3 : The Filesystems
Tux, the Linux mascot

In this third entry of our Linux Explained series, we’re going to see at a deeper level the filesystem management that have been mentioned previously.

The first part will be a general description of what is a filesystem. Then, we will see describe how Linux organizes and uses its filesystem.

What is a filesystem ?

A filesystem, or FS, is the method an operating system uses to structure the data on a storage device (hard drive, SSD, USB flash stick, whatever). It’s a logical view of the physical storage that describes how the disk is organized and how and where the various data pieces are stored and identified to compose a consistent file. Indeed, when a file is written on disk, the content is actually cut into several parts written in blocks. Your file is like a series of books stored on a shelf. In this metaphor, the shelf would be the file and the books its fragments. The blocks size may vary according to the filesystem format settings.

In a more simple words, the filesystem is the mapping chart of the physical storage for the operating system. Here is a very simple diagram to explain this :

filesystem The files description is stored into an allocation table, containing the address of the blocks that constitute the data pieces physically written on the device.

Ever heard about “formatting a disk” ? For the end-user, this action usually means “erasing the content”. Actually that’s a side effect because the formatting process will commonly rewrite the file allocation table, delete the files metadata, and throw back the related sectors into the “free sectors pool”. In a nutshell, the disk “forgets” its contents. But the data pieces on the disk may still remain until they are overwritten by new ones. This is how the recovery tools work when you try to recover a file you’ve erased by mistake : they usually inspect the sectors for orphaned data pieces and try to reconstruct the file.

There are various filesystems type existing. This diversity is mostly related to the diversity of storage types : optical, tapes, database filesystems, network filesystem, etc. One very well known is the FAT - File Allocation Table -, created in 1977 for floppy disks and adapted for hard drives, it was also the MS-DOS default filesystem, also for Microsoft Windows, until NTFS replaced it. One of the biggest advantage for the FAT filesystem (mostly in this exFAT and FAT32 version) is that it became a de facto standard and is used in almost every device that uses external storage. For instance, my camera formats its memory cards in exFAT. Despite being patented by Microsoft, the exFAT and FAT specifications has finally been publicly released by the company in 2019.

On Linux, the most common filesystems type today are ext4 (preceded by ext2 and ext3), xfs or btrfs. Each of them has its specificity and purpose :

ext4, xfs and Btrfs have all a common thing : they’re journaling filesystems (NTFS too, but it’s a Windows native FS, not Linux). In a nutshell, a journaling filesystem keeps tracks of the changes not yet committed into a “journal”, a circular log file. In case of a failure event (power failure, system crash, etc), these filesystems can be brought online faster and are more resilient to the risk of corruption. Basically, the journal stores the write events before actually executing them on the disk, preventing the risk of orphaned data blocks or storage leak, meaning the free sectors are not released into the pool for being reused. The journaling filesystem has been created by IBM with AIX 3.1’s JFS in 1990. Microsoft implemented it in its Windows NT’s NTFS in 1993. For Linux, the journaling FS arrived in 2001 with ReiserFS and ext3.

The filesystem structure and mount concept

Linux’s filesystem organization is standardized in the Filesystem Hierarchy Standard specification. Inherited from the Unix Filesystem, the structure is very similar. On Microsoft Windows, a filesystem is represented by a letter, usually the C: drive for the main one, and D: for the next one, which may be an optical drive, etc. On Unix and Linux, this notion does not exists.

The Filesystem hierarchy

On Linux, the primary hierarchy level is the root, represented by the / character. There is no logical separation like Windows do with its letters. On Linux, the various filesystems present in the partition table are mounted in subsequent levels of the root folder. All files and folders are stored after the root level, even if they are physically on another device.

Here is the content of my root folder, /, displayed by the command tree :

$ tree -d -L 1 /
/.
β”œβ”€β”€ afs
β”œβ”€β”€ bin -> usr/bin
β”œβ”€β”€ boot
β”œβ”€β”€ dev
β”œβ”€β”€ etc
β”œβ”€β”€ home
β”œβ”€β”€ lib -> usr/lib
β”œβ”€β”€ lib64 -> usr/lib64
β”œβ”€β”€ lost+found
β”œβ”€β”€ media
β”œβ”€β”€ misc
β”œβ”€β”€ mnt
β”œβ”€β”€ net
β”œβ”€β”€ opt
β”œβ”€β”€ proc
β”œβ”€β”€ root
β”œβ”€β”€ run
β”œβ”€β”€ sbin -> usr/sbin
β”œβ”€β”€ snap -> /var/lib/snapd/snap
β”œβ”€β”€ srv
β”œβ”€β”€ sys
β”œβ”€β”€ tmp
β”œβ”€β”€ usr
└── var

Let’s explain some of these folders :

Directory Description
/ The first level of the folder hierarchy. Everything is under it.
/bin This folder contains the essential binaries, or executable programs, required for the single-user mode, the mode without the network layers enabled commonly used for administrative and recovery tasks (We’ve seen that in the Part 2’s Runlevels sections)
/boot This is the place where the bootloader is installed.
/dev This folder contains the devices files, a description of the physical devices on the system, and also logical ones. For example, you can invoke /dev/random to display a random number, or redirect something to /dev/null to discard it immediately. Also your storage is represented in a way such as /dev/sda1, etc.
/etc This folder contains the host-specific configurations files. Here, you may find for example the /etc/hosts file that associates an IP address to a host name, or /etc/fedora-release (adapt it to any distro) that contains the release version of your distribution, or /etc/hostname that contains the computer’s name.
/home This is the home folder of the users of the system. Unless other statement during the creation, a user will always have its personal files stored into a subfolder named after the username in this directory. (ex : /home/seb). A shortcut to this folder exists : the character ~. Typing cd ~ is equivalent to typing cd /home/seb for example. Additionally, invoking cd without args will also redirect to the home folder.
/lib Contains the essential libraries for the binaries. Libraries are specific components included during the coding of a binary to provide a specific feature, a collection of functions, dataset definitions, etc. For example, the openssl lib will provide the TLS encryption functions that can be used by a program or libyaml will provide the functions for YAML data structure manipulation.
/media This folder is a mount point for the removable medias like CD-ROM and USB flash storage. A directory in/run can also be used for temporary filesystems mount points.
/mnt This folder is also for mount points, for temporary ones too.
/opt This folder usually stores the application software packages, which are not especially shipped with the distribution by default.
/proc This virtual filesystem stores the process and kernel informations files. You may find there files named after the Process ID of a running program and various informations inside the kernel uses too.
/root The only by-design exception of the /home, /root is the root user home folder. We never talked about the root user : it’s the super admin of a Linux system. We will address this point in the article dedicated to the users management.
/run A folder containing run-time data like variables, currently logged-in users, running daemons…
/sbin The essential system binaries for the single-user mode. It’s like the /bin, but the commands available there are only addressed to power users or the system admin (root).
/srv A folder reserved for server data, like a web server or FTP.
/tmp A folder, which is now a filesystem stored in RAM, used for temporary files. It’s oftenly not-preserved on reboot, you may only store here disposable files.
/usr A secondary hierarchy structure that reproduces the / structure. It contains the majority of the multi-users commands, utilities and applications. If you look back at the directories lists I’ve displayed above, you may notice that /bin and that /sbin are directing to /usr/bin and /usr/sbin. It’s usage and integration may vary across the Linux distrubtions.
/var Variable files, may content caches, logs, locks files, etc. It can be seen like the /tmp as a non-critical file storage, but this one is preserved are reboot.

What is mounting a filesystem ?

So far, we have seen that a filesystem is how the storage is seen and sorted by an operating system, and how it structures it. But in order to be exploited, a filesystem needs to be mounted. Mounting a filesystem is associating a storage device’s partition to a path the system will be able to access and use the data stored on it.

The physical device has a partition table described how its content is organized : which type of filesystem, if it’s a bootable filesystem, etc. When you plug an USB flash drive to your computer, the Kernel will detects a new device. The device is detected as a file storage and the related driver takes it in charge. Then, the Kernel associates the filesystem present on the removable drive to a mount point (usually /mnt/the-partition-name or /run/media/user/partition-name).

For example, when I insert my camera SD Card, the system will mount it like this :

/dev/sdc1 on /run/media/seb/NIKON Z 5 type exfat 

To compare with the Windows world, my SD Card would have been mounted with a D:\ letter. In the Linux world, the SD Card is mounted in /run/media/seb/NIKON Z 5, a directory path. Why ? Because as we said above, in the Unix filesystem, each file and folder is below the root top level hierarchy.

Another example. My computer has two drives : one for the system, one for the data. If I go in /media and I use the command df . to display the remaining storage capacity of the current path, I’ll see I’m on the root filesystem.

$ cd /media
$ df . 
Filesystem               1K-blocks    Used   Available Use% Mounted on
/dev/mapper/fedora-root  71670904 58155208   9829296   86%  /

Now, I move to the data/ folder which is actually the mountpoint of my secondary drive. The filesystem has changed, the amount of storage also (yes, I will do some cleanup one day).

$ cd /media/data
$ df .
Filesystem     1K-blocks      Used Available Use% Mounted on
/dev/sda       960303848 885484368  25964968  98% /media/data

If I do the same for my NAS, a storage available from my network, mounted on /media/nas :

$ cd /media/nas
$ df . 
Filesystem                1K-blocks       Used      Available Use% Mounted on
192.168.1.101:/nas    5814086144 4873881088 647169024  89% /media/nas

Let’s go a little further with the filesystem mountpoint concept. Usually, the root filesystem contains /boot, /etc, /bin, etc. But, let’s admit I have a 250GB disk for the system, and I want to use a 1TB disk for the /home. That’s not a problem, the root filesystem could be installed on the 250GB disk, and the /home folder will be mounted from the partition on the 1TB disk. That’s a common setup to preserve the content of the /home directory in case of a system reinstallation : you can avoid formatting the filesystem hosting the /home data in the process.

When we seen the startup sequence of a Linux-based operating system, one step was “Mounting the filesystems”. Linux don’t guess it miraculously, they’re defined into a system file : /etc/fstab. This file contains the name of the physical partition, the mountpoint, and the options to apply to each of the entries defined here. Mount the filesystems during a Linux boot sequence means : mounting the /etc/fstab content.

Example of a fstab file content :

/dev/mapper/fedora-root /                       ext4    defaults        1 1
UUID=0feb407e-cc33-4718-98b7-f9e7362968c3 /boot                   ext4    defaults        1 2

Each filesystem defined in /etc/fstab can be quickly mounter by the mount command.

For example, if I have the following line in /etc/fstab :

/dev/sdc1 /media/somefilesystem ext4 defaults 1 1

Using mount /media/somefilesystem will mount it according to this entry defined in /etc/fstab. There is also the autofs module that is able to automatically mount a filesystem defined in another set of configuration files when going into its mountpoint path.

On Linux servers, we usually use the Logical Volume Management which is an flexible abstractive filesystem that present a consistent storage volume composed of several physical disks. It’s main asset is that you can extend the filesystem capacity by adding physical disks to the array and allocating the capacity to the filesystem, it’s very used in virtualization (in this case, the disks are virtual). On a Server’s filesystem, the /var, /opt and /srv would be mounted from LVM filesystems than can be extended it the storage capacity gets too small during the ramp-up of a service for example.

So far…

Following this article, you should have a clearer view about how an operating system organizes its storage volumes and how Linux sorts the data on the filesystems according to a standard hierarchy. The next part will explain how Linux manages the permission for the users to access file and directories.


πŸ“‘ Table of Contents

πŸ“š Read my books

Follow me on Mastodon

🏷️ All Tags πŸ“„ All Posts πŸ—Ί Sitemap RSS Feed