Filesystems

Introduction to Filesystems

In Linux (and all UNIX-like operating systems) it is often said “Everything is a file”, or at least it is treated as such

This means whether you are dealing with normal data files and documents, or with devices such as sound cards and printers, you interact with them through the same kind of Input/Output (I/O) operations
- This simplifies things: you open a “file” and perform normal operations like reading the file and writing on it

On many systems (including Linux), the filesystem is structured like a tree

The tree is usually portrayed as inverted, and starts at what is most often called the root directory, which marks the beginning of the hierarchical filesystem and is also sometimes referred to as the trunk, or simply denoted by /
- The root directory is not the same as the root user
- The hierarchical filesystem also contains other elements in the path (directory names), which are separated by forward slashes (/)
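
As a minimal sketch of the "everything is a file" idea, the commands below apply the same read and write operations to a regular file and to two device nodes. The temporary file and the variable names are illustrative only; /dev/urandom and /dev/null are standard device nodes on Linux.

```shell
# Write to and read from a regular file
tmpfile=$(mktemp)
echo "hello" > "$tmpfile"
regular_text=$(cat "$tmpfile")

# Read 16 bytes from a device node with an ordinary read
random_bytes=$(head -c 16 /dev/urandom | wc -c)

# Write to a device node with the same redirection used for the regular file
echo "discarded" > /dev/null

echo "regular file says: $regular_text"
echo "bytes read from /dev/urandom: $random_bytes"
rm -f "$tmpfile"
```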

Filesystem Varieties

Linux supports a number of native filesystem types, expressly created by Linux developers, such as:

ext3
ext4
squashfs
btrfs

It also offers implementations of filesystems used on other (alien) operating systems, such as those from:

Windows (ntfs, vfat)
SGI (xfs)
IBM (jfs)
MacOS (hfs, hfs+)

Many older, legacy filesystems, such as FAT, are also supported

It is often the case that more than one filesystem type is used on a machine, based on considerations such as the size of files, how often they are modified, what kind of hardware they sit on and what kind of access speed is needed, etc.

The most advanced filesystem types in common use are the journaling varieties: ext4, xfs, btrfs, and jfs (These have many state-of-the-art features and high performance, and are very hard to corrupt accidentally)

Linux Partitions

Each filesystem on a Linux system occupies a disk partition

Partitions help to organize the contents of disks according to the kind and use of the data contained
- ex: Important programs required to run the system are often kept on a different partition (known as root or /) from the one that contains files owned by regular users of that system (/home)
- In addition, temporary files created and destroyed during the normal operation of Linux may be located on dedicated partitions
- One advantage of this kind of isolation by type and variability is that when all available space on a particular partition is exhausted, the system may still operate normally

Mount Points

Before you can start using a filesystem, you need to mount it on the filesystem tree at a mount point (This is simply a directory, which may or may not be empty, where the filesystem is to be grafted on - Sometimes, you may need to create the directory if it does not already exist)

WARNING: If you mount a filesystem on a non-empty directory, the former contents of that directory are covered up and not accessible until the filesystem is unmounted - Thus, mount points are usually empty directories

Mounting and Unmounting

The mount command is used to attach a filesystem (which can be local to the computer or on a network) somewhere within the filesystem tree

The basic arguments are the device node and mount point
- ex: $ sudo mount /dev/sda5 /home will attach the filesystem contained in the disk partition associated with the /dev/sda5 device node, into the filesystem tree at the /home mount point
There are also other ways to specify the partition other than the device node, such as using the disk label or UUID
The mount command without options and arguments will list all of the currently mounted filesystems, along with information that will indicate whether they are mounted as read-only or writable
The command df -Th (disk-free) will display information about mounted filesystems, including the filesystem type, and usage statistics about currently used and available space
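
The same information can be inspected without root privileges. The sketch below assumes a Linux system where /proc is mounted; it reads the kernel's mount table directly and then asks df about the filesystem holding /. The variable names are illustrative.

```shell
# Device, mount point, filesystem type and options for the root filesystem,
# taken from the kernel's mount table
root_entry=$(awk '$2 == "/" { entry = $1 " on " $2 " type " $3 " (" $4 ")" } END { print entry }' /proc/self/mounts)
echo "$root_entry"

# The options field begins with ro (read-only) or rw (writable)
case "$root_entry" in
  *"(ro"*) root_mode="read-only" ;;
  *)       root_mode="writable" ;;
esac
echo "root filesystem is mounted $root_mode"

# Type, size, used and available space for the filesystem holding /
df -Th /
```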

The umount command (note the spelling: umount, not unmount) is used to detach a filesystem from its mount point, ex: $ sudo umount /home

Only a root user (logged in as root, or using sudo) has the privilege to run these commands unless the system has been otherwise configured

If you want it to be automatically available every time the system starts up, you need to edit /etc/fstab accordingly (the name is short for filesystem table)

Looking at this file will show you the configuration of all pre-configured filesystems. man fstab will display how this file is used and how to configure it
/etc/fstab shows which filesystems will be automatically mounted when the system is brought up into multi-user mode

NFS and Network Filesystems

It is often necessary to share data across physical systems which may be either in the same location or anywhere that can be reached by the Internet

A network (also sometimes called distributed) filesystem may have all its data on one machine or have it spread out on more than one network node
- A variety of different filesystems can be used locally on individual machines; a network filesystem can be thought of as a grouping of lower-level filesystems of varying types

Many system administrators mount remote users' home directories on a server in order to give them access to the same files and configuration files across multiple client systems

This allows the users to log in to different computers yet still have access to the same files and resources
- The most common such filesystem is named simply NFS (the Network Filesystem)
  - It has a very long history and was first developed by Sun Microsystems
- Another common implementation is CIFS (also termed SAMBA), which has Microsoft roots

NFS on the Server

On the server machine, NFS uses daemons (built-in networking and service processes in Linux) and other system servers, which are started at the command line by typing: $ sudo systemctl start nfs

NOTE: On RHEL/CentOS 8, the service is called nfs-server, not nfs.

The text file /etc/exports contains the directories and permissions that a host is willing to share with other systems over NFS. A very simple entry in this file may look like the following: /projects *.example.com(rw)

This entry allows the directory /projects to be mounted using NFS with read and write (rw) permissions and shared with other hosts in the example.com domain (Every file in Linux has three possible permissions: read (r), write (w) and execute (x))
After modifying the /etc/exports file, you can type exportfs -av to notify Linux about the directories you are allowing to be remotely mounted using NFS
You can also restart NFS with sudo systemctl restart nfs, but this is heavier, as it halts NFS for a short while before starting it up again
To make sure the NFS service starts whenever the system is booted, issue sudo systemctl enable nfs

NFS on the Client

On the client machine, if it is desired to have the remote filesystem mounted automatically upon system boot, /etc/fstab is modified to accomplish this

ex: An entry in the client's /etc/fstab might look like the following: servername:/projects /mnt/nfs/projects nfs defaults 0 0
You can also mount the remote filesystem without a reboot or as a one-time mount by directly using the mount command: $ sudo mount servername:/projects /mnt/nfs/projects

Remember, if /etc/fstab is not modified, this remote mount will not be present the next time the system is restarted

Furthermore, you may want to use the nofail option in fstab in case the NFS server is not live at boot

Filesystem Architecture

Overview of Home User Directories

Each user has a home directory (Usually placed under /home)
The /root directory on modern Linux systems is no more than the home directory of the root user/superuser/system administrator account
On a multi-user system, the /home directory infrastructure is often mounted as a separate filesystem on its own partition, or even exported (shared) remotely on a network through NFS
- Sometimes, you may group users based on their department or function by creating a subdirectory under /home for each of these groups (ex: /home/students/, /home/staff/, etc.)

/bin and /sbin Directories

The /bin directory contains executable binaries, essential commands used to boot the system or in single-user mode, and essential commands required by all system users (such as cat, cp, ls, mv, ps, rm, etc.)

The /sbin directory is intended for essential binaries related to system administration (such as fsck, ip, etc.)

Commands that are not essential (theoretically) for the system to boot or operate in single-user mode are placed in the /usr/bin and /usr/sbin directories

Historically, this was done so /usr could be mounted as a separate filesystem that could be mounted at a later stage of system startup or even over a network
- However, nowadays most find this distinction obsolete (In fact, many distributions have been discovered to be unable to boot with this separation, as this modality had not been used or tested for a long time)
- Thus, on some of the newest Linux distributions /usr/bin and /bin are just symbolically linked together, as are /usr/sbin and /sbin

/proc Filesystem

Certain filesystems, like the one mounted at /proc, are called pseudo-filesystems because they have no permanent presence anywhere on the disk

The /proc filesystem contains virtual files (files that exist only in memory) that permit viewing constantly changing kernel data
/proc contains files and directories that mimic kernel structures and configuration information (It doesn't contain real files, but runtime system information, ex: system memory, devices mounted, hardware configuration, etc.)
- Some important entries in /proc are:
  - /proc/cpuinfo
  - /proc/interrupts
  - /proc/meminfo
  - /proc/mounts
  - /proc/partitions
  - /proc/version

/proc has subdirectories as well
- /proc/<Process-ID-#>
- /proc/sys

The /proc filesystem is very useful because the information it reports is gathered only as needed and never needs storage on the disk
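
A short sketch of reading a few of these entries with ordinary tools, assuming a Linux system with /proc mounted. The variable names are illustrative, and stat -c is the GNU coreutils form of the command.

```shell
# Kernel version string, generated by the kernel when the file is read
cat /proc/version

# Total memory as currently reported by the kernel
mem_total=$(grep '^MemTotal:' /proc/meminfo)
echo "$mem_total"

# Every running process has a directory named after its PID; $$ is this shell's PID
ls /proc/$$ > /dev/null && echo "this shell is visible as /proc/$$"

# Virtual files typically report a size of 0 even though reading them returns data
proc_size=$(stat -c %s /proc/meminfo)
echo "size of /proc/meminfo according to stat: $proc_size"
```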

/dev Directory

The /dev directory contains device nodes (A type of pseudo-file used by most hardware and software devices, except for network devices)

This directory:
- Is empty on the disk partition when it's not mounted
- Contains entries which are created by the udev system, which creates and manages device nodes on Linux, creating them dynamically when devices are found
- Contains items such as
  - /dev/sda1 (First partition on the first hard disk)
  - /dev/lp1 (Second printer)
  - /dev/random (A source of random numbers)

/var Directory

The /var directory contains files that are expected to change in size and content as the system is running (var stands for variable)

The /var directory has entries such as
- /var/log - System log files
- /var/lib - Packages and database files
- /var/spool - Print queues
- /var/tmp - Temporary files

Network services directories, such as /var/ftp (the FTP service) and /var/www (the HTTP web service) are also found under /var

The /var directory may be put on its own filesystem so that the growth of the files can be accommodated and any exploding file sizes don’t fatally affect the system

/etc Directory

The /etc directory is the home for system configuration files

It contains no binary programs, although there are some executable scripts
- ex: /etc/resolv.conf tells the system where to go on the network to obtain host name to IP address mappings (DNS)
Files like passwd, shadow and group for managing user accounts are found in the /etc directory
While some distributions have historically had their own extensive infrastructure under /etc (for example, Red Hat and SUSE have used /etc/sysconfig), with the advent of systemd there is much more uniformity among distributions today
/etc is for system-wide configuration files and only the superuser can modify files there
- User-specific configuration files are always found under their home directory

/boot Directory

The /boot directory contains the few essential files needed to boot the system

For every alternative kernel installed on the system, there are four files, each of which has the kernel version appended to its name:
- vmlinuz - The compressed Linux kernel, required for booting
- initramfs (Sometimes called initrd) - The initial ram filesystem, required for booting
- config - The kernel configuration file, only used for debugging and bookkeeping
- System.map - Kernel symbol table, only used for debugging

GRUB (Grand Unified Bootloader) files, such as /boot/grub/grub.conf or /boot/grub2/grub.cfg, are also found under the /boot directory

/lib and /lib64 Directories

/lib contains libraries (common code shared by applications and needed for them to run) for the essential programs in /bin and /sbin

These filenames start with either ld or lib
- ex: /lib/libncurses.so.5.9
Most of these are what is known as dynamically loaded libraries (Also known as shared libraries or SO (Shared Objects))
On some Linux distributions, there exists a /lib64 directory containing 64-bit libraries, while /lib contains 32-bit versions

Just as for /bin and /sbin, on recent distributions these directories are just symbolic links to those under /usr
Kernel modules (kernel code, often device drivers, that can be loaded and unloaded without restarting the system) are located in /lib/modules/<kernel-version-number>

Removable media: the /media, /run and /mnt Directories

One often uses removable media, such as USB drives, CDs and DVDs

To make the material accessible through the regular filesystem, it has to be mounted at a convenient location
Most Linux systems are configured so any removable media are automatically mounted when the system notices something has been plugged in

While historically this was done under the /media directory, modern Linux distributions place these mount points under the /run directory

For example, a USB pen drive with the label myusbdrive for a user named student would be mounted at /run/media/student/myusbdrive

The /mnt directory has been used since the early days of UNIX for temporarily mounting filesystems

These can be those on removable media, but more often might be network filesystems, which are not normally mounted - Or these can be temporary partitions, or so-called loopback filesystems, which are files which pretend to be partitions

Additional Directories Under /:

There are some additional directories to be found under the root directory

/opt - Optional application software packages
/sys - Virtual pseudo-filesystem giving information about the system and the hardware (Can be used to alter system parameters and for debugging purposes)
/srv - Site-specific data served up by the system (Seldom used)
/tmp - Temporary files; On some distributions erased across a reboot and/or may actually be a ramdisk in memory
/usr - Multi-user applications, utilities and data

/usr Directory Tree

The /usr directory tree contains theoretically non-essential programs and scripts (in the sense that they should not be needed to initially boot the system) and has at least the following sub-directories:

/usr/include - Header files used to compile applications
/usr/lib - Libraries for programs in /usr/bin and /usr/sbin
/usr/lib64 - 64-bit libraries for 64-bit programs in /usr/bin and /usr/sbin
/usr/sbin - non-essential system binaries, such as system daemons
/usr/share - Shared data used by applications, generally architecture-independent
/usr/src - Source code, usually for the Linux kernel
/usr/local - Data and programs specific to the local machine; subdirectories include bin, sbin, lib, share, include, etc.
/usr/bin - This is the primary directory of executable commands on the system

Comparing Files and File Types

diff

diff is a utility used to compare files and directories, and has many useful options including:

diff -c - Provides a listing of differences that include three lines of context before and after the lines differing in content
diff -r - Used to recursively compare subdirectories, as well as the current directory
diff -i - Ignore the case of letters
diff -w - Ignore differences in spaces and tabs
diff -q - Be quiet: only report if files are different without listing the differences

To compare two text files, at the command prompt, type: diff [options] <filename1> <filename2>

To compare two binary files, at the command prompt, type: cmp [options] <filename1> <filename2>
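
A small sketch of how these commands report their result through the exit status, using two throwaway files created in a temporary directory. The file names and contents are made up for illustration.

```shell
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/old.txt"
printf 'apple\nBanana\ncherry\n' > "$workdir/new.txt"

# diff exits with status 0 when the files match and 1 when they differ
diff_status=0
diff "$workdir/old.txt" "$workdir/new.txt" || diff_status=$?

# With -i the case difference is ignored, so the files compare as equal
nocase_status=0
diff -i "$workdir/old.txt" "$workdir/new.txt" || nocase_status=$?

# cmp compares byte by byte; -s suppresses output and only sets the exit status
cmp_status=0
cmp -s "$workdir/old.txt" "$workdir/new.txt" || cmp_status=$?

echo "diff: $diff_status, diff -i: $nocase_status, cmp: $cmp_status"
```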

diff3 and patch

You can compare three files at once using diff3, which uses one file as the reference basis for the other two

For example, suppose you and a co-worker both have made modifications to the same file working at the same time independently. diff3 can show the differences based on the common file you both started with.

The syntax for diff3 is as follows: $ diff3 MY-FILE COMMON-FILE YOUR-FILE
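
The following sketch plays out that scenario with three small files in a temporary directory. The file contents are invented for illustration, and the -m (merge) option is used as one way to combine the two sets of changes; it is not the only way to use diff3.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/COMMON-FILE"
# My copy changes the first line; your copy changes the last line
printf 'ONE\ntwo\nthree\nfour\nfive\n' > "$workdir/MY-FILE"
printf 'one\ntwo\nthree\nfour\nFIVE\n' > "$workdir/YOUR-FILE"

# Show how each modified file differs from the common ancestor
diff3 "$workdir/MY-FILE" "$workdir/COMMON-FILE" "$workdir/YOUR-FILE" || true

# -m writes a merged result; non-overlapping changes are combined automatically
merged=$(diff3 -m "$workdir/MY-FILE" "$workdir/COMMON-FILE" "$workdir/YOUR-FILE")
echo "$merged"
```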

Many modifications to source code and configuration files are distributed utilizing patches, which are applied, not surprisingly, with the patch program

A patch file contains the deltas (changes) required to update an older version of a file to the new one
The patch files are actually produced by running diff with the correct options, as in $ diff -Nur originalfile newfile > patchfile
Distributing just the patch is more concise and efficient than distributing the entire file
- For example, if only one line needs to change in a file that contains 1000 lines, the patch file will be just a few lines long
To apply a patch, you can use either of these two methods: $ patch -p1 < patchfile or $ patch originalfile patchfile
- The first usage is more common, as it's often used to apply changes to an entire directory tree, rather than just one file (as in the second example)
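
A minimal round trip for the single-file form, assuming the patch program is installed. The file names follow the ones used above, except target, which is a hypothetical copy of the original that receives the patch.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/originalfile"
printf 'alpha\nBETA\ngamma\n' > "$workdir/newfile"

# Produce the patch; diff exits with status 1 because the files differ
diff -u "$workdir/originalfile" "$workdir/newfile" > "$workdir/patchfile" || true
cat "$workdir/patchfile"

# Apply the patch to a copy of the original, which then matches newfile
cp "$workdir/originalfile" "$workdir/target"
patch "$workdir/target" "$workdir/patchfile"
```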

file

In Linux, a file's extension often does not categorize it the way it might in other operating systems

One cannot assume that a file named file.txt is a text file and not an executable program
In Linux, a filename is generally more meaningful to the user of the system than the system itself
In fact, most applications directly examine a file's contents to see what kind of object it is rather than relying on an extension
- This is very different from the way Windows handles filenames, where a filename ending with .exe, for example, represents an executable binary file

The real nature of a file can be ascertained by using the file utility

For the file names given as arguments, it examines the contents and certain characteristics to determine whether the files are plain text, shared libraries, executable programs, scripts, or something else.
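
To illustrate, the sketch below gives two files deliberately misleading extensions and asks file what they are. It assumes the file utility is installed; the file names are invented for the example.

```shell
workdir=$(mktemp -d)
# A plain text file with a misleading .exe extension
printf 'just some text\n' > "$workdir/notes.exe"
# A shell script with a misleading .txt extension
printf '#!/bin/sh\necho hi\n' > "$workdir/script.txt"

# file inspects the contents, so the extensions do not influence the result
file "$workdir/notes.exe" "$workdir/script.txt"

# --brief omits the filename; --mime-type prints a MIME type instead of a description
notes_type=$(file --brief --mime-type "$workdir/notes.exe")
script_type=$(file --brief --mime-type "$workdir/script.txt")
echo "$notes_type $script_type"
```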

Backing Up Data

There are many ways to back up data or even your entire system

cp

cp can only copy files to and from destinations on the local machine (unless you're copying to or from a filesystem mounted using NFS)

rsync

rsync is more efficient than cp since it checks if the file being copied already exists

If the file exists and there's no change in size or modification time, rsync will avoid an unnecessary copy and save time
Also because rsync copies only the parts of files that have actually changed, it can be very fast
rsync can also be used to copy files from one machine to another
rsync is very efficient when recursively copying one directory tree to another because only the differences are transmitted over the network
- One often synchronizes the destination directory tree with the origin, using the -r option to recursively walk down the directory tree copying all files and directories below the one listed as the source

Using rsync

rsync is a very powerful utility

For example, a very useful way to back up a project directory might be to use the following command: $ rsync -r project-X archive-machine:archives/project-X

rsync can be very destructive as well

Accidental misuse can do a lot of harm to data and programs, by inadvertently copying changes to where they're not wanted
- Take care to specify the correct options and paths
  - It's highly recommended that you first test your rsync command using the --dry-run option to ensure that it provides the results that you want

To use rsync at the command prompt, type rsync sourcefile destinationfile, where either file can be on the local machine or on a networked machine (The contents of sourcefile will be copied to destinationfile)

ex: $ rsync --progress -avrxH --delete sourcedir destdir

Locations

Locations are designated in the target:path form

target can be in the form of someone@host (The someone@ part is optional and used if the remote user is different from the local user)

Compressing Data

File data is often compressed to save disk space and reduce the time it takes to transmit files over networks

Linux uses a number of methods to perform this compression:
- gzip - The most frequently used Linux compression utility
- bzip2 - Produces files significantly smaller than those produced by gzip
- xz - The most space-efficient compression utility used in Linux
- zip - Is often required to examine and decompress archives from other OSes

These techniques vary in the efficiency of the compression (how much space is saved) and in how long they take to compress

Generally, the more efficient techniques take longer
Decompression time does not vary as much across different methods.

tar - Often used to group files in an archive and then compress the whole archive at once

Compressing Data Using gzip

gzip is the most often used Linux compression utility

It compresses very well and is very fast
gzip usage examples:
- gzip * - Compresses all files in the current directory; each file is compressed and renamed with a .gz extension
- gzip -r projectX - Compresses all files in the projectX directory, along with all files in all of the directories under projectX
- gunzip foo - De-compresses foo found in the file foo.gz
  - Under the hood, the gunzip command is actually the same as gzip -d
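
A quick round trip, sketched on a generated sample file in a temporary directory. The name foo matches the example above; the contents are made up.

```shell
workdir=$(mktemp -d)
# Build a repetitive sample file, which compresses well
for i in $(seq 1 1000); do echo "line $i of sample data"; done > "$workdir/foo"
original_sum=$(cksum < "$workdir/foo")

# gzip replaces foo with foo.gz
gzip "$workdir/foo"
ls -l "$workdir"

# gunzip (equivalent to gzip -d) restores foo and removes foo.gz
gunzip "$workdir/foo.gz"
restored_sum=$(cksum < "$workdir/foo")
```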

Compressing Data Using bzip2

bzip2 has a syntax that's similar to gzip, but it uses a different compression algorithm and produces significantly smaller files, at the price of taking a longer time to do its work

Thus, it's more likely to be used to compress larger files
bzip2 usage examples:
- bzip2 * - Compresses all of the files in the current directory and replaces each file with a file renamed with a .bz2 extension
- bunzip2 *.bz2 - Decompresses all of the files with an extension of .bz2 in the current directory
  - Under the hood, bunzip2 is the same as calling bzip2 -d

NOTE: bzip2 has lately become deprecated due to lack of maintenance and the superior compression ratios of xz, which is actively maintained.

Compressing Data Using xz

xz is the most space-efficient compression utility in Linux and is used to store archives of the Linux kernel

It trades a slower compression speed for an even higher compression ratio
xz usage examples:
- xz * - Compresses all of the files in the current directory and replaces each file with one with a .xz extension
- xz foo - Compresses foo into foo.xz using the default compression level (-6), and removes foo if compression succeeds
- xz -dk bar.xz - Decompresses the bar.xz file into bar and doesn't remove the bar.xz file even if decompression is successful
- xz -dcf a.txt b.txt.xz > abcd.txt - Decompresses a mix of compressed and uncompressed files to standard output, using a single command
- xz -d *.xz - Decompresses the files compressed using xz

Compressed files are stored with a .xz extension

Handling Files Using zip

The zip program isn't often used to compress files in Linux but is often required to examine and decompress archives from other OSes

It's only used in Linux when you get a zipped file from a Windows user
It's a legacy program

zip usage examples:
- zip backup * - Compresses all files in the current directory and places them in backup.zip
- zip -r backup.zip ~ - Archives your login directory (~) and all files and directories under it in backup.zip
- unzip backup.zip - Extracts all files in backup.zip and places them in the current directory

Archiving and Compressing Data Using tar

Historically, tar stood for "tape archive" and was used to archive files to a magnetic tape

It allows you to create or extract files from an archive file, often called a tarball
- At the same time, you can optionally compress while creating the archive, and decompress while extracting its contents

tar usage examples:
- tar xvf mydir.tar - Extracts all the files in mydir.tar into the mydir directory
- tar zcvf mydir.tar.gz mydir - Create the archive and compress with gzip
- tar jcvf mydir.tar.bz2 mydir - Create the archive and compress with bzip2
- tar Jcvf mydir.tar.xz mydir - Create the archive and compress with xz
- tar xvf mydir.tar.gz - Extract all the files in mydir.tar.gz into the mydir directory
  - Note: You don't have to tell tar it's in gzip format

You can separate out the archiving and compression stages

This is slower and wastes space by creating an unneeded intermediary .tar file
- ex: $ tar cvf mydir.tar mydir ; gzip mydir.tar
- ex: $ gunzip mydir.tar.gz ; tar xvf mydir.tar
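
A sketch of the create, list and extract cycle on a throwaway directory tree. The restore directory and the file contents are invented for the example; -C tells tar which directory to work in.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/mydir/docs" "$workdir/restore"
echo "first" > "$workdir/mydir/a.txt"
echo "second" > "$workdir/mydir/docs/b.txt"

# Create a gzip-compressed archive; -C makes the stored paths relative to $workdir
tar -C "$workdir" -zcf "$workdir/mydir.tar.gz" mydir

# List the archive contents without extracting anything
tar -tzf "$workdir/mydir.tar.gz"

# Extract into a different directory; the gzip format is detected automatically
tar -C "$workdir/restore" -xf "$workdir/mydir.tar.gz"
restored=$(cat "$workdir/restore/mydir/docs/b.txt")
```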

Relative Compression Times and Sizes

As compression factors go up, CPU time does as well
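
One way to see the trade-off on your own data is to compress the same file with each tool and compare the results. The sketch below uses a generated sample file, so its numbers are not representative of real data; prefixing each compression command with time would show the CPU cost as well.

```shell
workdir=$(mktemp -d)
# Sample input: 20000 lines of repetitive text (real data will behave differently)
for i in $(seq 1 20000); do echo "entry $i: the quick brown fox"; done > "$workdir/sample"
echo "original: $(wc -c < "$workdir/sample") bytes"

for tool in gzip bzip2 xz; do
  # Skip tools that are not installed on this machine
  command -v "$tool" > /dev/null || continue
  # -c writes to standard output, leaving the original file in place
  "$tool" -c "$workdir/sample" > "$workdir/sample.$tool"
  echo "$tool: $(wc -c < "$workdir/sample.$tool") bytes"
done
```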

Disk-to-Disk Copying (dd) / Disk Destroyer / Delete Data

The dd program is very useful for making copies of raw disk space

ex: To back up your MBR (The first 512-byte sector on the disk that contains a table describing the partitions on that disk), you might type: $ dd if=/dev/sda of=sda.mbr bs=512 count=1
- Warning: Typing $ dd if=/dev/sda of=/dev/sdb to make a copy of one disk onto another, will delete everything that previously existed on the second disk
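
To practice safely, the same options can be tried on a regular file instead of a disk. In this sketch disk.img is a hypothetical stand-in for /dev/sda, so no root privileges are needed and nothing on a real disk is touched.

```shell
workdir=$(mktemp -d)
# Stand-in for a disk: a 4 KiB regular file filled with zeros
dd if=/dev/zero of="$workdir/disk.img" bs=1024 count=4 2> /dev/null

# Copy only the first 512-byte sector, as in the MBR backup above
dd if="$workdir/disk.img" of="$workdir/disk.mbr" bs=512 count=1 2> /dev/null
mbr_size=$(wc -c < "$workdir/disk.mbr")
echo "saved $mbr_size bytes"
```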

LFS101x: Chapter 10 - File Operations