Chapter 2, Exploring the UNIX/Linux File Systems and File Security

LUX 205 - Introduction to Linux/UNIX

Objectives:

This lesson discusses files in UNIX and Linux, including their hierarchical storage structure and commonly used security permissions. Objectives important to this lesson:

UNIX and Linux file systems
Partitions and inodes
Root hierarchical structures
The mount command
Paths, pathnames, and prompts
Navigating the file system
Directories
Copying files
File permissions

Concepts:

This chapter begins by telling us that a file is the basic structure for data storage. In fact, it is the basic structure for data and device use, as well. The text agrees that UNIX and Linux like to see everything they encounter as a file, including hardware components. For the moment, we will consider the data structures you typically call files, and will consider the implications of treating devices as files a bit later.

We can think of a file system in three ways that may be useful.

A logical file system is how the operating system thinks of the files and how they are organized. Different operating systems can work in different ways, but most will show a file system to the user that resembles a root, branches, and leaf objects.
A physical file system is how the hardware stores each piece of a file, updates it with changes, moves it when needed, and removes it when told to do so. Commands from an application pass through an operating system to drivers for devices, and changes are made in ways the user does not need to understand.
A virtual storage system is like the logical file system described above, but it allows the system to contain multiple devices, multiple operating systems, and many other differences, all of which are presented to the user as one integrated system, regardless of its actual complexity. This is the view of files that you have seen in Windows, OS X, Linux, or any other system you are likely to have used.

The UNIX file system is meant to operate as a virtual storage system. It presents a hierarchical concept of file storage to users, regardless of whether there is a graphical user interface. The text tells us that the original UNIX file system is called ufs (UNIX people love acronyms), and its extended version is called ext or ext fs. (Sometimes the short name is not an acronym.) We are told that a newer version of ext has added a concept called an extent, which is also used in some database programs. This makes sense if we think of a file system as also being a database about its component files and folders. (Folders are also called directories.) An extent, in this sense, is a contiguous group of data blocks allocated for a file when we store it. It need not be just the space needed to store the initial file. It can include more data blocks as well, in anticipation of the file growing in size. Why does it matter that the extent is made of contiguous blocks of storage space? A file that is stored in blocks that are physically next to each other can be accessed and manipulated faster than a file that is stored in noncontiguous blocks. This is an advantage when dealing with large files.

There are tables of information about file systems supported by UNIX on pages 56 through 58 in the text. You should browse through this material to learn about file systems you may never have known existed.

The text reveals, a bit late, that the file system in UNIX and Linux is arranged in a hierarchy, as shown on page 55.

In the accompanying image, a common Linux file structure is shown on the left side of the picture, and a common Windows file structure is shown on the right. In both cases there are icons for devices and for folders, and if we opened a folder we would see icons for files as well. The image links to a short article that introduces some of the concepts of this chapter. The author of that article points out that the beginning of UNIX and Linux file systems is always called the root, and is represented by the slash, also called the forward slash character: /. The slash is also used as a marker between directory names in a path to a file. Note that UNIX/Linux systems use the slash for this purpose, but Windows systems use the backslash: \. Paths in URLs follow UNIX notation, perhaps because most servers on the Internet were running UNIX when the Internet was created.

The text reminds us that folders are also called directories, which you will need to know when creating, removing, or otherwise managing them from a UNIX or Linux command line. Directories can, and often do, contain other directories. These directories are in a parent/child relationship, and the child can be called a subdirectory with respect to its parent. We will talk more about folders and files in a bit, but the text covers two other concepts first.

Before you can add a file system to a hard drive, you have to create a partition on that drive to hold the file system. A partition may be an entire hard drive or only a portion of it, which allows you to create several partitions on a hard drive. Each of them may contain different file systems if that is what you need, or just different parts of the same file system.

When you make partitions on a disk with UNIX or Linux, they are numbered, and the first one is numbered 1. The text's discussion is about some older technologies, IDE and SCSI. If you have a system that still uses them, an IDE drive might have partitions called hdax, where x is some number, and where a indicates that this is the primary (boot) device. A device with SCSI disks might have partitions called sdax. Using the table below as a guide, you would choose one element from each column to form the name of a partition.

To avoid confusion in naming partitions, use of automatic naming is recommended.

Device type	Sequential order indicator for the disk	Sequential order indicator for the partition
hd for an IDE (ATA) disk	a for a primary drive	1 for the first
sd for a SCSI disk (current Linux kernels also use sd for SATA and USB disks)	b for a secondary drive, etc.	2 for the second, etc.
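
To make the table concrete, here is a minimal sketch that builds a partition name by picking one element from each column. The function name partition_name is my own invention for illustration; it is not a standard command, and it only prints a name without touching any disk.

```shell
# Compose a traditional partition name from the three columns of the table.
# partition_name is a hypothetical helper, not a standard UNIX/Linux command.
partition_name() {
    device_type=$1   # hd or sd
    disk_letter=$2   # a for the first disk, b for the second, and so on
    part_number=$3   # 1 for the first partition, 2 for the second, and so on
    printf '%s%s%s\n' "$device_type" "$disk_letter" "$part_number"
}

partition_name hd a 1   # first partition on the first IDE disk
partition_name sd b 2   # second partition on the second SCSI disk
```

On a real system these names appear as device special files in the /dev folder, for example /dev/sda1.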

You will not have to partition your virtual machine in our lab, but if you install Linux on an actual hard drive, you will need to know some of the material on pages 60 through 62. You may also want to read this article on the How-To Geek website, which has slightly different recommendations.

The text suggests that your hard drive should have three or more partitions. It mentions that vendors often recommend a root partition, a boot partition, and a swap partition, and it suggests some others as well.
It mentions that partitions must be mounted to make them part of the file system. Part of what? The article linked above does a good job of explaining that UNIX and Linux think every partition is part of the same file system tree. Here is a quote that may make you want to read the article (nudge, nudge, hint, hint). I have added bold to some of the author's text:
- "The way Linux works is that it puts everything onto a tree. If you have another partition or disk, it gets “mounted” as a branch in a specific folder, usually /media or /mnt. The directory that a partition gets mounted to is called a “mount point.” This method works better with Linux’s tree structure, and you can mount partitions as folders nearly anywhere. In Windows, this is not so easily done; new partitions generally show up as separate drives. In addition, Linux can work with many more types of file systems natively than Windows."
  
  This fits the illustration above. On the Linux side of the graphic, note that the root partition is symbolized by a forward slash at the top of the tree. The admin for that system has created a folder called mnt in which he has mounted partitions for the floppy drive and the CD-ROM drive. These could have been mounted in other locations, as were the music drive and the home drive.
As our text notes, we need the root partition to have enough space to hold the operating system.
The text specifies a separate boot partition to hold the kernel of your version of Linux. The author of the article simply includes that as part of his root partition, which also matches the illustration above.
Each author, each distribution, and each friend who knows something about computers will have a different recommendation about the size of your swap partition. Swap space is disk space that the kernel uses as an overflow area for RAM: when memory runs short, pages that have not been used recently are written to swap and read back when they are needed. When you don't have enough RAM in your system, it has to swap between memory and disk more often than it should, and the constant reading and writing that results is called disk thrashing. Yes, the better answer is to add RAM, but you should still have a swap partition. Recommendations for size range from the same size as your RAM to twice the size of your RAM. An even larger factor may be desirable in cases where you know the system RAM is low and you can't change it.
The text recommends a usr partition if your system will have multiple users. This partition is to hold the installed software for users.
The text also recommends a home partition to hold the personal files for each user in their own directories.

The second topic covered with partitions is inodes, which is short for information nodes, according to our text. That may be so, but I think the quote on Wikipedia attributed to Dennis Ritchie (one of the creators of UNIX) is probably also true: it may have meant index node when it was created. An inode is a data structure that holds information about a file or a directory (such as its owner, permissions, size, and timestamps) and points to the data blocks that hold its contents. UNIX and Linux use inodes in much the same way that DOS and Windows use File Allocation Table information to find files and note their size.
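
If you want to see an inode number for yourself, the -i option of ls prints it in front of the file name. The sketch below is only an illustration: it creates a scratch file with mktemp, so the file name and the number you see will differ on your system.

```shell
# Create a scratch file and ask ls for its inode number.
demo_file=$(mktemp)

# ls -i prints the inode number first, then the file name.
inode_number=$(ls -i "$demo_file" | awk '{print $1}')
echo "inode of $demo_file: $inode_number"

# Clean up the scratch file.
rm -f "$demo_file"
```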

Let's go back to the idea of the file structure, as the text does on page 63. The text is rather pedantic about the idea of mounting the file system. This is something that takes place automatically when the computer starts. As a user or administrator, you would only need to worry about mounting new devices to the file system when they are added to your computer. Let's assume that the file system exists and starts (as it must) at the root, as shown in the image above. The text discusses the first three folders in that image on page 64, and the discussion continues with several others through page 70.

/bin - This folder is for programs. bin stands for binary code, which is how most executable programs are stored on a computer. (As opposed to readable text, which can also be executable, but that is another feature we will get to later.) This folder is for commonly used programs, ones that are typically included with the operating system, that are meant to be available to all users.
/boot - Files required to start the operating system, including the kernel, go here.
/dev - This folder is for device special files, which we discussed in class last week as the counterpart of what Windows operating systems call drivers. (Strictly speaking, a device special file is the access point through which programs reach the driver for a device.) Linux and UNIX place these files in two categories: block special files, which move blocks of data at a time (such as disk drives), and character special files, which move single characters of data at a time (such as terminals and serial ports). Note the table on page 65 that lists several subdirectories that may be found in the /dev folder, and note the use of the letters b and c to mark block and character special files.
/etc - The text explains that this folder is for configuration files, but it is also a catch-all folder for files that are not numerous enough to need their own folder.
/home - A folder for the home directories for various users.
/lib - This is easier to understand if you have heard of the idea of shared code. Programs can be written that call on standard modules of code that are available to any program that needs them. This folder is meant to hold libraries of such code, files that hold standard procedures for printing, displaying information or graphics, and doing anything else that is normally done with a computer.
/mnt - This is one possible folder that might hold mount points for devices that can be temporarily or permanently installed on a computer, such as memory sticks, internal or external disk drives, or cell phones.
/media - As the text explains, this folder is a recent addition to the commonly available folders. It serves the same purpose as /mnt, but it is meant for devices that are temporarily connected to a computer, such as cameras, DVDs, and other devices holding data or entertainment media.
/proc - The text is a little hard to understand. The /proc folder is explained a bit more on this web page, where we are told that it is like a scratchpad of notes for the kernel while the system is running.
/root - As if we had not called enough things by this name already, this folder is the home directory for the user called root on this system. Someone needs a larger vocabulary.
/sbin - This is like the /bin directory above, but this one is only for programs used by the system administrator. Programs found here are not meant to be used by average system users.
/tmp - As you might imagine, this folder is used to hold files that are only useful for a short time, temporary files that are written by processes or programs that need to use them as scrap paper. This folder is typically emptied when the system is shut down.
/usr - This folder is meant to hold programs that are not part of the operating system, but are still made available to system users, such as productivity software meant for the use of all employees.
/var - This folder burdens our imagination. It holds other folders that will vary in size, such as folders for log files and folders for print jobs. I might have been tempted to put the print jobs in the tmp folder, and the log files in the root user's folder, but then again, they never asked me about it.
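
The difference between block and character special files in /dev can be checked from the command line. This is a minimal sketch using the standard test operators -b and -c; the function name device_kind is mine, chosen only for this example.

```shell
# Report whether a path is a block special file, a character special file,
# or something else. device_kind is a hypothetical helper name.
device_kind() {
    if [ -b "$1" ]; then
        echo block
    elif [ -c "$1" ]; then
        echo character
    else
        echo other
    fi
}

device_kind /dev/null   # /dev/null is a character special file
device_kind /           # the root directory is not a device at all
```

A disk partition such as /dev/sda1, if your system has one, should report block.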

The text explains the use of the mount command on the next three pages. It is not as clear as it might be. The mount command potentially has four parts, so I will try to explain it a little differently:

mount [if you need to name the file system type, -t name-of-the-file-system] pathname-of-the-device-special-file pathname-of-the-mount-point-for-the-device

The command name is not optional, of course. The -t option and the name of the file system can be left out when mount is able to detect the file system type on its own. In this form of the command, the paths to the device special file and to the mount point are not optional, and the mount point must be a directory that already exists.

To remove the device from the system, use the umount command (note the spelling: there is no n after the u), which requires only the path to the mount point as an argument.
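
Mounting requires administrator rights, so the sketch below only prints the command lines instead of running them. The device /dev/sdb1, the mount point /mnt/usb, and the file system type vfat are assumptions chosen for illustration, and show_mount_command is a made-up helper name.

```shell
# Print (do not run) a mount command built from its parts.
# show_mount_command is a hypothetical helper; the device, mount point,
# and file system type used below are assumptions for illustration.
show_mount_command() {
    device=$1
    mount_point=$2
    fs_type=${3:-}   # optional third argument: the file system type
    if [ -n "$fs_type" ]; then
        echo "mount -t $fs_type $device $mount_point"
    else
        echo "mount $device $mount_point"
    fi
}

show_mount_command /dev/sdb1 /mnt/usb vfat   # with an explicit type
show_mount_command /dev/sdb1 /mnt/usb        # letting mount detect the type
echo "umount /mnt/usb"                       # detaching the device again
```

To actually run the printed commands, you would need to be root (or use sudo), and the directory /mnt/usb would have to exist first.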

Having shown us a command that requires us to use pathnames, the text pauses to explain what a pathname is.

The path to any file consists of a list of the directories that you pass through to find that file.
The pathname of that file starts with a slash, then lists all those directories, each of them followed by a slash, then ends with the name of the file itself. This is true about folders as well as files. The leading slash stands for the root of the file system, and each of the subsequent slashes indicates a parent-child relationship between the labels that it separates.
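
The two parts of a pathname can be pulled apart with the standard dirname and basename commands. The pathname below is hypothetical; neither command needs the file to exist, because they only work on the text of the path.

```shell
# A hypothetical absolute pathname, used only as text.
pathname=/home/pat/reports/budget.txt

parent=$(dirname "$pathname")    # the list of directories leading to the file
name=$(basename "$pathname")     # the name of the file itself

echo "directories: $parent"
echo "file name:   $name"
```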

On pages 73 and 74, the text explains how to change a system prompt to show more, less, or different information than it currently shows. There is a system variable called PS1 that holds a series of codes that are used to create the system prompt. This information about changing the prompt is not used on a daily basis by anyone, so you should be aware that it exists and be able to look up the codes that might be used to change the system prompt as desired.
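
As a small illustration, here is one common setting, assuming your shell is bash (other shells use different codes). The particular combination of codes is my choice, not one taken from the text.

```shell
# Bash prompt codes: \u = user name, \h = host name,
# \w = current working directory, \$ = "#" for root and "$" for other users.
# The single quotes keep the codes intact so bash can interpret them
# each time it draws the prompt.
PS1='\u@\h:\w\$ '
```

The change lasts only for the current session unless you place the line in a startup file such as .bashrc.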

The text explains that the pwd command is useful when you want to verify which directory the system is thinking about presently. The text says that the command stands for print working directory. I have also heard it called present working directory. Both interpretations are correct. "print" means to show it to us on the default output device (the screen), while "present" (emphasis on the first syllable) reminds us that the system's attention may be changed from one directory to another, and it may not be thinking about the directory that we intend it to be thinking about.

The text seems to be making a habit of telling us about something, then deciding that there is something else we should have been told first to understand its explanation. That being the case, it proceeds to tell us about a command to move the system's attention from the current working directory to some other directory. The command is cd, which stands for change directory. The cd command is normally given one argument, the path to a specific directory, and that path can be given in one of two forms. (If you give it no argument at all, cd takes you to your home directory.)

absolute path - This is the full pathname to a directory, starting at the root, and working all the way down to the directory in question.
relative path - This notation makes some assumptions, making it shorter in some cases, but more complicated in others.
- If you want to change to a directory that is a child of the current directory, you could just issue the command cd foldername, because the logic of the command assumes that the folder is a child of the current working directory.
- Assume you are in a directory that is a child of folder1. You want to change to a directory that is also a child of folder1 (your sibling directory?). Assume it is called child2.
  Use the fact that .. stands for your current parent directory. Issue the command cd ../child2. Note that the use of the .. shorthand avoids the need to state the name of the current parent directory.
- A single . stands for the current working directory, and .. stands for its parent. The single . is not very useful in navigating from one folder to another.
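
The following sketch walks through both kinds of path in a scratch area created with mktemp -d, so it does not disturb your own files. The folder names folder1, child1, and child2 are assumptions made up for the demonstration.

```shell
# Build a small practice tree in a scratch directory.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/folder1/child1" "$sandbox/folder1/child2"

# Absolute path: starts at the root and names every directory on the way.
cd "$sandbox/folder1/child1"
start_dir=$(basename "$(pwd)")

# Relative path: go up to the parent (..), then down to the sibling.
cd ../child2
sibling_dir=$(basename "$(pwd)")

# Relative path: .. alone moves to the parent directory.
cd ..
parent_dir=$(basename "$(pwd)")

# Relative path: a bare folder name is taken as a child of where we are now.
cd child1
child_dir=$(basename "$(pwd)")

echo "$start_dir -> $sibling_dir -> $parent_dir -> $child_dir"

# Leave the scratch area before deleting it.
cd /
rm -rf "$sandbox"
```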

The contents of a folder can be shown with the ls command, which stands for list. This short command has several options shown on page 77. One of the more useful ones is ls -l, which means list in long form, showing details about files and folders.

Note the example on page 78, ls -l /, which means to list the content of the root directory in long form. This listing includes more information that the chapter has not explained yet, so hold on a bit for more details. For the moment, note that the listing shows a d in the first column for items that are directories, and a dash for items that are files.
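
You can check the first column yourself with a small experiment. This sketch makes one directory and one ordinary file in a scratch area (the names projects and notes.txt are made up) and then picks out the first character of each long listing.

```shell
# Make one directory and one ordinary file in a scratch area.
sandbox=$(mktemp -d)
mkdir "$sandbox/projects"
touch "$sandbox/notes.txt"

# Show the long listing of the scratch area.
ls -l "$sandbox"

# The -d option lists the directory entry itself rather than its contents.
# cut -c1 keeps only the first character of the line.
dir_flag=$(ls -ld "$sandbox/projects" | cut -c1)
file_flag=$(ls -ld "$sandbox/notes.txt" | cut -c1)
echo "projects starts with: $dir_flag"
echo "notes.txt starts with: $file_flag"

rm -rf "$sandbox"
```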

The text continues with some variations on using the ls command with wildcard characters. The first example, on page 79, is the command ls *.txt.
The asterisk is a wildcard character that matches any sequence of characters, including no characters at all, in the position you place it in your search string. Search string? That's a fancy way of saying "the thing you asked Linux to look for". In this example, the command would list files that begin with anything (since the asterisk is the first character in the search string) but end with .txt. Consider some other examples:

ls me*
This would list files whose names begin with me, and end with anything. Note that "anything" includes the concept of "nothing". If there was a file whose name was just me, this command would list it as well as any other file whose name had me as the first two characters.
ls me*.txt
This one would list files whose names begin with me, and end with .txt, whether they have anything in between those required items or not.
ls me?.txt
The question mark is a wildcard that stands for any single character. The asterisk stands for any number of characters (even zero), but the question mark stands for one and only one character. So, this command would list files whose names begin with me, have one more character after that, then end with .txt. You might wonder about the usefulness of the question mark, but note the examples in the text that show that you can use more than one of them. They are useful when you are looking for a filename whose length you know, but whose complete spelling you do not know.
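
To try these patterns safely, the sketch below creates a handful of empty files in a scratch directory and counts what each pattern matches. The file names are invented for the experiment.

```shell
# Create some empty practice files in a scratch directory.
sandbox=$(mktemp -d)
cd "$sandbox"
touch me me.txt me1.txt me12.txt menu you.txt

# me* matches me, me.txt, me1.txt, me12.txt, and menu.
starts_with_me=$(ls me* | wc -l | tr -d ' ')

# me*.txt matches me.txt, me1.txt, and me12.txt.
me_txt=$(ls me*.txt | wc -l | tr -d ' ')

# me?.txt needs exactly one character between me and .txt.
one_char=$(ls me?.txt)

# *.txt matches every name that ends with .txt.
all_txt=$(ls *.txt | wc -l | tr -d ' ')

echo "me*: $starts_with_me files, me*.txt: $me_txt files"
echo "me?.txt: $one_char, *.txt: $all_txt files"

# Leave the scratch directory before deleting it.
cd /
rm -rf "$sandbox"
```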

Directories can be created and deleted with the commands mkdir (make directory) and rmdir (remove directory). Those of you who know the DOS commands for these actions will be tempted to type md and rd, but those abbreviations do not always exist in UNIX or Linux, so be content to type the entire five characters, followed by the name of the directory, or the path to it.
mkdir directoryname
rmdir directoryname

Note the warning about rmdir on page 81: When removing a directory, you should first make sure that it contains no files. Most versions of UNIX/Linux require that a directory be empty before removing it. Oh wait, the text has not told us how to delete a file yet. The command to remove a file takes one argument:

rm filename (I guess someone didn't like the word "delete".)

The command to copy a file takes two arguments:

cp sourcefile destinationfile
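
Here is a short practice run that uses all four commands together in a scratch area. The names reports, draft.txt, and backup.txt are assumptions for the example. Note what happens when rmdir is tried on a directory that still holds files.

```shell
# Work in a scratch directory so nothing important is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

mkdir reports                              # make a directory
echo "first draft" > reports/draft.txt     # put a file in it
cp reports/draft.txt reports/backup.txt    # copy: source, then destination
copy_text=$(cat reports/backup.txt)

# rmdir refuses to remove a directory that is not empty.
if rmdir reports 2>/dev/null; then
    first_try=removed
else
    first_try=refused
fi

# Remove the files first, then the directory.
rm reports/draft.txt
rm reports/backup.txt
rmdir reports
if [ -d reports ]; then
    second_try=still_there
else
    second_try=removed
fi

echo "copy holds: $copy_text"
echo "rmdir with files inside: $first_try, after rm: $second_try"

cd /
rm -rf "$sandbox"
```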

The text continues with a section about file system security. Linux/UNIX files (and directories) have permissions assigned to them. There are three basic permissions:

read - you can see what is in a file
write - you can change what is in a file
execute - you can run a file, if it contains commands

Linux/UNIX also divides the world into three categories, with regard to files. First, you should know that users on a Linux/UNIX system are classified as belonging to groups. These groups are artificial, and are set up by the system administrator. A user on the system must fall into one of three categories with respect to any particular file:

user - person who owns the file, and probably wrote it
group - a person who belongs to the group assigned to the file (usually the owner's group)
other - everybody else in the universe

Think of permissions as being in three groups of three when seen on a list of files. Use the ls command with the options -al. (In DOS, command options are introduced with a forward slash. In Linux/UNIX, they are usually introduced with a hyphen.) The command might look like:

ls -al

It means you want the long form listing of all files in the current directory, including hidden files whose names begin with a dot.

On the left side of the listing are the permissions. Directories have a d first; permission lists for ordinary files start with a hyphen. (They love that hyphen in Linux/UNIX. Just wait...)
Let's go through an example. A file's permissions might look like this:

-rwxr-xr--

If we ignore the leading hyphen, this is three sets of three letters or hyphens.

The first set is for the User, and rwx means he/she can read, write and execute that file.
The second set of three is for the Group assigned to the file, which is usually the User's group. The combination r-x means its members can read it and execute it, but not write to it (the w is missing).
The third set is for anybody else wandering across this file in the system. They have r-- in this example. That means they can read the file but not write to it or execute it.
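
The same breakdown can be done mechanically. This sketch treats the example permission string as plain text and cuts it into its four pieces by character position.

```shell
# The example permission string from the listing, treated as plain text.
listing_mode='-rwxr-xr--'

file_type=$(printf '%s\n' "$listing_mode" | cut -c1)       # - for a file, d for a directory
user_perms=$(printf '%s\n' "$listing_mode" | cut -c2-4)    # the owner's set
group_perms=$(printf '%s\n' "$listing_mode" | cut -c5-7)   # the group's set
other_perms=$(printf '%s\n' "$listing_mode" | cut -c8-10)  # everybody else's set

echo "type: $file_type  user: $user_perms  group: $group_perms  other: $other_perms"
```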

There are several ways to set or change the permissions assigned to a file. Only the owner, a system administrator, a superuser, or a semi-talented hacker can do so. I usually use the chmod command with the octal number system described on page 85 of your text. You summarize the permissions as three octal digits. Each digit represents the rights you grant to one of the categories above. Use this chart to decide what number to give each kind of person:

0 - no rights
1 - execute only
2 - write only
3 - write and execute
4 - read only
5 - read and execute
6 - read and write
7 - all three: read, write and execute
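
The chart is easier to remember once you see that each digit is just a sum: read counts 4, write counts 2, and execute counts 1. The sketch below adds them up for one set of three characters. The function name triad_to_digit is made up for this example.

```shell
# Convert one set of three permission characters (such as r-x) to its digit.
# triad_to_digit is a hypothetical helper, not a standard command.
triad_to_digit() {
    triad=$1
    digit=0
    case $triad in r??) digit=$((digit + 4)) ;; esac   # read counts 4
    case $triad in ?w?) digit=$((digit + 2)) ;; esac   # write counts 2
    case $triad in ??x) digit=$((digit + 1)) ;; esac   # execute counts 1
    echo "$digit"
}

triad_to_digit rwx   # 4 + 2 + 1
triad_to_digit r-x   # 4 + 1
triad_to_digit r--   # 4
```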

Issue the command like this:

chmod 751 filename

This sets the owner's permissions to full (7), the group's permissions to read and execute (5), and common people's rights to execute only (1). You might want to do this to protect shell scripts you write.
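
You can confirm the result by listing the file afterward. This sketch applies chmod 751 to a scratch file made with mktemp and then reads back the first ten characters of the long listing.

```shell
# Make a scratch file to practice on.
script=$(mktemp)

# 7 for the owner, 5 for the group, 1 for everybody else.
chmod 751 "$script"

# The first ten characters of the long listing show the type and permissions.
mode=$(ls -l "$script" | cut -c1-10)
echo "$mode"

rm -f "$script"
```

The listing should show rwx for the owner, r-x for the group, and --x for everyone else.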

That is sufficient for this chapter. Let's think about doing some exercises/projects.

Week 2 Assignments: Files and Directories

Individual assignment 1: Once you have a Linux machine running, carry out the following projects from Chapter 2. Turn in notes about what happens in each step.

Project 2-3.

Project 2-7.

Project 2-10.

Individual assignment 2: Once you have a Linux machine running, carry out the following exercises from Chapter 2 that begin on page 109. Turn in notes about what happens in each step.

Exercise 6.

Exercise 9.

Exercise 12.