LUX 211 - Shell Programming

Lesson 2: Chapter 4, The Filesystem; Chapter 5, The Shell

Objectives:

This lesson introduces the concepts from chapters 4 and 5 of the text. Objectives important to this lesson:

  1. File system characteristics
  2. File system management
  3. Permissions
  4. Using ACLs
  5. Hard links and symbolic links
  6. Command-line syntax
  7. How the shell interprets a command
  8. Redirection and pipes
  9. Background commands
  10. Stand-alone and built-in utilities

Concepts:
Chapter 4

Despite the author's desire to type "filesystem" in each of his books, I believe we can turn that into two words without losing any meaning. Doing so will also make our spelling checkers happier. The author spends two pages reminding us that a file system is typically explained and shown as a hierarchy that starts at a root and grows branches (containers, folders, directories), which can contain other branches (more containers, folders, directories), any of which can also contain leaf objects (files).

The author also reminds us that the list of all containers that we pass through from the root to any given container or leaf object is the path or pathname to that object. Remember that a pathname that starts at the root is an absolute pathname, and that a relative pathname starts at some point other than the root of the file system.

A pathname, for example, may start with a tilde (~) and a slash, which means that it starts at the current user's home directory, not the current directory and not the root. Page 89 shows us this usage and a more cryptic one. If the tilde is immediately followed by a user name, that means the path starts at the home directory of that user. A pathname may also use a single dot, which stands for the path from the root to the current directory. A double dot stands for the path from the root to the parent of the current directory. (Unlike children, directories never have more than one parent, so the double dot is unambiguous.)
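A quick terminal session makes the dot and double-dot shorthands concrete. This sketch builds a throwaway tree with mktemp, so the directory names here are illustrative, not from the book:

```shell
#!/bin/bash
# Build a small tree in a throwaway directory so we can walk it safely.
base=$(mktemp -d)
mkdir -p "$base/projects/notes"

cd "$base/projects/notes"
here=$(pwd)          # absolute pathname of the current directory

cd .                 # "." is the current directory; this changes nothing
still_here=$(pwd)

cd ..                # ".." is the parent directory
parent=$(pwd)        # now ends in /projects

echo "$here"
echo "$parent"
echo ~               # the tilde expands to the current user's home directory

cd /                 # leave the tree before deleting it
rm -rf "$base"
```

Note that `cd .` left us exactly where we were, while `cd ..` climbed one level toward the root.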

(Image: ADA compliance checklist.) On pages 84 and 85, the text gives us a short lesson on filenames. It makes some recommendations we should follow:

  • directories need names and they should follow the same rules as filenames
  • use characters that can be entered from the keyboard
  • make the filename relevant to the file's content
  • respect the length restrictions of legacy file systems when you must interface with them
    • DOS systems used 8.3: up to 8 characters, followed by a dot, followed by up to three characters as an extension
    • some UNIX systems were limited to 14 characters
    • some Macintosh systems were limited to 31 characters
  • pick a case or a rule about case and stick with it, because some file systems ignore case, some see upper and lower case as different letters, and some allow upper and lower case but they don't care about it
  • don't use spaces in filenames, both because of the extra quoting or escaping they require in Linux commands (see page 85) and because they violate the rules about complying with the Americans with Disabilities Act, as in item 1.1 in the checklist shown on the right (https://www.hhs.gov/web/section-508/making-files-accessible/checklist/multimeda/index.html)
  • use hidden filenames with care; in Linux a filename that begins with a dot is hidden from the ls command unless the -a switch is used with it, which means "show all"
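The hidden-filename rule in the last bullet is easy to see for yourself. This sketch uses made-up names in a scratch directory:

```shell
#!/bin/bash
# A leading dot hides a file from plain ls; -a ("show all") reveals it.
dir=$(mktemp -d)
touch "$dir/visible.txt" "$dir/.hidden.txt"

plain=$(ls "$dir")        # does not list .hidden.txt
all=$(ls -a "$dir")       # lists it, along with the . and .. entries

echo "plain: $plain"
echo "all:   $all"

rm -rf "$dir"
```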

Starting on page 90, the text reviews commands to create, remove, and navigate directories, as well as the more flexible mv and rm commands. Note the discussion of the mv command that should make you eager to use a mouse to move files and folders.

Pages 96 and 97 present a long list of directories that comply with the Linux Filesystem Hierarchy Standard (FHS), a standard whose pedigree the book traces through three other standards committees with long, impressive names. (Truly, it must be worthy.) In short, these are directories whose locations we can hope to be where the text says they will be in current Linux releases. (Your mileage may vary. Check before trusting.)

Page 98 reviews using the ls -l command to view the permissions that are assigned to a file (or directory). Page 99 begins a discussion about using the chmod command to change permissions. This should be familiar to you. If it is not, review it and learn it, because it is about to get harder.
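As a refresher, here are both chmod notations applied to a scratch file, with ls -l used to confirm the result each time:

```shell
#!/bin/bash
# chmod review: octal form, then symbolic form, on a temporary file.
f=$(mktemp)

chmod 750 "$f"               # rwx for owner, r-x for group, nothing for others
mode_750=$(ls -l "$f" | cut -c1-10)

chmod o+r "$f"               # symbolic form: add read permission for others
mode_754=$(ls -l "$f" | cut -c1-10)

echo "$mode_750"   # -rwxr-x---
echo "$mode_754"   # -rwxr-xr--

rm -f "$f"
```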

Page 101 introduces a new kind of permission, but it does not introduce the topic in a sensible way. I showed you how to set permissions on a file with a three-digit number like 750 or 751. You can also set a more important permission with a four-digit number, or with a +s if you are using that notation for chmod. The text presents two versions of the technique: setuid and setgid. It may be easier to understand with an example of why you would do so.

When you use the passwd command, you change your password on the system. That doesn't sound like much, but the truth is that in order to change a password, the command has to access a file on the system (/etc/shadow) that only root has the right to touch. Well, you aren't root, so how can that work? It works because the passwd program runs with the rights of the user who owns the program file, and that owner happens to be root. At some point, the passwd command was altered with a command like one of these two variations:

chmod 4755 passwd
chmod u+s passwd

Either of these commands would set the permissions on the passwd command to be -rwsr-xr-x. Note the s instead of an x in the set of permissions for the user who owns the file. In this case, the owner still gets execute rights to the file, and so does the group, and so do others. What the 4 in the first example, or the s in the second example, does is to change the file itself in a special way. Normally, when you run a file that has the x permission for you, the process you start affects other files and objects on the system. It can only affect those other objects if you have the right to do so. When you run a process that has the s permission set for its owner, that process runs with the permissions of the owner instead of your permissions. That means that anything the command you started needs to do, it does with all the rights associated with the user who owns that command. And when the owner of the process is root, that means the process can do anything at all that the programmer wanted done.

That's how we turn on the setuid permission, and why we need it. If we wanted to enable a process to use the rights of the group associated with it instead of the owner, we would turn on the setgid permission:

chmod 2755 passwd
chmod g+s passwd
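You can watch the s appear in the permission string yourself. This sketch sets the setuid bit on a harmless scratch file (not a real passwd binary), which any user may do on a file they own:

```shell
#!/bin/bash
# Watch the owner's x turn into an s when the setuid bit is set.
f=$(mktemp)

chmod 755 "$f"
before=$(ls -l "$f" | cut -c1-10)    # -rwxr-xr-x

chmod u+s "$f"                       # same effect as chmod 4755
after=$(ls -l "$f" | cut -c1-10)     # -rwsr-xr-x

echo "$before"
echo "$after"
rm -f "$f"
```

The lowercase s tells you the execute bit is still set underneath; an uppercase S would mean setuid is on but execute is not.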

This discussion could go on, but your text does not discuss the other related points. Let there be some mysteries we will not explore yet.

On page 103, the text reminds us that the standard r, w, and x permissions mean something different when applied to a directory.

  • The read permission means you are allowed to see what is in the directory. You can use the ls command to do so.
  • The write permission for a directory means you are allowed to create objects in it.
  • The execute permission means that you are allowed to use the change directory (cd) command to make it your current directory. If you only have the execute permission, you can see the permissions set for the directory with ls -l, but you must also use the -d switch (ls -dl). Even then, you will only see data about the directory, not about its contents. If you use an ls command to look for a specific file in a directory, you will only see it if you have rights to the file.
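The three meanings above can be exercised on a scratch directory. Run this as a normal user (root bypasses permission checks, so denials are hard to demonstrate in a sketch; this one shows the permissions working, not failing):

```shell
#!/bin/bash
# Directory permissions in action on a throwaway directory.
d=$(mktemp -d)
chmod 700 "$d"             # owner: r (list), w (create), x (enter)

touch "$d/file1"           # creating a file needs w on the directory
cd "$d"                    # entering it needs x
listing=$(ls)              # listing it needs r
echo "$listing"            # file1

ls -dl "$d" | cut -c1-10   # -d shows the directory's own entry: drwx------

cd /
rm -rf "$d"
```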

Page 104 begins a discussion about ACLs, Access Control Lists. The text mentions that ACLs provide finer control over permissions, but they present a larger burden on the system since there is more to process. ACL support can be added to a file system when it is mounted by using the acl option, and turned off with noacl. The text recommends not enabling ACLs on file systems that contain system files. In Linux, you can set (modify) the rules of a file's ACL with the setfacl command, and you can view the ACL of a file with the getfacl command. If the text seems a little unapproachable to you, try reading an article on ACLs written by a working system administrator.

Each ACL (every file can have one) should contain a list of rules that refer to the usual user, group, and other, and may contain rules for specific users and/or specific groups. (There are no specific others, so you won't see an ACL rule for a named other.) You can also set an effective rights mask that will override the settings you make for individual users and groups. (Yes, that seems illogical to me as well.)

Assume we have a file named ACLexample. We can set basic permissions with the chmod command:

chmod 644 ACLexample

If we ask to see the ACL of this file (without a header), we should do it like this, and expect this kind of output:

getfacl --omit-header ACLexample

user::rw-
group::r--
other::r--

The user line above has no specific user name between its two colons, which means that line is for the user who owns this file. The same thing is true for the group line: it applies to the owner's group. If we run the command "ls -l ACLexample", we should see a plus sign (+) at the end of the permissions list, which means that the file has an ACL. If we do not see a plus sign at the end of the permissions, there is no ACL for that file.
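Here is that sequence as a runnable sketch. It needs the acl utilities and an ACL-capable file system, so it bails out quietly when they are missing, and it uses your own user name as the named-user entry (standing in for a specific user like the book's bob, who may not exist on your machine):

```shell
#!/bin/bash
# ACL sketch: view a file's ACL, add a named-user entry, and spot the +.
command -v setfacl >/dev/null && command -v getfacl >/dev/null || exit 0

f=$(mktemp)
chmod 644 "$f"
base_acl=$(getfacl --omit-header "$f")
echo "$base_acl"                    # user::rw- / group::r-- / other::r--

me=$(id -un)                        # stand-in for a specific user name
setfacl -m "u:$me:rw-" "$f" 2>/dev/null || { rm -f "$f"; exit 0; }
ls -l "$f" | cut -c1-11             # ends with + now that the file has an ACL

rm -f "$f"
```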

To add a line to the ACL for a user named bob with full permissions, we can do it like this:

setfacl -m u:bob:rwx ACLexample

That command would modify the ACL to look like this:

user::rw-
user:bob:rwx
group::r--
other::r--

The four lines above refer to the rights of the owner of the file, the user named bob, the owner's group, and others. The text shows us on page 107 that we can add lines for multiple users at once. We just need to separate the user permission phrases with commas.

setfacl -m u:bob:rwx,u:tom:rw- ACLexample

The text also mentions that you can apply a setfacl command to multiple files at once by listing more than one file as the target of the command.

Turn to page 109 to begin the section on links, which the text tells us are pointers. Pointers typically hold a memory location, but in this case a link holds the address of a location on a hard drive. Filenames in directories are pointers in this respect, although they technically point to the inode of that file. In order to share a link to a file with someone else, the text gives us a procedure to follow:

  1. Grant the read and write permissions for the file to the new user. (Use a setfacl command to do so.)
  2. Grant the new user read, write, and execute permissions for the file's folder. (Use another setfacl.)
  3. Use the ln utility as described in the text to make a hard link or a soft (symbolic) link. A hard link points to the same inode that the directory entry for the file points to. They both point to a place on a hard drive.
    A soft/symbolic link holds a path to a file, so it is resolved each time it is used. A soft link can point to a file in another file system, but a hard link cannot.
  4. If you are making a soft link (as the book suggests you always should), use an absolute pathname to your target file. Soft links that are made with relative pathnames will break if they are moved from a folder at one level to a folder at another level in the tree.

If you remove a file, you should probably remove all links to it as well. If you are only removing a file temporarily, planning to replace it later, soft links will still work after the file is replaced, but hard links will not, since the file will probably have the same pathname, but it will be located in a different block on the hard drive.
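The remove-and-replace scenario in the last paragraph is worth running once to believe. This sketch (file names are made up) makes both kinds of link, then replaces the original file:

```shell
#!/bin/bash
# Hard vs. soft links when the original file is removed and replaced.
d=$(mktemp -d); cd "$d"

echo "version 1" > original.txt
ln original.txt hard.txt            # hard link: same inode, same data blocks
ln -s "$d/original.txt" soft.txt    # soft link: stores an absolute pathname

rm original.txt                     # the soft link now dangles...
echo "version 2" > original.txt     # ...until a new file takes the old name

soft=$(cat soft.txt)                # resolved by path: sees the new file
hard=$(cat hard.txt)                # still the old inode: sees the old data

echo "soft link reads: $soft"
echo "hard link reads: $hard"

cd /; rm -rf "$d"
```

The soft link follows the pathname and finds the replacement; the hard link still points at the original inode, exactly as the paragraph above predicts.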

The chapter ends with several pages on esoteric information about links that will only be confusing at this point. Let's move on to the next chapter.

Chapter 5

The chapter begins with a discussion of command syntax. Confusingly, the author points out that when he uses the word command, he may mean what you type on the command line, or he may mean the actual program that runs when your typed command is processed. Both definitions are correct. Isn't English fun? On page 126, he starts with an example of a simple command.

Syntax means grammar, the rules of putting words together so they make sense. Command syntax means the specific grammar that is required by whichever command we are thinking about.

  • The syntax for a simple command begins with the name of the command,
  • followed by any options (switches) we want to use,
  • followed by any arguments that may be required,
  • with spaces between tokens.

The text tells us that a token is a sequence of non-blank characters, sometimes called a word. Calling it a word helps maintain the similarity to a discussion of the grammar of a regular human language. Taking that approach for a moment:

  • the command name is the verb of the sentence, telling the computer what to do
  • the options are the adverbs of the sentence, telling the command how to do it
  • the arguments may be the objects of the sentence, the things on which the verb performs its actions

Some options create dependencies: if you choose particular options, associated arguments may be required, which makes writing a script that behaves this way more challenging. The text points out that options often begin with a dash or a double dash, but filenames typically do not, which leads to the advice that you should never give files names that start with any number of dashes. You don't want the command to think that an argument is actually an option, or vice versa. When a script/command/program runs, it has to parse the command line. This means that it must receive parameters, correctly interpret what to do with them, then run, ask for input, present an error message, or just fail. The text remarks that the script/command/utility must do its own error trapping and interpreting; the shell has no way to do that for us. This should be obvious, if you think about it, so think about it.

When we want to execute a script, we must give ourselves (and others?) the execute permission for the file, but that is not the only thing that would stand in your way in your assignments. The text offers some tips on page 132 that apply to what we might think of as housekeeping:

  • Set appropriate execute permissions for the script.
  • Either place the script file in one of the folders that is listed in the PATH variable, or amend the PATH variable like this:
    PATH=$PATH:.
    This command takes the current list of paths in the PATH variable ($PATH) and appends a colon and a dot (:.). The dot stands for whatever your current directory happens to be, so this works when your current directory is the one that holds your script.
  • Consider starting the script with something like this:
    ./script_name
    The initial dot stands for the path to the current directory, and the slash serves as a separator between that path and the name of your script. This gives the shell all the information it will need to find the script, so the paths in PATH are not consulted.
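The housekeeping steps above can be sketched end to end. This builds a tiny script (the name and message are made up) in a scratch directory, marks it executable, and runs it with the ./ prefix:

```shell
#!/bin/bash
# Housekeeping for a new script: execute permission, then an explicit ./ path.
d=$(mktemp -d); cd "$d"

printf '%s\n' '#!/bin/bash' 'echo hello from my script' > myscript

chmod u+x myscript        # without this, the shell refuses to run it
out=$(./myscript)         # ./ tells the shell exactly where the script is,
echo "$out"               # so the directories in PATH are never consulted

cd /; rm -rf "$d"
```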

The text also points out that a command line might not begin with the command itself. That is true, but it is standard practice to do so. The example at the bottom of page 131 is less helpful than it is remarkable as a badly written command line that will still work. It does give us a good excuse to consider the standard input, standard output, and standard error streams.

For any hardware setup, there is an assumed standard input or output device for each of these streams of data. The actual hardware that is installed affects the assumption, as does the nature of the program being run. How a data stream is sent to a device may surprise you. Page 134 tells us that Linux treats most devices like files. We should know that the special files that represent devices typically live in the /dev directory, and that when we send data to the operating system, bound for a device, the OS passes the data through the device's driver, which acts like a filter that sends output to the intended device. As far as the operating system is concerned, it is just writing to a file.

The text reviews redirection in this part of the chapter. It reminds us that the output redirector (>) can cause a file to be overwritten, if the target file already exists. Redirection operators typically pass input from a file to a process, or from a process to a file.

Operator    Result
>           sends output to the specified target, creating or overwriting it
<           takes input from the specified source
>>          sends output to the specified target, creating or appending to it
noclobber   a shell option (set -o noclobber) rather than an operator; when it is on, attempting to overwrite an existing file with > causes an error
|           the pipe character passes output from one process directly to another process as its input


In the rare case in which a process creates output that we want to ignore (or delete), we can redirect that output to /dev/null, which the text refers to as a bit bucket or data sink. Data sent to this location is not saved.
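All of the rows in the table above, plus /dev/null, can be tried on scratch files:

```shell
#!/bin/bash
# Redirection operators and noclobber, exercised on a scratch file.
d=$(mktemp -d)

echo "first"  >  "$d/log"      # > creates (or overwrites) the target
echo "second" >> "$d/log"      # >> appends instead
wc -l < "$d/log"               # < feeds the file to wc as standard input

# noclobber makes > refuse to overwrite; a subshell keeps the option
# from staying on afterwards
( set -o noclobber; echo "third" > "$d/log" ) 2>/dev/null \
    && result="overwrote" || result="blocked"
echo "$result"                 # blocked

echo "discard me" > /dev/null  # the bit bucket: output is simply dropped

contents=$(cat "$d/log")
rm -rf "$d"
```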

On page 141, the text reviews how a pipe character works on a command line. You should be aware that it takes output from one process and passes it as input to another process. Using this operator removes the need to write to a file, then read from a file, which you would have to do if you only used the "less than" and "greater than" redirection operators. The text refers to a command line that uses a pipe operator as a pipeline.

Page 145 shows an example of using a pipeline with three processes. First the user runs the who command to find out which users are logged in. The output of who is passed with a pipe to a new process, tee, which passes output to two locations: to a location you specify (often a temporary file), and to standard output. The pipeline being demonstrated then passes the standard output of tee through another pipe, which flows to a grep process that looks for a specific user name. If the user name is found in the output from who, that line appears on the screen. The advantage of using tee in this example is that it can save a copy of its input in case you need to examine that body of data again.
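Here is the shape of that page-145 pipeline, with printf standing in for who, since a lab machine may have no logged-in users (alice, bob, and carol are made-up names):

```shell
#!/bin/bash
# who | tee | grep, with printf supplying fake who-style output.
d=$(mktemp -d)

found=$(printf '%s\n' "alice pts/0" "bob pts/1" "carol pts/2" \
    | tee "$d/who.out" \
    | grep bob)
echo "$found"                    # the one line grep matched: bob pts/1

saved=$(wc -l < "$d/who.out" | tr -d ' ')
echo "tee saved $saved lines"    # the full copy tee kept: all 3 lines
rm -rf "$d"
```

grep passes along only the line that matched, but the file written by tee still holds everything, which is the whole point of putting tee in the middle.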

On page 146, we review using an ampersand (&) to move a process into the background. The previous text did not mention benefits of doing this. Our current text says that we can take advantage of multitasking with background processes. The foreground can only run one process at a time, which is fine for processes that don't take very long to run. The background can hold several running processes, which may speed up your script a good bit.

The text illustrates calling a process and sending it to the background at the bottom of page 146. Note that the system generates two numbers that appear on the screen. The first is a job number, shown in square brackets; the shell assigns one to any job it places in the background, whether that job is a single command or a pipeline. The second number is a Process ID from that job (in bash, the PID of the last process in the pipeline, which the shell also stores in $!). Both are useful if you want to manipulate a process in the background.

  • The job number can be used to bring a process from the background to the foreground. The command to do so is simply fg, if there is only one job in the background, but it is fg %job_number (or just %job_number) if there are multiple jobs in the background. If you don't know the job number, enter the jobs command to see a list of running jobs, including their numbers and the command lines that started them. There is an example on page 148.
  • The process ID can be used with the kill command to stop a process, whether it is running in the foreground or the background. The text shows a sensible way to find the process number you need on page 147. If we know the name of the command that is running, we can enter ps | grep command_name, which should produce one line of output for that command, and that one line will start with the process ID. The syntax for the kill command is just kill process_ID.
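Both bullets can be tried in one short session. This sketch uses sleep as a stand-in for any long-running command, and grabs the PID from $! rather than from ps, so it works even on a busy machine:

```shell
#!/bin/bash
# Background jobs: start a process with &, then stop it with kill.
sleep 30 &              # & sends sleep to the background immediately
pid=$!                  # $! holds the PID of the newest background job

jobs                    # shows the job with its [job_number]
kill "$pid"             # stop it by process ID
wait "$pid" 2>/dev/null # reap it so it does not linger

kill -0 "$pid" 2>/dev/null && alive=yes || alive=no
echo "process still running: $alive"
```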

The chapter makes another odd jump to a new topic on page 148 where it spends several pages discussing the use of wildcard characters and lists of characters enclosed in square brackets. If you need review on this material, read it over. This should not really be new to you in this class.