Teach Yourself UNIX Shell Programming in 14 Days

Chapter 7: File Input/Output

Objectives:

This chapter discusses input and output redirection, background processing, and shell archives. Objectives important to this chapter are:

  • use of file descriptors
  • redirection operators
  • sending processes to the background
  • using pipe and tee
  • creating and extracting shell archives

Concepts:

UNIX tends to treat everything that can be written to or read from as a file. This includes ordinary files, directories, and the standard data streams that we referred to in C as stdin, stdout, and stderr. Open files are referred to internally by number, using small integers called file descriptors. The numbers for the standard files are:

  • stdin - 0
  • stdout - 1
  • stderr - 2

Input and output can be redirected from their usual paths by using the redirection operators. When using the greater than and less than signs, think of data as flowing toward the point of the sign.

A single greater than sign sends output to the designated file destructively. That is, it creates the file if it does not exist, and it destroys the file's contents if any exist already. To preserve the present contents of the destination file, use a double greater than sign, which means to append.
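The difference between the two operators can be seen in a minimal sketch like the one below. The scratch directory made with mktemp -d and the file name notes.txt are assumptions for illustration only.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "first"  > notes.txt    # creates notes.txt
echo "second" > notes.txt    # overwrites it: only "second" remains
echo "third"  >> notes.txt   # appends: "second", then "third"

cat notes.txt
```

After these three commands the file holds two lines, because the second command destroyed what the first one wrote.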

In UNIX it is possible to concatenate several files into one destination file. Simply enter the command like this:

	cat file1 file2 > file3

This would concatenate the contents of file1 and file2 and place them into file3. To preserve an existing file3, you would use the double greater than sign instead, which adds the new material to the end of file3.
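As a small sketch of the appending form, assuming three hypothetical one-line files created in a scratch directory:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files.
printf 'alpha\n'    > file1
printf 'beta\n'     > file2
printf 'existing\n' > file3

# Append file1 and file2 to file3 without destroying what is already there.
cat file1 file2 >> file3

cat file3
```

Here file3 keeps its original line and gains the contents of file1 and file2 after it, in the order the files were named.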

To redirect input, we can use the less than sign. The book illustrates this with the sort command, which sorts its input line by line. To sort a file of data, we might use redirection to feed it a file like this:

	sort < filename

This would result in the sorted output being sent to the standard output device (the screen). To save it in a file, we might use both types of redirection:

	sort < file_to_sort > sorted_file
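A runnable sketch of the same command follows. The three sample lines are made up for illustration, and the scratch directory is an assumption so that nothing real is overwritten.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical unsorted data.
printf 'pear\napple\nmango\n' > file_to_sort

# Read from one file, write the sorted result to another.
sort < file_to_sort > sorted_file

cat sorted_file
```

The input file is left untouched. It is not safe to name the same file on both sides, since the greater than sign would empty the file before sort had a chance to read it.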

There is also a double less than sign, used to set a marker at which to stop reading input. If used with cat, like this:

	cat << stop

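it means to read the lines that follow, from the terminal or from the script itself, and feed them to cat as its standard input until encountering the word "stop" on a line by itself. If used like this:

	cat << "Magic Word"

it means to cat each following line onto the screen until encountering a line that contains only the text "Magic Word". The quotes mark the complete phrase as the stop value, not just the first word; they also tell the shell to pass the text through without expanding any variables in it. The line containing the stop value, and only it, is not put on the screen. Note that cat reads its standard input only when it is given no file names, so a file name should not be placed on this command line.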
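The two forms of the double less than sign can be compared in a short sketch. The variable name and the output file names here are assumptions for illustration; cat is given no file name, so it reads the text supplied after the command.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

name="world"

# Unquoted marker: the shell expands $name before cat sees the text.
cat << stop > expanded.txt
hello $name
stop

# Quoted marker: the text is passed through untouched, and the marker
# may contain a space because the quotes hold it together.
cat << "Magic Word" > literal.txt
hello $name
Magic Word

cat expanded.txt literal.txt
```

The first file ends up containing the value of the variable, while the second contains the characters $name exactly as typed. In neither case is the marker line itself copied.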

The pipe is used to link two commands, taking the output of the command on the left of the pipe and sending it as input to the command on the right of the pipe. Example:

	ls -s | sort -n

This command illustrates the -s switch for ls, which prints the size of each file, in blocks, in front of its name. The -n switch of the sort command means to sort numerically, as opposed to alphabetically, which is the default. The result is a directory listing ordered by file size. The pipe is recommended when you do not wish to create a file that holds the output of the first command, feed that file to the second command, and then have to delete it in order to clean up.
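To make the comparison concrete, here is a sketch that does the same job both ways. The directory name data, the two sample files, and the temporary file listing.tmp are all hypothetical; the sizes that ls -s reports (in blocks) depend on the system, so no particular output is claimed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical directory with one small and one larger file.
mkdir data
printf 'tiny\n' > data/small.txt
i=0
while [ "$i" -lt 2000 ]; do
    echo "padding line $i"
    i=$((i + 1))
done > data/large.txt

# The long way: an intermediate file that must be removed afterwards.
ls -s data > listing.tmp
via_file=$(sort -n < listing.tmp)
rm listing.tmp

# The pipe: the same result with no intermediate file.
via_pipe=$(ls -s data | sort -n)

echo "$via_pipe"
```

Both approaches produce identical text; the pipe simply skips the create, read, and delete steps.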

The tee command is like an additional option for the pipe. It allows you to take the output from a pipe and send it to a file as well as to the screen. In the example:

	ls -s | sort -n | tee filename

we are reading a directory list that includes file sizes, sorting it numerically, and sending the output of the sort both to the designated file and to the screen.
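A small sketch of tee at the end of a pipeline is shown below. Fixed numbers from printf stand in for the directory listing so that the result is predictable, and the file name sorted_copy is an assumption.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sorted numbers appear on the screen and are also saved in sorted_copy.
printf '30\n4\n100\n' | sort -n | tee sorted_copy

echo "--- saved copy ---"
cat sorted_copy
```

Because tee passes its input along to its standard output, it can also sit in the middle of a longer pipeline to capture an intermediate result.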

File descriptor numbers can be used with the redirection symbols to send a particular stream to a file instead of its usual device. For instance:

	cc newstuff.c 2>errors

would run the C compiler (the cc part) on the source file called newstuff.c. The 2 right next to the greater than symbol sends the stream of data destined for stderr to a file (called errors) instead. It is important not to put a space between the 2 and the greater than sign; otherwise the shell treats the 2 as an argument to the command. To send the output to another numbered file stream, follow the greater than sign with an ampersand and then the file descriptor number, as in 2>&1, which sends stderr to wherever stdout is currently going.
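The following sketch shows both uses without needing a compiler. The function report is hypothetical; it stands in for any command that writes to both stdout and stderr, and the output file names are likewise assumptions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical command that writes one line to each standard stream.
report() {
    echo "normal output"        # goes to stdout, descriptor 1
    echo "error output" >&2     # goes to stderr, descriptor 2
}

# Split the two streams into separate files.
report > out.txt 2> err.txt

# Send stdout to a file, then point stderr at the same place.
report > both.txt 2>&1

echo "stdout only: $(cat out.txt)"
echo "stderr only: $(cat err.txt)"
```

The order matters in the second form: stdout is redirected first, and 2>&1 then makes stderr follow it into the same file.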

Processes can be forced to run in the background. This is useful for processes that take a long time to run, if you want to "free up" the terminal. (You are actually just getting control of it again.) Simply follow the command with an ampersand and everything to the left of it goes to the background.
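As a sketch, the lines below start a slow job in the background and carry on immediately. The sleep command stands in for real work, and the log file name is an assumption; redirecting the job's output keeps it from appearing in the middle of whatever you type next.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical slow job: sleep stands in for a long-running command.
( sleep 1; echo "slow job finished" ) > job.log 2>&1 &
bg_pid=$!    # $! holds the process id of the most recent background command

# This line runs right away, without waiting for the job.
echo "terminal is free again; background job has PID $bg_pid"
```

The shell variable $! is a convenient way to remember the process id for later use.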

On the other hand, you may wish for a process to finish before proceeding with your next command. The wait command helps here. Since the syntax is

	wait process-id

it helps to find out the process id first. The ps -ax command will list all current processes, even those in the background. Note that wait can only wait for processes that were started from the current shell.
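A minimal sketch of wait follows, assuming the process id is captured from $! rather than looked up with ps. The background job is hypothetical; it sleeps briefly and then exits with status 3 so that the status passed back by wait is visible.

```shell
# Hypothetical background job that finishes with exit status 3.
( sleep 1; exit 3 ) &
job_pid=$!

# Block until that particular process has finished, and record its status.
status=0
wait "$job_pid" || status=$?

echo "process $job_pid finished with exit status $status"
```

Used with no process id at all, wait pauses until every background process started by the current shell has finished.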

Shell archives are simply script files that contain other files. They can be created by the method on page 186, inserting lines that will write the constituent files back to new versions of their original selves when the archive file is executed. You should be careful to put an end of file marker at the end of each file you concatenate into the archive, making sure each marker is unique and does not appear in the text of the file being archived. You should also make sure to precede the end of file marker with a backslash where it is named after the double less than sign, as at the bottom of page 184, since this tells the shell to copy all text of the source file into the target without modification.

In order to facilitate extraction of the archived files, you will wish to start the archive file with the text

	#!/bin/sh

which means that, when the file is executed, it is to be run by the Bourne shell. This must be the first line of the file. Since other shells, notably the C shell, do not follow Bourne syntax, this will ensure proper handling of your commands. Naturally, you could call another shell, if you are using another shell's syntax to write the script.
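The sketch below builds and then unpacks a tiny shell archive. It is one possible approach and not necessarily the book's method from page 186; the file names, the archive name bundle.shar, and the marker SHAR_EOF are all assumptions, and the marker is assumed not to occur in either file.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical files to archive; the first contains a dollar sign
# to show that the text survives unmodified.
printf 'price is $5\n' > a.txt
printf 'second file\n' > b.txt

# Build the archive: a Bourne shell script holding each file as the
# text of a "cat > name" command, ended by a backslash-protected marker.
{
    echo '#!/bin/sh'
    for f in a.txt b.txt; do
        printf 'cat > %s << \\SHAR_EOF\n' "$f"
        cat "$f"
        echo 'SHAR_EOF'
    done
} > bundle.shar

# Extract into a fresh directory by running the archive.
mkdir extracted
cd extracted || exit 1
sh ../bundle.shar

cat a.txt b.txt
```

Running the archive recreates both files. Without the backslash in front of the marker, the shell would try to expand $5 while extracting, and the recreated a.txt would no longer match the original.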