CIS 303a - Computer Architecture

Chapter 12, File and Secondary Storage Management, part 1

Objectives:

This lesson begins the discussion of file storage and secondary storage concerns that relate to system development. Objectives important to this lesson:

  1. File management
  2. Organization of files and directories
  3. Relating file storage and secondary storage
  4. File manipulation
  5. Access controls for files and directories
Concepts:
Chapter 12, part 1

Chapter 12 begins with a discussion about file and secondary storage, but does not mention the difference between them. The usual way to view the two related concepts is that file storage is immediately available to a user, while secondary storage may take more time or effort to access.

The text tells us that a file management system (FMS) is typically part of the OS of any system, as we learned in the chapter on operating systems. The author discusses four layers of an FMS and relates them to the four layers of an OS:

OS Layer                  | FMS Layer                 | Purpose of FMS Layer
Command/Application Layer | Command/Application Layer | interface for users and applications
Service Layer             | File Control              | services that manipulate files and directories
Kernel                    | Storage I/O Control       | moves data between storage devices and RAM
Hardware                  | Storage Device            | bits, bytes, and blocks are stored on a device and read from it when needed

In the table above, the user interface is at the top of each column, and the hardware interface is at the bottom. The author's discussion on page 447 begins with the bottom layer.

  • Storage Device Layer - data is stored on a device through its device controller, which may be implemented as hardware on the device itself or on a controller card inserted into a bus slot; the device controller is the interface between the system bus and the device, and it is operated by a device driver, which is a program that interfaces with the system OS
  • Storage I/O Control - part of the OS kernel that moves data between storage devices and RAM; includes device drivers and interrupt handlers
  • File Control Layer - provides the services used to manipulate files and directories
  • Command/Application Layer - provides the user interface and the application interface for file copying, moving, renaming; provides access to the utilities to format devices and create backup copies of them

The text uses vocabulary in this chapter that is specific to the topic. It has already used the word directory a few times. If this word bothers you, it means the same thing as the word folder when used with regard to file systems. A number of terms are used when we are talking about the logical structure of the FMS:

  • A file is a collection of data of some sort.
  • A folder is a container object for some number of files or sub-folders. Folder and directory mean the same thing in this context.
  • A volume can be an entire hard drive, a removable device, a partition of a hard drive, or a larger collection of storage space that can span multiple hard drives (or their equivalent).
  • Some data files, typically those for database programs, are divided into records and fields. A field is one kind of data we are saving about an entity, such as name, address, account number, or balance. A record is the collection of data from all fields that relates to one instance in the database, such as the information about a particular student in a database about students.
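To make the record and field vocabulary concrete, here is a minimal sketch. The class name StudentRecord, the sample students, and the helper field_names are made up for illustration; the four fields are the examples named in the bullet above.

```python
from dataclasses import dataclass, fields


@dataclass
class StudentRecord:
    """One record: the values of every field for a single student."""
    name: str
    address: str
    account_number: int
    balance: float


# Two records in a hypothetical student data file.
students = [
    StudentRecord("Ann Lee", "12 Oak St", 1001, 250.00),
    StudentRecord("Raj Patel", "9 Elm Ave", 1002, 0.00),
]


def field_names(record_type):
    """Return the field names that every record of this type carries."""
    return [f.name for f in fields(record_type)]


print(field_names(StudentRecord))
for student in students:
    print(student.name, student.balance)
```

Each object in the list is a record, and each attribute of the class is a field. A real database file would store the same structure on disk rather than in a Python list.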

The logical structure of the FMS gives the user a way to organize and manipulate data. The physical structure of the FMS has to do with the actual devices being used for storage, and is independent of the logical structure. Users are typically not exposed to the physical structure of the FMS.

The text reminds us that files come in many types, and the FMS must be able to store and use all of them. In general, files typically fall into one of several types that are recognized by an FMS, often recognizable by their file extensions:

  • executable files (e.g. exe and dll files)
  • OS command files (e.g. bat and cmd files)
  • text or unformatted data files (e.g. txt files)
  • files associated with an application (e.g. docx and xlsx files)
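As a rough illustration of recognizing a file's type by its extension, the sketch below maps the extensions listed above to their categories. The table CATEGORIES and the function classify are hypothetical names, and a real FMS may use other clues besides the extension.

```python
from pathlib import PurePath

# Extension -> category, using only the examples from the list above.
CATEGORIES = {
    "exe": "executable",
    "dll": "executable",
    "bat": "OS command",
    "cmd": "OS command",
    "txt": "text or unformatted data",
    "docx": "application data",
    "xlsx": "application data",
}


def classify(filename):
    """Look up a file's category from its extension, ignoring case."""
    extension = PurePath(filename).suffix.lstrip(".").lower()
    return CATEGORIES.get(extension, "unknown")


for name in ("setup.exe", "REPORT.DOCX", "backup.bat", "README"):
    print(name, "->", classify(name))
```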

The text is a little unclear on page 451. It should tell us that a directory can store files and other directories, and that it also stores information about those files and directories. This information is available to users in a number of formats, depending on the OS, and usually includes the following properties:

  • name
  • type
  • location
  • size
  • ownership
  • access controls
  • time stamps, such as creation date, last modified date, last read date, last backup date
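Most of these properties can be read from a running system. The sketch below uses Python's standard os.stat call on a throwaway file. The function name describe and the file name notes.txt are made up for the example. Creation and backup dates are left out because they are not reported the same way on every OS.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def describe(path):
    """Collect the directory-style properties the OS reports for one file."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "type": "directory" if stat.S_ISDIR(info.st_mode) else "file",
        "location": os.path.dirname(os.path.abspath(path)),
        "size": info.st_size,                    # in bytes
        "owner_id": info.st_uid,                 # numeric owner (0 on Windows)
        "access": stat.filemode(info.st_mode),   # for example '-rw-r--r--'
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }


# Demonstrate on a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "notes.txt")
    with open(sample, "w") as f:
        f.write("hello")
    properties = describe(sample)
    for key, value in properties.items():
        print(f"{key:9} {value}")
```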

In any directory, each filename must be unique, which may not be apparent in systems that only display a portion of each file's name. Filenames have restrictions in some file systems that do not exist in others. For instance, UNIX and Linux systems would see file1, File1, and FILE1 as three different filenames. Windows, by default, would see all three as the same name, because it ignores the case of the characters when it compares filenames. The text reminds us that older DOS (Disk Operating System) versions restricted filenames to no more than eight characters, followed by a dot, followed by no more than three more characters in the file extension. This was called the 8.3 format for filenames. Microsoft removed this restriction in Windows 95 and Windows NT 3.51, so names of that sort are now referred to as Short Filenames.
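Both naming rules can be sketched in a few lines. The names is_short_filename and distinct_names are hypothetical. The pattern is a simplification that accepts only letters, digits, underscores, and hyphens, although DOS allowed some other punctuation as well.

```python
import re

# Up to 8 characters, then optionally a dot and up to 3 more characters.
# Simplified character set: letters, digits, underscore, and hyphen only.
SHORT_NAME = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def is_short_filename(name):
    """True if the whole name fits the 8.3 pattern above."""
    return SHORT_NAME.fullmatch(name) is not None


def distinct_names(names, case_sensitive):
    """Count how many different filenames a directory would see."""
    if case_sensitive:
        return len(set(names))
    return len({name.casefold() for name in names})


names = ["file1", "File1", "FILE1"]
print(distinct_names(names, case_sensitive=True))    # UNIX/Linux style: 3
print(distinct_names(names, case_sensitive=False))   # Windows style: 1
print(is_short_filename("REPORT.TXT"))               # True
print(is_short_filename("quarterly_report.docx"))    # False
```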

On page 452, the text lists some of the access controls, rights to manipulate a file, that may be assigned to individuals or groups:

  • list - the right to see a file in a directory listing
  • read - the right to see the contents of a file
  • modify - the right to add to, remove from, or alter the contents of a file
  • change - the right to change who has what rights to a file

As we have noted above, a directory can contain another directory, creating a parent-child relationship between them. The text notes that a directory may have multiple children, but it may only have one immediate parent. This is the nature of a hierarchical relationship: a file or directory must have a single, specific path from its current location up to the root of the volume in which it presently exists.

On page 453, the text shows us a classic view of a file system using a hierarchical structure. In this case it starts at a computer, which is the parent to a local disk (C:) and a network volume (T:), which is the parent to many folders (directories), and the files inside one selected folder are displayed in a panel on the right. The currently selected directory is usually just called the current directory. The text also calls the current directory the working directory, but this phrase can also refer to whichever directory an application has been coded to use when creating the temporary files it needs while it runs.

The text calls a complete description of the path to a file its complete path, or fully qualified reference. In such a path, the symbol that separates one directory from the next is either a slash (/) or a backslash (\). Slashes are used in UNIX, Linux, and HTTP notation. Backslashes are used in Microsoft notation.
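Python's standard pathlib module can show both notations, and the single chain of parents that leads from a file up to the root of its volume. The two sample paths and the helper chain_to_root are invented for this sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical fully qualified references in each notation.
unix_path = PurePosixPath("/home/student/cis303/notes.txt")
windows_path = PureWindowsPath(r"C:\Users\student\cis303\notes.txt")


def chain_to_root(path):
    """List each ancestor directory from the file's parent up to the root."""
    return [str(parent) for parent in path.parents]


print(chain_to_root(unix_path))
print(chain_to_root(windows_path))
print(windows_path.drive)                          # the volume: C:
print(unix_path.is_absolute())                     # True: starts at the root
print(PurePosixPath("notes.txt").is_absolute())    # False: relative name only
```

Each path produces exactly one list of ancestors, which matches the rule that a file or directory has only one immediate parent.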

The text also discusses the idea of a reference that will take you from one place in a directory structure directly to another place. In UNIX-derived systems this is called a link (just like on the World Wide Web), and in Microsoft systems it is called a shortcut.

The text moves on to discuss storage allocation. Files take up a minimum amount of space, regardless of how small they are. This minimum space is referred to as a file allocation unit (FAU), which is also called a cluster. The size of a cluster or FAU varies with the OS, the choices the user makes, and the size of the volume involved. In general, larger volumes use larger FAUs, as the chart of default sizes below shows. Think of clusters as being logical aspects of a disk that are dealt with by the operating system. A cluster is defined as the smallest unit on a disk that the operating system can read from or write to at one time.
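The arithmetic behind this is a rounding up to whole clusters. In the sketch below the function names are made up, and an empty file is assumed to need no clusters at all, which may not match every file system.

```python
def clusters_needed(file_size, cluster_size):
    """Whole clusters required to hold file_size bytes (ceiling division)."""
    return -(-file_size // cluster_size)


def allocated_bytes(file_size, cluster_size):
    """Space the file actually occupies on the volume."""
    return clusters_needed(file_size, cluster_size) * cluster_size


def slack_bytes(file_size, cluster_size):
    """Allocated space that the file does not use."""
    return allocated_bytes(file_size, cluster_size) - file_size


KB = 1024
print(allocated_bytes(1, 4 * KB))        # 4096: a 1-byte file still takes a cluster
print(allocated_bytes(5000, 4 * KB))     # 8192: two 4 KB clusters
print(slack_bytes(5000, 4 * KB))         # 3192 bytes unused
print(slack_bytes(5000, 32 * KB))        # 27768 bytes unused with 32 KB clusters
```

The last two lines show why cluster size matters: the same 5000-byte file wastes far more space when the clusters are larger.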

This is a chart of cluster sizes for various logical drive sizes using FAT12, FAT16, FAT32, and NTFS. For another view of the same data, see this page at the Microsoft Help and Support site.

DEFAULT Cluster Sizes, from Microsoft Help and Support site

File System | Logical Drive Size                            | Cluster Size
FAT12       | 360 KB                                        | 2 sectors = 1 KB
FAT12       | 720 KB                                        | 2 sectors = 1 KB
FAT12       | 1.2 MB                                        | 1 sector = 512 bytes
FAT12       | 1.44 MB                                       | 1 sector = 512 bytes
FAT12       | < 1 MB to 15 MB                               | 8 sectors = 4 KB
FAT16       | 16 MB to 127 MB                               | 4 sectors = 2 KB
FAT16       | 128 MB to 255 MB                              | 8 sectors = 4 KB
FAT16       | 256 MB to 511 MB                              | 16 sectors = 8 KB
FAT16       | 512 MB to 1023 MB                             | 32 sectors = 16 KB
FAT16       | 1 GB to 2 GB (limit for DOS and Windows 9x)   | 64 sectors = 32 KB
FAT16       | 2 GB to 4 GB (must be using NT, 2000, or XP)  | 128 sectors = 64 KB
FAT32       | 257 MB to 8 GB                                | 8 sectors = 4 KB
FAT32       | 8 GB to 16 GB                                 | 16 sectors = 8 KB
FAT32       | 16 GB to 32 GB (limit for Windows 2000 and XP) | 32 sectors = 16 KB
NTFS        | Up to 512 MB                                  | 8 sectors = 4 KB
NTFS        | 512 MB to 1 GB                                | 8 sectors = 4 KB
NTFS        | 2 GB to 2 TB                                  | 8 sectors = 4 KB
NTFS        | 2 TB to 16 TB                                 | 8 sectors = 4 KB
NTFS        | 16 TB to 32 TB                                | 16 sectors = 8 KB
NTFS        | 32 TB to 64 TB                                | 32 sectors = 16 KB
NTFS        | 64 TB to 128 TB                               | 64 sectors = 32 KB
NTFS        | 128 TB to 256 TB                              | 128 sectors = 64 KB
NTFS        | More than 256 TB                              | not supported

The OS keeps track of the allocation of space with a Storage Allocation Table, in which all the possible space is represented, and the currently allocated space is marked as such. An example of such a table appears on page 457. Note that it shows blocks allocated to the same file in the same color. (The white ones are not in use. They are allocated to a file called SysFree.) Note also that the table lists a pointer for each block. A pointer is usually a variable that remembers a memory address. In this case, it remembers the location of the next block that holds a part of the allocated file.
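A table like this can be modeled as a set of blocks that each record an owner and a pointer to the next block. The eight-block layout, the file names other than SysFree, and the helper blocks_of are all invented for this sketch. It is not a copy of the table on page 457.

```python
END = None  # pointer value that marks the last block of a file

# Hypothetical table: block number -> (file that owns it, next block).
table = {
    0: ("letter.txt", 3),
    1: ("SysFree", 4),
    2: ("photo.jpg", 5),
    3: ("letter.txt", 6),
    4: ("SysFree", 7),
    5: ("photo.jpg", END),
    6: ("letter.txt", END),
    7: ("SysFree", END),
}

# The directory remembers only where each file starts.
first_block = {"letter.txt": 0, "photo.jpg": 2, "SysFree": 1}


def blocks_of(filename):
    """Follow the pointers from a file's first block to its last."""
    chain = []
    block = first_block[filename]
    while block is not END:
        owner, next_block = table[block]
        if owner != filename:
            raise ValueError(f"block {block} belongs to {owner}")
        chain.append(block)
        block = next_block
    return chain


print(blocks_of("letter.txt"))   # [0, 3, 6]
print(blocks_of("SysFree"))      # [1, 4, 7], the unallocated blocks
```

Notice that a file's blocks do not have to sit next to each other. The pointers are what hold the file together.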

Let's move on to File Manipulation on page 461. As most programmers will tell you, the basic actions you code in a program typically relate to opening a file, reading it, writing to it, and closing it. The text inserts a series of five steps that the FMS must take before opening the file:

  1. Find the file in the FMS.
  2. Check to see if the file is already open. If so, stop the procedure, unless we can share the file. If not, continue.
  3. Check the privileges our process has with regard to the file. If they are insufficient, stop. If sufficient, continue.
  4. Allocate buffers as needed (RAM space).
  5. Update the table of open files, and open the file.

Open files must be closed when a process is done with them. The text provides four steps the FMS will follow when you are done with a file:

  1. Flush the buffers; this means to copy the buffers' contents to file storage.
  2. Deallocate the memory the buffers were using.
  3. Update the relevant time stamps for the file.
  4. Update the table of open files, and close the file.
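The open and close procedures can be sketched as a toy, in-memory model. Everything here is hypothetical: the class ToyFMS, its dictionary layout, and the counter that stands in for a clock. File sharing is not modeled, so a second open of the same file is simply refused.

```python
class ToyFMS:
    """Hypothetical in-memory model of the open and close steps."""

    def __init__(self, directory):
        # directory: name -> {"data": bytes, "allowed": set, "modified": int}
        self.directory = directory
        self.open_files = {}   # table of open files: name -> buffer
        self.clock = 0         # stand-in for a real time stamp

    def open(self, name, user):
        entry = self.directory.get(name)        # 1. find the file
        if entry is None:
            raise FileNotFoundError(name)
        if name in self.open_files:             # 2. already open? then stop
            raise RuntimeError(f"{name} is already open")
        if user not in entry["allowed"]:        # 3. check privileges
            raise PermissionError(name)
        buffer = bytearray(entry["data"])       # 4. allocate a buffer in RAM
        self.open_files[name] = buffer          # 5. update the open-file table
        return buffer

    def close(self, name):
        buffer = self.open_files[name]
        entry = self.directory[name]
        entry["data"] = bytes(buffer)           # 1. flush the buffer to storage
        buffer.clear()                          # 2. release the buffer's memory
        self.clock += 1
        entry["modified"] = self.clock          # 3. update the time stamp
        del self.open_files[name]               # 4. update the open-file table


fms = ToyFMS({"notes.txt": {"data": b"hello", "allowed": {"ann"}, "modified": 0}})
buf = fms.open("notes.txt", "ann")
buf.extend(b" world")            # the change lives only in the buffer for now
fms.close("notes.txt")           # the flush copies it to "storage"
print(fms.directory["notes.txt"]["data"])
```

Until close runs, the stored data is unchanged, which is why a process that ends without closing its files can lose work.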

Deleting files is sometimes more complicated than it sounds. Instead of erasing the file from storage, the OS may just mark the FAUs allocated to the file as free, and remove the entry for the file from the directory table it appears in. The FAUs will be used the next time they are needed, and are as likely to be used for the next file that requires space as any other FAUs. This means that they may be unused for a while, and that the file they still hold might be undeleted until the FAUs are allocated and used. This also means that someone trying to harvest information from the storage device might be able to harvest that file or portions of it.
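A small simulation shows why deleted data can linger. The class ToyVolume and its methods are invented for this sketch, and its rule of always reusing the lowest-numbered free FAU is an assumption. A real file system may choose free FAUs differently.

```python
class ToyVolume:
    """Hypothetical volume that deletes by marking FAUs free."""

    def __init__(self, fau_count):
        self.blocks = [b""] * fau_count       # raw contents of each FAU
        self.free = set(range(fau_count))     # FAUs not allocated to any file
        self.directory = {}                   # filename -> list of FAU numbers

    def write_file(self, name, chunks):
        used = []
        for chunk in chunks:
            fau = min(self.free)              # assumed policy: lowest free FAU
            self.free.remove(fau)
            self.blocks[fau] = chunk          # only now is old data overwritten
            used.append(fau)
        self.directory[name] = used

    def delete_file(self, name):
        # Remove the directory entry and mark the FAUs free.
        # The contents of the FAUs are left untouched.
        for fau in self.directory.pop(name):
            self.free.add(fau)


volume = ToyVolume(4)
volume.write_file("secret.txt", [b"top ", b"secret"])
volume.delete_file("secret.txt")
print(volume.blocks[:2])         # the old contents are still on the "disk"
volume.write_file("new.txt", [b"xxxx"])
print(volume.blocks[:2])         # FAU 0 was reused, FAU 1 still holds old data
```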

The text discusses access controls again, starting on page 462. The three classic UNIX permissions are listed in bullets on that page:

  • read - can read the file's contents
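  • write - can modify the file's contents; the text includes the right to delete the file here, although on UNIX systems removing a file from a directory actually depends on having write permission for that directory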
  • execute - can run the program that the file contains, if it is a program, command, batch, or script file

In UNIX, these rights are typically assigned or denied to three user classes: the user who owns the file, the group associated with the file, and every other user on the system.
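UNIX stores these nine yes-or-no answers as bits in the file's mode, usually written as three octal digits. The sketch below decodes a mode with constants from Python's standard stat module. The function describe_mode and the sample mode 754 are chosen only for illustration.

```python
import stat


def describe_mode(mode):
    """Break a UNIX permission mode into rights for each user class."""
    classes = {
        "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "others": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    }
    result = {}
    for user_class, (read_bit, write_bit, execute_bit) in classes.items():
        rights = []
        if mode & read_bit:
            rights.append("read")
        if mode & write_bit:
            rights.append("write")
        if mode & execute_bit:
            rights.append("execute")
        result[user_class] = rights
    return result


mode = 0o754   # octal: 7 for the owner, 5 for the group, 4 for others
print(stat.filemode(stat.S_IFREG | mode))   # -rwxr-xr--
print(describe_mode(mode))
```

Each octal digit is the sum of read (4), write (2), and execute (1), so 7 grants all three rights, 5 grants read and execute, and 4 grants read only.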