CIS 2210 - Database Management and Design

Recap of SQL Basics

Objectives:

This lesson reviews material about basic SQL. Objectives important to this lesson:

  1. Retrieving data
  2. Compound conditions
  3. Computed columns
  4. SQL operators and functions
  5. Sorting
  6. Nesting queries
Concepts:

Retrieving data begins with the SELECT command. A query using this command may have several variations. This is a classic form:

SELECT column list FROM data table WHERE conditional test ;

  • SELECT is followed by a list of columns, which may be any columns in the table being accessed, separated by commas. If you want all columns, you can use a wildcard character. In MySQL, the wild card for "all" is the asterisk.
  • FROM is followed by the name of the table to be read.
  • If you are running a single table query, WHERE is optional. It can be used to specify which rows to retrieve. If the WHERE clause is not used, all rows will be be retrieved.

Remember that the command must end with a semicolon, and that you must space between words. Other than that, the syntax for the command is free form, which means that you can put line returns in it to make it easier to read, and the command will perform the same way.

The WHERE clause can be a simple condition, such as testing for a particular value in one field. If so, you can use the standard comparison operators listed in the text. You are probably familiar with most of them:

Operator Meaning
= is equal to
< is less than
> is greater than
<= is less than or equal to
>= is greater than or equal to
<> does not equal
!= does not equal

Example from the bookstore database:

SELECT BOOK_CODE, TITLE, PRICE
FROM book
WHERE PRICE > '20.00' ;

Note that this comparison operator has a field name on its left, and a comparison value on its right. The comparison value is enclosed in what some people call single quotes. Other people call these apostrophes. The purpose of the quotes is twofold: to show that the value is not the name of a field, and to enclose the exact value to compare.

Compound conditions are just two or more conditions linked by logical operators: AND and OR. These are also called Boolean operators. There is also a  Boolean NOT operator. One way to use the NOT operator is to compose a comparison, and to precede it with NOT. Example:

SELECT BOOK_CODE, TITLE, PRICE
FROM book
WHERE NOT (PRICE > '20.00') ;

In this example, the query is looking for the books whose prices are not greater than 20.00. It is not required to put the condition in parentheses, but it helps clarify the meaning of the operation.

Computed columns are usually created by using math operators with the names of fields. The standard math operators used are + for addition, - for subtraction, * for multiplication, and / for division. In this example, we will create a column that computes the price of a book including the local sales tax.

SELECT BOOK_CODE, TITLE, PRICE, (PRICE * 1.06) AS WithTax
FROM book
WHERE PRICE > '20.00' ;

This command would create a computed column, and display the results under the heading "WithTax", which does not exist as a field name, but is used as a column heading in the output of the command.

A useful operator if you are searching for a partial match is the LIKE operator. When searching for a character string that occurs somewhere in a field, use the LIKE operator with a comparison value enclosed in percent signs. Example:

SELECT BOOK_CODE, TITLE, PRICE
FROM book
WHERE TITLE LIKE ('%of%') ;

This command would retrieve the selected fields from all records having of anywhere in their names.

  • I used parentheses to mark the argument being handed to LIKE.
  • I used single quotes to mark the beginning and end of the argument.
  • The two % characters placed at the beginning and end of the argument will match any number of characters before and after the comparison value. Placing a % only before or only after the comparison value would match any characters at that position. (In some databases, you would use the * instead of the %.)

The IN operator is used to match against any of a set of values. For example, we can ask for customers having any of several specific credit limits:

SELECT BOOK_CODE, TITLE, PRICE
FROM book
WHERE PRICE IN (21, 35, 45) ;

This query will report all books whose prices match any of the three listed values. Note that these values are not markers for ranges: a book's price must match one of those three numbers, or that record is not retrieved. Can you rewrite that query to pull books whose prices are inside a range? (We will add that to the assignment this week.)

Sorting is accomplished with the ORDER BY clause. ORDER BY is followed by the names of the fields on which to sort the output. If sorting on more than one field, the first listed is the key for the primary sort, the second is the key for the secondary sort, and so on. Example:

SELECT BOOK_CODE, TITLE, PRICE
FROM book
ORDER BY PRICE, TITLE;

A sort will display results sorted in ascending order, unless the DESC switch is used. This switch follows the name of field to be sorted in descending order. Example:

SELECT BOOK_CODE, TITLE, PRICE
FROM book
ORDER BY PRICE DESC, TITLE;

Remember that the ORDER BY command will sort the rows in your output based on a specified column. It does NOT sort only the named column, which would disrupt the integrity of the data records.

A list of functions common to spreadsheets is offered, called aggregate functions. They give us the ability to provide a summary report from a query, rather than a detail report.

Function Result
AVG Average (mean) of a column
COUNT The number of rows matching retrieval criteria.
MAX Greatest value in a column
MIN Smallest value in a column
SUM The total of the values in a column

These functions can be combined with WHERE criteria, in order to apply them to subsets of the data. Example:

SELECT COUNT(*) FROM book WHERE TITLE LIKE ('%of%');

This query would reply with the number of records matching the criteria. MySQL works with this syntax, and with an alternate syntax, replacing the * with the name of any field.

It is possible to run a query on the results of another query by nesting the one whose results you need inside the one that reads those results. Figure out what you want to do for each query. The first one you need to run must be enclosed in parentheses, it must follow the IN operator, and that operator comes at the end of the second query to run. Example (colors are added for reference):

SELECT DISTINCT AUTHOR_NUM FROM wrote
WHERE BOOK_CODE IN
(SELECT BOOK_CODE FROM book
WHERE TYPE = 'FIC') ;

In this example the blue query runs first, and results in a list of book numbers that fall in the FIC type. The orange query then runs, matching records in the wrote table against the list of book numbers from the first query. I have made the semicolon orange because the command does not end until the second query runs. I have used the DISTINCT operator to avoid repetition of author numbers. Can you adapt this to give us the names of the authors who wrote fiction books, sorted by last name?

SELECT AUTHOR_FIRST, AUTHOR_LAST from author
WHERE AUTHOR_NUM IN
(SELECT DISTINCT AUTHOR_NUM FROM wrote
WHERE BOOK_CODE IN

(SELECT BOOK_CODE FROM book
WHERE TYPE = 'FIC'))
ORDER BY AUTHOR_LAST;

In this example, I have added a new query. The blue query runs first, generating the list of book codes for fiction books. The orange query runs next, generating the list of author numbers for fiction authors. Then the black query runs, giving us the names of the fiction writing authors, and ordering the list by last name. The trick was to determine how to get what I actually wanted. I pulled the book codes for all fiction books, then pulled the author numbers for the authors who wrote those books, then reported the first and last names for the authors whose numbers I just pulled. In the online interface for a bookstore, the SQL would be hidden from the customer who would have no need to understand it.

When you are troubleshooting commands, it is useful to ask for a bit more information, to make sure you are getting the results you want. If I were trying to troubleshoot the orange section of the command above, I might remove the DISTINCT operator to see how many separate book hits we get for each author. I might ask for more fields in each query to make sure I was getting the right data, such as getting book titles as well as codes in the blue query.

(SELECT AUTHOR_NUM FROM wrote
WHERE BOOK_CODE IN

(SELECT BOOK_CODE, TITLE FROM book
WHERE TYPE = 'FIC'));

I have asked SQL to do the same thing, except that this time I want to see the book titles that match the book numbers in the result. This way, I can read the output and make sure that only the records I want to see are being reported. (Assuming I know anything about the books in my store.) After I was sure that the query worked, I would change it so that the output only contained what I (or my customer) really wanted in the result.

That might not tell me enough, so I would also run the blue section by itself, with a similar change:

SELECT BOOK_CODE, TITLE, TYPE FROM book
WHERE TYPE = 'FIC';

This command would show me the output of the inner query, along with the type for each book selected. That output would be easy to check visually, to verify that I am only getting book numbers for the correct type. Again, I would only do this to check the command. If I ran the nested query with this change in it, I might get results that I do not want. The purpose of the blue query is just to generate a control list for the orange query. It is more important to remove these visual checks from the inner query than from the outer one, because we are feeding the result of the inner query to the outer one. We don't want the outer query to choke on more data than it expects. This applies to each of the nested query pairs, orange to blue, and blue to black.