CIS 2210 - Database Management and Design

Recap of Second Level SQL

Objectives:

This lesson reviews more material about SQL. Objectives important to this lesson:

  1. Retrieving data from more than one table
  2. Using IN and EXISTS
  3. Aliases
  4. Set operations
  5. Using ANY and ALL
  6. Special JOIN operations
Concepts:

Retrieving data from multiple tables also begins with the SELECT command. A query on multiple tables may come in three parts:

SELECT column list
FROM data tables
WHERE relationship between the tables ;

  • SELECT is followed by a list of columns, which may be any columns in the tables being accessed, separated by commas. For columns that are unique to one table, the column name is sufficient. For columns that appear in both tables, use the table.column notation.
  • FROM is followed by the name of the tables to be read, separated by commas.
  • WHERE is not optional if the data being read from tables is based a relation.
    In this case, the WHERE clause must name the columns that form the relationship between the tables, in table.column notation, with an equal sign between them.
    Example: book.BOOK_CODE = wrote.BOOK_CODE;
    This tells the database which rows in one table are supposed to match which rows in the other table. If you don't give it this information, your query result will be full of nonsense.

The WHERE clause must contain a statement of the table relationship when you want data from the second table that relates to the first. The WHERE clause may also contain restrictions. If it does, the statement of relationship is followed by AND, which is followed by your list of restrictions. Format:

SELECT column list
FROM data tables
WHERE a valid relationship between the tables
AND column_name operator comparison_value ;

The IN operator can be used to match against any item in a set of values. The text shows an example that is a variation of the syntax shown above. It is not as clear, but it does about the same thing. The difference is that we are running a nested query. Format:

SELECT column list for second table
FROM second_table
WHERE column in second table that relates to first table IN
(SELECT column in first table that relates to second table
FROM first_table
WHERE column_name operator comparison_value ) ;

This query can also be done with the EXISTS operator. Format:

SELECT column list for second table
FROM second_table
WHERE EXISTS
(SELECT *
FROM first_table
WHERE second_table.column_name = first_table.column_name
AND column_name operator comparison_value ) ;

This runs the inner query on both tables, looking for rows that meet the condition in the final line. The outer query then runs, selecting the columns desired from the second table, retrieving only the rows that match the output of the first query.

It is often useful to link multiple tables in a SELECT query, typically when you want to gather some data from each of them. Although it is lengthy, the process uses the same syntax as other examples.

SELECT column list
FROM
data tables
WHERE
relationship between two tables
AND relationship between two other tables
AND another relationship if needed ;

The idea is to link the tables in the WHERE clause, continuing with as many linking statements as there are relationships between the tables you are using.

This looks more cryptic than it actually is. When you use a command that relates two or more tables, it makes some commands I showed you in the last lesson easier to run. Let's modify one of them to make it more useful.

SELECT AUTHOR_FIRST, AUTHOR_LAST, TITLE
FROM author, wrote, book
WHERE author.AUTHOR_NUM = wrote.AUTHOR_NUM
AND wrote.BOOK_CODE = book.BOOK_CODE
AND book.TYPE = 'FIC'
ORDER BY AUTHOR_LAST;

  1. This is one query (no nesting) that starts out reading three tables: author, wrote, and book.
  2. It will display data from three columns that do not need to be qualified in the SELECT command because each of those columns only exists in one of the tables being read, so the computer is not confused. A qualification will be needed in the WHERE clause.
  3. Now it gets more interesting.
    For each author name that is pulled from the author table, the database finds rows in the wrote table that match the author's number. How?

    Both of those tables have a column called AUTHOR_NUM. To match the right rows, we use a comparison between the author number we just pulled from the author table, and the author numbers in the wrote table. In a sense, the author number from the author table is like a constant at this moment, and the database is looking for rows that match it. Several rows may be pulled, and may be pulled for the next author we pull from the author table, and so on.

    Each comparison requires a qualified name for each of those fields, prefacing the field name with the table name. That makes it a qualified field name:
    author.AUTHOR_NUM = wrote.AUTHOR_NUM.

    This is the relationship that determines the correct match between data from those tables. Without a properly worded WHERE clause, the database would either throw an error message on the screen, or it would pull data from rows that were not related. Neither of those errors is acceptable.
  4. For each row that is read in the wrote table, the database also checks that row's book code for that book's record in the book table. Again, this is the relationship that tells the computer which rows in each table match.
  5. Then the database checks to see if that book is type FIC. If it is, that book is used in the output. If it is not, that book is ignored in the output.
  6. Finally, the output is ordered by the authors' last names.

You should note that this illustrates the idea of relationships between tables, the concept that allows us to pull correct data.

The relationships in this case are pertinent to the query at hand. The first line in the WHERE clause tells the database how to read rows in the author table and match them to rows in the wrote table. The second line tells it how to match the rows it finds in the wrote table to rows in the book table. The third sets the criterion for what kind of books we are interested in.

In the image on the right, you can see the results of this query run on the MySQL Online web site.

As a reminder from last week's lesson, you can run a subquery in a subquery, as an extension of the format already shown for one nested query. Last week, we looked at a query that had two subqueries. The innermost query ran first, then the query that it was inside, and finally the outermost query. This follows the standard computer principle of evaluating the expressions inside parentheses first. When you have nested parentheses, the innermost phrases evaluate first, followed by the next innermost, working our way to the outside.

A new topic is aliases. An alias can be used to serve as shorthand notation for the name of a table. This can save typing, but it looks odd until you understand it. The syntax itself is odd, in that we can use the alias before we define it in the command. This example is from another database that has tables called Customer and Rep, for customers and salespeople:

SELECT CustomerNum, CustomerName, R.RepNum
FROM Customer C, Rep R
WHERE C.RepNum = R.RepNum ;

In this example, C was made an alias for the Customer table, and R was made an alias for the Rep table, both in the FROM clause. The aliases were used in the SELECT clause and in the WHERE clause, which come before and after the aliases are declared. This is not logical to a programmer expecting left to right execution of a command, but it is how this feature of SQL works. If it gives you a headache, remember that the database engine reads the entire line of code, though the semicolon, before it evaluates what to do.

This comparison operator in this example has a field name on its left, as is typical, but the comparison value on its right is also a field name. This is unlike some comparisons. Often, the comparison value on the right is just a value, enclosed in what some people call single quotes. Other people call these apostrophes. The purpose of the quotes is twofold: to show that the value is not the name of a field, and to enclose the exact value to compare. In this example, like the one above, we want to compare the value of a given field in two tables. Where the values match, we use the data.

Set operations relate to math concepts. Three set operations are discussed.

  • As stated in another chapter, an intersection of two tables is the set of all rows that appear in both of those tables. The SQL command for intersection is INTERSECT.
  • The difference between two tables is the set of rows in the first table, minus any row that appears in the second table. The SQL command for difference is MINUS.
  • The union of two tables is the set of all rows that exist in either or both tables. It is not the same thing as tacking one table on the end of the other: a row that appears in both tables is only represented once in the result of the query. The SQL command for union is UNION.

Set operations can only be carried out on tables with the same columns. One way to make these commands available to you is to use queries to draw out the required columns from as many tables as needed, then use the set operator on the results.

The ANY and ALL operators are used to match a query against the results of another query. The inner query produces a set of data, and the outer query matches against the result. Consider this example from W3 Schools:

SELECT column_name(s)
FROM table_name
WHERE column_name = ANY
(SELECT column_name
FROM table_name
WHERE condition);

The outer query will match if the value from the current row of its selected column matches any of the values in the subquery.

If the query is like this example, in which I have changed the operator to < ALL:

SELECT column_name(s)
FROM table_name
WHERE column_name < ALL
(SELECT column_name
FROM table_name
WHERE condition);

The outer query will only match if the value being compared is less than every value in the subquery. As you should imagine, the ALL operator is unlikely to work when preceded by an equal sign. It is, however, useful when looking for data that falls outside another group of data.

Three more types of queries are defined in the text.

  • An inner join is one that shows only the rows of data that match the conditions given. Most students will wonder at this statement, since it describes every query we have discussed.
  • An outer join shows all the rows from one or both tables in a join. There are three types. To understand them, assume that we are running a query on two tables. The first one listed can be called the left table, and the second one listed can be called the right table.
    • A left outer join will include all rows from the left table, and only the rows from the right table that match the query. This could be a query that seeks customers who ordered a certain item, but the result shows all customers, so we can see who ordered the item and who did not.
    • A right outer join will include all the rows from the right table, and only the rows form the left table that match the query. This might show all stock items, but only the customers who actually ordered them.
    • A full outer join includes all rows from both tables. We would see customers who had and had not ordered anything, as well as items that had and had not been sold.
  • The product of two tables is the combination of all rows in the first table and all rows in the second table. It can be created as an error by leaving out a necessary WHERE clause. In other words, you use the SELECT and FROM commands, but have no rule for the selection, so you get everything too many times. If you selected fields from the Customer and Orders tables, and did not use a WHERE clause, the result would look like every customer ordered every single order, which is obviously not true. As noted, this is probably the result of an error in constructing your query. I have yet to see an example of a real business need for this kind of query, which is why you should consider it to be an error when it occurs.