|
|
CIS 2210 - Database Management and Design
Recap of Second Level SQL
Objectives:
This lesson reviews more material about SQL. Objectives important
to this lesson:
- Retrieving data from more than one table
- Using IN and EXISTS
- Aliases
- Set operations
- Using ANY and ALL
- Special JOIN operations
Concepts:
Retrieving data from multiple tables also begins with the SELECT
command. A query on multiple tables may come in three parts:
SELECT column list
FROM data tables
WHERE relationship between the tables ;
- SELECT is followed by a list of columns, which may be any columns
in the tables being accessed, separated by commas. For columns that
are unique to one table, the column name is sufficient. For columns
that appear in both tables, use the table.column notation.
- FROM is followed by the name of the tables to be read, separated
by commas.
- WHERE is not optional if the data being read from tables is
based a relation.
In this case, the WHERE clause must name the columns that form the relationship
between the tables, in table.column notation,
with an equal sign between them.
Example: book.BOOK_CODE = wrote.BOOK_CODE;
This tells the database which rows in one table are supposed to match
which rows in the other table. If you don't give it this information,
your query result will be full of nonsense.
The WHERE clause must contain a statement of the table
relationship when you want data from the second table that relates to
the first. The WHERE clause may also contain restrictions.
If it does, the statement of relationship is followed by AND, which is
followed by your list of restrictions. Format:
SELECT column list
FROM data tables
WHERE a valid relationship between the tables
AND column_name operator comparison_value
;
The IN operator can be used to match against any item in a set
of values. The text shows an example that is a variation of the syntax
shown above. It is not as clear, but it does about the same thing. The
difference is that we are running a nested query. Format:
SELECT
column list for second table
FROM
second_table
WHERE
column in second table that
relates to first table IN
(SELECT column
in first table that relates to second table
FROM
first_table
WHERE
column_name operator comparison_value )
;
This query can also be done with the EXISTS operator. Format:
SELECT
column list for second table
FROM
second_table
WHERE
EXISTS
(SELECT
*
FROM
first_table
WHERE
second_table.column_name = first_table.column_name
AND
column_name operator comparison_value )
;
This runs the inner query on both tables,
looking for rows that meet the condition in the final line. The outer
query then runs, selecting the columns desired from the second table,
retrieving only the rows that match the output of the first query.
It is often useful to link multiple tables in a SELECT query,
typically when you want to gather some data from each of them. Although
it is lengthy, the process uses the same syntax as other examples.
SELECT column list
FROM data tables
WHERE relationship between two tables
AND relationship between two other tables
AND another relationship if needed ;
The idea is to link the tables in the WHERE clause, continuing with as
many linking statements as there are relationships between the tables
you are using.
This looks more cryptic than it actually is. When you use a command that
relates two or more tables, it makes some commands I showed you in the
last lesson easier to run. Let's modify one of them to make it more useful.
SELECT AUTHOR_FIRST, AUTHOR_LAST, TITLE
FROM author, wrote, book
WHERE author.AUTHOR_NUM = wrote.AUTHOR_NUM
AND wrote.BOOK_CODE = book.BOOK_CODE
AND book.TYPE = 'FIC'
ORDER BY AUTHOR_LAST;
- This is one query (no nesting) that starts
out reading three tables: author, wrote, and book.
- It will display data from three columns
that do not need to be qualified in the SELECT command because
each of those columns only exists in one of the tables being
read, so the computer is not confused. A qualification will be needed
in the WHERE clause.
- Now it gets more interesting.
For each author name that is pulled from the author table,
the database finds rows in the wrote table that match the author's
number. How?
Both of those tables have a column called AUTHOR_NUM. To match the right
rows, we use a comparison between the author number we just pulled from
the author table, and the author numbers in the wrote table. In a sense,
the author number from the author table is like a constant at this moment,
and the database is looking for rows that match it. Several rows may
be pulled, and may be pulled for the next author we pull from the author
table, and so on.
Each comparison requires a qualified name for each of those fields,
prefacing the field name with the table name. That makes it a qualified
field name:
author.AUTHOR_NUM = wrote.AUTHOR_NUM.
This is the relationship that determines the correct match between
data from those tables. Without a properly worded WHERE clause, the
database would either throw an error message on the screen, or it would
pull data from rows that were not related. Neither of those errors is
acceptable.
- For each row that is read in the wrote
table, the database also checks that row's
book code for that book's record in the book table. Again,
this is the relationship that tells the computer which rows in each
table match.
- Then the database checks to see if that book is type FIC.
If it is, that book is used in the output. If it is not,
that book is ignored in the output.
- Finally, the output is ordered by the authors' last names.
You
should note that this illustrates the idea of relationships between
tables, the concept that allows us to pull correct data.
The relationships in this case are pertinent to the query at hand. The
first line in the WHERE clause tells the database how to read rows in
the author table and match them to rows in the wrote table. The second
line tells it how to match the rows it finds in the wrote table to rows
in the book table. The third sets the criterion for what kind of books
we are interested in.
In the image on the right, you can see the results of this query run
on the MySQL Online web site.
As a reminder from last week's lesson, you can run a subquery in a
subquery, as an extension of the format already shown for one nested
query. Last week, we looked at a query that had two subqueries. The innermost
query ran first, then the query that it was inside, and finally
the outermost query. This follows the standard computer principle of evaluating
the expressions inside parentheses first. When you have nested parentheses,
the innermost phrases evaluate first, followed by the next innermost,
working our way to the outside.
A new topic is aliases. An alias can be used to serve as shorthand
notation for the name of a table. This can save typing, but it looks odd
until you understand it. The syntax itself is odd, in that we can use
the alias before we define it in the command. This example
is from another database that has tables called Customer and Rep, for
customers and salespeople:
SELECT CustomerNum, CustomerName, R.RepNum
FROM Customer
C, Rep R
WHERE C.RepNum = R.RepNum
;
In this example, C was made an alias for the Customer table, and R was
made an alias for the Rep table, both in the FROM clause. The aliases
were used in the SELECT clause and in the WHERE clause, which come before
and after the aliases are declared. This is not logical to a programmer
expecting left to right execution of a command, but it is how this feature
of SQL works. If it gives you a headache, remember that the database engine
reads the entire line of code, though the semicolon, before it evaluates
what to do.
This comparison operator in this example has a field name on its
left, as is typical, but the comparison value on its right is also
a field name. This is unlike some comparisons. Often, the comparison value
on the right is just a value, enclosed in what some people call single
quotes. Other people call these apostrophes. The purpose of
the quotes is twofold: to show that the value is not the name of
a field, and to enclose the exact value to compare. In this example,
like the one above, we want to compare the value of a given field in two
tables. Where the values match, we use the data.
Set operations relate to math concepts. Three set operations are
discussed.
- As stated in another chapter, an intersection of two tables
is the set of all rows that appear in both of those tables. The
SQL command for intersection is INTERSECT.
- The difference between two tables is the set of rows in
the first table, minus any row that appears in the second
table. The SQL command for difference is MINUS.
- The union of two tables is the set of all rows that exist
in either or both tables. It is not the same thing as tacking one
table on the end of the other: a row that appears in both tables is
only represented once in the result of the query. The SQL command for
union is UNION.
Set operations can only be carried out on tables with the same columns.
One way to make these commands available to you is to use queries to draw
out the required columns from as many tables as needed, then use the set
operator on the results.
The ANY and ALL operators are used to match a query against
the results of another query. The inner query produces a set of data,
and the outer query matches against the result. Consider this example
from W3 Schools:
SELECT column_name(s)
FROM table_name
WHERE column_name = ANY
(SELECT column_name
FROM table_name
WHERE condition);
The outer query will match if the value from the current row of its selected
column matches any of the values in the subquery.
If the query is like this example, in which I have changed the operator
to < ALL:
SELECT column_name(s)
FROM table_name
WHERE column_name < ALL
(SELECT column_name
FROM table_name
WHERE condition);
The outer query will only match if the value being compared is less
than every value in the subquery. As you should imagine, the ALL operator
is unlikely to work when preceded by an equal sign. It is, however, useful
when looking for data that falls outside another group of data.
Three more types of queries are defined in the text.
- An inner join is one that shows only the rows of data that
match the conditions given. Most students will wonder at this
statement, since it describes every query we have discussed.
- An outer join shows all the rows from one or both tables in
a join. There are three types. To understand them, assume that we are
running a query on two tables. The first one listed can be called the
left table, and the second one listed can be called the right
table.
- A left outer join will include all rows from the left table,
and only the rows from the right table that match the query. This
could be a query that seeks customers who ordered a certain item,
but the result shows all customers, so we can see who ordered the
item and who did not.
- A right outer join will include all the rows from the right
table, and only the rows form the left table that match the query.
This might show all stock items, but only the customers who actually
ordered them.
- A full outer join includes all rows from both tables. We
would see customers who had and had not ordered anything, as well
as items that had and had not been sold.
- The product of two tables is the combination of all rows in
the first table and all rows in the second table. It can be created
as an error by leaving out a necessary WHERE clause. In other words,
you use the SELECT and FROM commands, but have no rule for the selection,
so you get everything too many times. If you selected fields from the
Customer and Orders tables, and did not use a WHERE clause, the result
would look like every customer ordered every single order, which is
obviously not true. As noted, this is probably the result of an error
in constructing your query. I have yet to see an example of a real business
need for this kind of query, which is why you should consider it to
be an error when it occurs.
|