|
|
CIS 2210 - Database Management and Design
Chapter 8: Advanced SQL
Objectives:
This lesson discusses material from chapter 8. Objectives important
to this lesson:
- More advanced joins
- Subqueries and correlated queries
- Working with dates, strings, and other data types
- Relational set operators
- Views
- Triggers and procedures
- Embedded SQL
Concepts:
Chapter 8
Introduction
The chapter continues the discussion of joins that began at the end of
the last chapter. Relational joins take advantage of the fact that relational
databases link tables through common attributes. This is one of the three
conditions the text tells make joins possible. For related tables, there
must be a way to look at their columns and match rows from one table to
the other. This is usually expressed as WHERE
table1.column
= table2.column.
The text makes an interesting point that if we join three
tables, we would actually join table
1 and table 2, then join the result
of that query with table 3. This can sort
of be done in one query, but it requires a subquery, so there are actually
two parts. There will also have to be a WHERE condition for each of the
joins you do, so the text gives us a formula: the number of conditions
in the WHERE clause will be the number
of tables joined minus one.
The chapter gets very term oriented next, so let's dig in and get it
over with. There are several kinds of joins, some of which only differ
in syntax. You tell them apart by how they do what they do. I will arrange
this table to match what you already know.
Join
Class
|
Join
Type
|
Example
|
Description
|
INNER
(4 types)
|
Old-Style
(classic)
|
SELECT
*
FROM T1, T2
WHERE T1.C1=T2.C1
|
Returns
only the rows that meet the join condition in the WHERE clause. Only
rows with matching values are selected.
|
|
NATURAL
|
SELECT
*
FROM T1 NATURAL JOIN T2
|
Assumes
that you have matching columns in the two tables. Returns only the
rows with matching values in the matching columns, unless there are
no matching values. In that case, it returns a relational product
of the two tables.
|
|
JOIN
USING
|
SELECT
*
FROM T1 JOIN T2 USING (C1)
|
Returns
only the rows with matching values in the columns indicated in the
USING clause. This syntax requires that the related column has the
same
name in each table.
|
|
JOIN
ON
|
SELECT
*
FROM T1 JOIN T2 ON T1.C1=T2.C1
|
Returns
only the rows that meet the join condition
indicated in the ON clause. This is more useful when the column names
do not match, but it requires table name qualification. (See discussion
at the link above.)
|
OUTER
(3 types)
|
LEFT
JOIN
|
SELECT
*
FROM T1 LEFT OUTER JOIN T2
ON T1.C1=T2.C1
|
Returns
rows with matching values and includes
all rows from the left
table (T1) with unmatched
values. This is what makes it a LEFT join.
|
|
RIGHT
JOIN
|
SELECT
*
FROM T1 RIGHT OUTER JOIN T2
ON T1.C1=T2.C1
|
Returns
rows with matching values and includes
all rows from the right
table (T2) with unmatched
values. This is what makes it a RIGHT join.
|
|
FULL
JOIN
|
SELECT
*
FROM T1 FULL OUTER JOIN T2
ON T1.C1=T2.C1
|
Returns
rows with matching values and includes
all rows from both
tables (T1 and T2) with
unmatched values. This is
what makes it unsuitable for most purposes.
|
CROSS
(2 syntaxes)
|
CROSS
JOIN
|
SELECT
*
FROM T1, T2
|
Returns
the Cartesian product of T1 and T2.
|
|
CROSS
JOIN
|
SELECT
*
FROM T1 CROSS JOIN T2
|
Returns
the Cartesian product of T1 and T2.
|
- In INNER JOINS, we are looking for matches. The NATURAL JOIN assumes
that there is a column on which to match values. JOIN USING requires
us the name the column. JOIN ON allows us to specify the names of the
columns, whether they are different or not, and it looks the same as
the old style except for the use of the word JOIN in place of the comma
between table names.
- OUTER JOINS return rows that match, and also rows that do not match.
The rows that do not match can come from one of the tables or both,
depending on the version you choose.
- CROSS JOINS produce a number of rows that is best described as every
row in the first table matched with every row in the second table. The
number of rows returned is the actual product of the rows in table 1
times the rows in table 2. Access does not support the second syntax,
but does support the first.
Subqueries
The text returns to the idea of nested
queries (queries that have subqueries) on page 349. The first point
discussed is that sometimes you have to run a query to get the data you
actually want to run a query
about. In the example below, the nested query runs first, which is controlled
by the parentheses around it.
SELECT V_CODE, V_NAME FROM VENDOR
WHERE V_CODE NOT IN (SELECT
V_CODE FROM PRODUCT);
Remember: the line returns in the examples are ignored by the SQL engine.
Each SQL command is really just one long (or short) line.
We need the nested query to run first because we want a list of all V_CODEs
in the PRODUCT table. Then we want to know which vendors in the VENDOR
table are not in that list. This query could have been run differently,
but the principle in well illustrated. We first determine which vendors
are being used, then check which known vendors are not on that list.
Subqueries are also called inner
queries, because they are inside parentheses, inside the main
query, which is also called the outer
query. The text provides several examples of using subqueries to
obtain information that in turn can be queried. Care should be taken to
make sure that the outer query receives the kind of information it (you)
are expecting from the inner query: one column from one row, one column
from multiple rows (a list), or one table. It is best to troubleshoot
this kind of situation by running only the inner query, to make sure it
produces what you expect, then running the combined nested query. Sometimes
your inner query will produce no result, and either your outer query will
have to deal with that, or you will have to recompose the inner query.
- The text points out that a subquery is often run on the right side
of an assignment operator or a comparison operator as part of a WHERE
clause. This is sensible, because such equations are always evaluated
on their right side first. Do not depend on this priority if you are
tempted to do so. Keep your inner queries inside parentheses and they
will be easier to find and troubleshoot.
- Sometimes you will want a subquery to return a list of data, such
as a single column with multiple rows. When you do this, make sure to
use the IN operator to the left
of the subquery. It is the appropriate operator to search a list.
- When using a GROUP BY command
to create summarized data, consider using a HAVING
clause to to restrict which rows are summarized. The text offers this
as an example:
SELECT P_CODE, SUM(LINE_UNITS)
FROM LINE
GROUP BY P_CODE
HAVING SUM(LINE_UNITS) > (SELECT AVG(LINE_UNITS) FROM LINE);
As you can see, there is a HAVING line which runs a subquery whose purpose
is to calculate the average of LINE_UNITS, then select for lines having
values greater than that average
- The text points out that the IN operator looks for exact
matches in a list, but you may want to look for things that do not match.
The ANY and ALL operators are used to match a query against
each of the results of a subquery, but in two different ways.
If the comparison is like this:
WHERE field < ANY (inner
query)
the outer query will match if the value being compared is less than
any of the values in the subquery. It does not matter how many items
it works for, as long as there is one that works.
If the comparison is like this:
WHERE field < ALL (inner
query)
the outer query will match if
the value being compared is less than every value in the subquery.
In this case, it has to compare to every
value returned by the subquery, and the comparison has to be valid for
all of them. If there is even one for which the comparison is not true,
then the whole thing is not true.
- Subqueries can also appear in the FROM clause of a SELECT command.
That takes a bit of explaining. The purpose of the FROM clause is to
specify tables from which to draw the columns and rows. It may be necessary
to run another SELECT command first, whose output will be used as a
table to draw from. This is fine, as long as you give it a name
for reference. You also have to make sure that the inner query produces
what the outer query has to have. The text provides an example:
SELECT DISTINCT CUSTOMER.CUS_CODE, CUSTOMER.CUS_LNAME
FROM CUSTOMER,
(SELECT INVOICE.CUS_CODE FROM INVOICE NATURAL JOIN LINE
WHERE P_CODE = '13-Q2/P2') CP1,
(SELECT INVOICE.CUS_CODE FROM INVOICE NATURAL JOIN LINE
WHERE P_CODE = '23109-HB') CP2
WHERE CUSTOMER.CUS_CODE = CP1.CUS_CODE AND
CP1.CUS_CODE = CP2.CUS_CODE;
In this example, the FROM clause contains one actual table and two inner
queries called CP1 and CP2 that produce what the text calls virtual
tables. Note the addition of the name after the closing parenthesis
for each query. This allows you to refer to these virtual tables in
the WHERE clause of the outer query. Each of those subqueries contains
its own SELECT, FROM, and WHERE clauses.
- Another variation is to place a subquery in the SELECT clause of a
query. Our book calls this an attribute list subquery, or an
inline subquery. It still requires you to name the subquery,
but you do it with a different syntax. The thing is, this time you are
creating a synthetic attribute instead of a synthetic table. Here is
an example:
SELECT P_CODE, P_PRICE, (SELECT AVG(P_PRICE) FROM PRODUCT)
AS AVGPRICE,
P_PRICE – (SELECT AVG(P_PRICE) FROM PRODUCT) AS DIFF
FROM PRODUCT;
The third attribute (shown in bold above) runs a subquery that
calculates the average price from the PRODUCT table. Note that the subquery
is contained inside parentheses, and that the name AVGPRICE is
given to it with the AS operator before the next comma. AVGPRICE
will be an aggregate value: it will be the same for every line
in the table, regardless of the fact that it is calculated again for
each line.
The next attribute (shown in italic in the example above) runs
the same subquery then subtracts that value from the P_PRICE value in
each line, and names the result DIFF. This values will be
different from one line to another, negative for those prices
that are lower
than the average. The subquery and subtraction will happen for each
line in the table. (This was misstated in the last version of these
notes. Correction made 2/22/2018)
The text continues its discussion with the topic of correlated subqueries,
which are a more advanced concept. It points out that each of the subquery
examples so far run independently of each other. This is not true for
a correlated subquery. The text compares it to a loop that runs
inside another loop, because the information in the outer loop is known
by the inner loop. The subqueries we have seen so far do not have this
feature.
Consider the first example in the text:
SELECT INV_NUMBER, P_CODE, LINE_UNITS
FROM LINE LS
WHERE LS.LINE_UNITS > (SELECT AVG(LINE_UNITS)
FROM LINE LA
WHERE LA.P_CODE = LS.P_CODE);
- The outer query begins with a selection of fields from the LINE
table.
- At this moment, the query is reading one particular line of data from
that table. Imagine that there is a record pointer that reads one line
at a time from the table. That's how this is working.
- Note the LS that appears after the table name in the FROM clause
of the outer query. This is the first alias given to this table.
- It is given another alias, LA, in the subquery.
- Think of these as meaning two separate reads that load the
same table into two separate parts of RAM.
- The next line begins the WHERE clause of the outer query. The
WHERE rule/condition/requirement references the value of the LINE_UNITS
field in the current line in LS version of the LINE table, the one being
read by the outer query.
- This value is about to be used in the subquery. The subquery mostly
uses the LA read of the LINE table.
- It computes the average number of units sold for the product whose
ID matches the P_CODE value of the line currently being read in the
outer query. This is a critical feature: the inner query has to know
what product the outer query is reading about.
- That average value is handed to the WHERE clause of the outer query,
which selects only lines whose number of sold units is greater than
the average number sold for that product code.
- This is fairly complex, so you should read through this part of the
notes a few times to understand it.
The author runs a second version of this query that has a subquery in
the SELECT clause whose purpose is to display the calculated average units
sold for any lines that are actually displayed to the operator. Note,
in the screen image of the runs of the two queries, the same rows are
printed, but in a different order. This demonstrates that you need to
use an ORDER BY command if you want consistent ordering of your lines
of data.
Let's move on to page 361, where the text discusses SQL functions. A
function is like a small program, in this case, a program built into the
SQL language. The text talks about date and time functions first, reminding
us that date and time data are saved in different formats in different
databases. This link will take you to a web page that discusses 59
such functions in MySQL. The text lists seven on page 365. Note
that the examples in the text are a little dangerous. When you use a function,
you typically have to pass it an argument in parentheses. This helps you
tell it apart from more ordinary usage of the name of that function. Doing
both in the same command is meant to be helpful, but may actually be more
confusing.
Function |
Usage |
DATE_FORMAT() |
MySQL DATE_FORMAT()
formats a date as specified in the argument. A list of format specifiers
given bellow may be used to format a date. |
YEAR() |
MySQL YEAR()
returns the year for a given date. |
MONTH() |
MySQL
MONTH() returns the month for the date within a range of 1 to 12 (
January to December). |
DAY() |
MySQL DAY()
returns the day of the month for a specified date. |
ADDDATE() |
MySQL ADDDATE()
adds a time value with a date.
|
DATE_ADD() |
MySQL DATE_ADD()
adds time values (as intervals) to a date value. The ADDDATE() is
the synonym of DATE_ADD(). |
LAST_DAY() |
MySQL LAST_DAY()
returns the last day of the corresponding month for a date or datetime
value. |
This link will take you to a list of numeric
functions in MySQL in the online manual. The text cautions us
not to confuse the aggregate functions we already know with these functions.
Aggregate functions summarize a column (or a subset of a column) of data,
but the functions in this chapter are related to individual rows. The
text offers three numeric functions. The links around the function names
below will take you to examples in the online manual.
Function |
Usage |
ABS() |
Return the
absolute value |
ROUND(x), ROUND(x,d)
|
Rounds to the nearest integer, or unless a second argument specifies a number of decimal places . |
CEIL()
CEILING() |
Return
the smallest integer value not less than the argument |
The text discusses a few sample string functions. The link in this line
will take you to the online
manual page for string functions. Each function, like the ones
linked to above, has its own section in manual's function examples, and
there are many more than the text includes in its sample.
- Concatenation - In MySQL, you can use the CONCAT() function to combine
the contents of two strings. In the example in the text, the content
of one field is combined with a short string containing a semicolon
and a space in a nested function call. The result is combined again
with the contents of another field, in the outer function call. The
result is given the label NAME.
SELECT CONCAT(CONCAT(EMP_LNAME, ', '),
EMP_FNAME) AS NAME
FROM EMPLOYEE;
The example in the text is missing a closing parenthesis, which
I have inserted in the text above.
- Changing case - In MySQL, you can use either the UPPER() function
or the UCASE() function to return a string in all upper case. You can
use either the LOWER() function or the LCASE() function to return a
string in all lower case. The examples in the text continue to show
the wrong number of parentheses.
mysql> SELECT UPPER('Hej');
-> 'HEJ'
- Substring retrieval - The text is not very clear about returning a
portion of a string in MySQL. This section
of the online manual is more enlightening.
mysql> SELECT SUBSTRING('Quadratically',5);
-> 'ratically'
In the example above, the string is read, and the portion that starts
at the fifth character is returned.
mysql> SELECT SUBSTRING('foobarbar' FROM 4);
-> 'barbar'
In the example above, the string is read, and the portion that starts
at the fourth character is returned.
mysql> SELECT SUBSTRING('Quadratically',5,6);
-> 'ratica'
In the example above, the string is read, and the portion that starts
at the fifth character and ends six character later is returned.
mysql> SELECT SUBSTRING('Sakila', -3);
-> 'ila'
In the example above, the string is read, and the portion that starts
at the third character from the end is returned.
- Length - MySQL uses the LENGTH()
function to return the length of a string. The example in the
text is incorrect. It sets this objective:
Lists all employee last names and the length of their names in
descending order by last name length.
It provides this code:
SELECT EMP_LNAME, LENGTH(EMP_LNAME) AS NAMESIZE
FROM EMPLOYEE;
What would you have to add to this command to meet the stated objective?
It is sometimes necessary to convert a value in one data type to another
data type. In MySQL, it is sometimes not necessary, as you see
if you follow this link to a discussion
of type conversion in the online manual. On the occasions in which
a conversion is necessary, MySQL provides the CAST()
function and the CONVERT() function. The text offers this example,
moving data from numeric type to character type in a query:
SELECT P_CODE, CAST(P_PRICE AS CHAR(8)) AS PRICE,
CAST(P_INDATE AS CHAR(20)) AS INDATE,
CAST(P_DISCOUNT AS CHAR(4)) AS DISC
FROM PRODUCT;
The conversions in the example above were not necessary, but they demonstrate
the use of the CAST() function for this purpose. In the example below,
the conversions are necessary, if you want to do math with a value from
a field that is stored as a character type:
SELECT CAST('-123.99' AS DECIMAL(8,2)),
CAST('-99.78' AS DECIMAL(8,2));
No math operation was performed here, but it could have been done once
the conversion was completed. It should be obvious that the name of a
field could have been used instead of a literal in each of the conversions
above, just as was done in the previous example.
Let's move on to page 377, where the text begins a discussion of views.
A view is essentially a filter that is designed for a user who needs to
see only some or particular combinations of items in a database. This
filtered view is described by the text as being a virtual table,
as opposed to an actual table in the database. The text refers to the
tables used to construct a view as that view's base tables.
As we have seen in the text before, a view is created with a command
like this one: CREATE VIEW name_of_view AS query;
The author puts the word SELECT before the word query in the text. I think
he is just trying to tell us that a view is usually the result of a query
that begins with SELECT.
Once a view is created, its name can be used in any SQL statement
that would otherwise use the name of a table. The text tells us
that views are dynamic. Every time they are used, they draw their
data fresh from the actual tables named in their query. This means that
a view is always up to date, which would not be true if we had saved the
results of the relevant query in a file. That file would be static, but
stable. The view in question would be dynamic, but would not work if the
tables needed were not available.
Views can often be used to update the files that were used to
create them. The text presents a list of restrictions on such operations,
and warns us that rules may vary from vendor to vendor. For a view to
considered an updatable view:
- Don't use GROUP BY or aggregate commands to create the
view. The UPDATE process would have no way to know what records to update.
- Don't use UNION, INTERSECT, or MINUS set operations
to create the view.
- Integrity of the key fields of the base tables must be maintained
in the update. There can be no duplicate items in a key field after
the update.
On page 382, the text discusses three methods of autoincrementing a primary key field, and introduces a fourth method.
- MS Access uses a data type it calls AutoNumber.
This type, when applied to a column in a table, will increment its
value by one with each new record. Access will not let you edit the
numbers it places in this column.
- MS SQL allows you to put one Identity column in each table. It performs the same function as the AutoNumber column in Access.
- MySQL allows you to assign AUTO_INCREMENT
to a column when you create a table. Only one column in a table can
have this feature. MySQL allows you to set the start value for the
column that will AUTO_INCREMENT to a value other than 1.
The fourth method is the one used by Oracle in versions before 12c, which is to use a sequence.
MS SQL Server now supports sequences, too. Its notation is a bit
different. The discussion of sequences in this chapter is mostly about
Oracle. This feature serves the same purpose as the features above, but
it works differently. A list of characteristics of a sequence appears
on page 383.
- A sequence is a separate database object.
- Sequences must be named.
- A sequence can be used in any table.
- Data in columns that are assigned to a sequence can still be modified.
- Sequences can start with any number, and can increment by any amount.
- You can use the DROP command to delete a sequence. This will not affect the data that it has already added to any table.
The basic format for creating a sequence looks like this:
CREATE SEQUENCE name [START WITH n] [INCREMENT BY n]
[CACHE | NOCACHE]
You can see by the notation above that setting the START WITH value is optional. So is setting the INCREMENT BY value. By default, each is set to 1 when the sequence is created.
On page 387, the text begins a discussion of programming techniques with SQL. The text starts out by telling us that standard SQL does not
include commands that support selection (conditional branching) or
looping. You can work around this by writing code in a regular
programming language that also calls SQL commands. You may also be able
to use standard programming techniques if your vendor supports them with extensions to SQL.
If SQL extensions exist for your database, you can create persistent stored modules (PSMs).
Storing a module in memory can decrease the time required to use such a
module. The major vendors discussed in the text offer version of SQL to
do this.
- MS SQL - provides Transact-SQL and .NET programming languages
- Oracle - provides Procedural Language SQL (PL/SQL)
- MySQL - provides a procedural version similar to Oracle's version
The text gives examples of writing PL/SQL programs in an
Oracle system. A PL/SQL code block can be created, and users can be
granted rights to execute it. Users can also not be granted rights, which allows restrictions on the use of a code block.
Code blocks that can be called by the system itself are called triggers.
- They run when an anticipated event occurs in the database.
- A table may have one or more triggers that run when changes are made to that table.
- Triggers are commonly used to enforce integrity constraints.
- Triggers may use DML commands, but may not be allowed to use DDL commands. The triggering event may be based on a DDL command.
- Like other PL/SQL blocks, triggers can use variables and follow programming logic.
The text provides a generic example of the command to create a trigger:
CREATE OR REPLACE TRIGGER trigger_name
[BEFORE / AFTER] [DELETE / INSERT / UPDATE OF column_name] ON table_name
[FOR EACH ROW]
[DECLARE]
[variable_namedata type[:=initial_value] ]
BEGIN
PL/SQL instructions;
…
END;
The code above has lots of optional features. Note the
language in the first line that says CREATE OR REPLACE, which will
cause it to overwrite an existing trigger with the same name. That line
is required, as is the associated table name, the BEGIN and END
markers, and the PL/SQL commands that are bounded by BEGIN and END.
The text explains that the words FOR EACH ROW make the trigger a row-level trigger,
which can run for each change to a different row in the associated
table. If those words are not included in the creation command, the
trigger will be a statement-level trigger, which is run once regardless of how many rows are changed in the associated table.
Note the two semicolons shown in the template above. The one at the end of the command is expected. The text warns us that each statement between BEGIN and END must also end in a semicolon. That is what the semicolon at the end of PL/SQL instructions; represents.
The text continues this discussion for several pages with several examples. On page 401, the text introduces stored procedures, which are like triggers as far as the code and commands that can be in them. The main difference seems to be that a stored procedure can be called by a user with rights to it, but it is not expected to run automatically. Several examples of this kind of object are shown in the text.
On page 407, the text points out a problem with PL/SQL objects. These programs are limited in that they can only return one
value. Returning more than one value will cause an error. If you have
to receive more than one value (such as a table), you need to declare a
section of memory to receive it. For some reason, this holding area
(which is in the server's memory, not your workstation's memory) is
called a cursor. It doesn't make sense, but that's what the text calls it. At this discussion on another web site, the cursor is explained as being a pointer to the actual storage
space in memory called a context area. The explanation in our text is a
simpler model, and it works as well as the more complicated one.
Cursors can be implicit or explicit. An implicit cursor is the little one that the program uses by default when it generates only one value. An explicit cursor must be created with a command like this one:
CURSOR cursor_name IS select-query;
In the command above, select-query
stands for the actual command or program that is going to return the multivalued
answer. The code below is an example from the web site in the link above:
DECLARE
c_id customers.id%type;
c_name customerS.No.ame%type;
c_addr customers.address%type;
CURSOR c_customers is
SELECT id, name, address FROM customers;
BEGIN
OPEN c_customers;
LOOP
FETCH c_customers into c_id, c_name, c_addr;
EXIT WHEN c_customers%notfound;
dbms_output.put_line(c_id || ' ' || c_name || ' ' || c_addr);
END LOOP;
CLOSE c_customers;
END;
Notice that the cursor is opened, fetched, and closed
in the body of the PL part of the example. It was created before the
BEGIN statement, along with three variables used in the program. The
text shows a couple of examples of using a cursor. The link above will
take you to a bit more.
In the code example above, the data in question is pulled by
the SELECT query, and processed by the PL portion of the saved block.
This is how we expect a stored procedure to work. On page 409, the text shows us an example of a code block that returns its payload in the PL section. That makes it a stored function. Other than that difference, the two types of code blocks are very similar.
The last topic in the chapter (huge sigh!) is embedded SQL.
The topic is about putting SQL commands in a program written in a
standard, high-level language. The text refers to the programming
language used as the host language, since the embedded SQL will be in
it. Not all DBMS systems allow embedded SQL. This list on Wikipedia
(Thank you, Jimmy) lists lots of DBMS systems, and notes that many
simply do not support this concept. The author warned us early in the
chapter that there would be several examples of code from Oracle. This
is a good reason that this is so. The basic format you may want to know
is this:
EXEC SQL
SQL statement;
END-EXEC
The labels are markers for the start and end of the SQL code embedded in the program.
EXEC SQL
DELETE FROM EMPLOYEE WHERE EMP_NUM = :W_EMP_NUM;
END-EXEC
The code above demonstrates the use of a variable from the
host code. W_EMP_NUM is preceded by a colon as a tag that tells SQL it
is a variable from the program itself.
|