CIS 2210 - Database Management and Design

Chapter 8: Advanced SQL

Objectives:

This lesson discusses material from chapter 8. Objectives important to this lesson:

  1. More advanced joins
  2. Subqueries and correlated queries
  3. Working with dates, strings, and other data types
  4. Relational set operators
  5. Views
  6. Triggers and procedures
  7. Embedded SQL
Concepts:

Chapter 8

Introduction

The chapter continues the discussion of joins that began at the end of the last chapter. Relational joins take advantage of the fact that relational databases link tables through common attributes. This is one of the three conditions the text tells make joins possible. For related tables, there must be a way to look at their columns and match rows from one table to the other. This is usually expressed as WHERE table1.column = table2.column.

The text makes an interesting point that if we join three tables, we would actually join table 1 and table 2, then join the result of that query with table 3. This can sort of be done in one query, but it requires a subquery, so there are actually two parts. There will also have to be a WHERE condition for each of the joins you do, so the text gives us a formula: the number of conditions in the WHERE clause will be the number of tables joined minus one.

The chapter gets very term oriented next, so let's dig in and get it over with. There are several kinds of joins, some of which only differ in syntax. You tell them apart by how they do what they do. I will arrange this table to match what you already know.

Join Class
Join Type
Example
Description
INNER (4 types)
Old-Style (classic)
SELECT *
FROM T1, T2
WHERE T1.C1=T2.C1
Returns only the rows that meet the join condition in the WHERE clause. Only rows with matching values are selected.

NATURAL
SELECT *
FROM T1 NATURAL JOIN T2
Assumes that you have matching columns in the two tables. Returns only the rows with matching values in the matching columns, unless there are no matching values. In that case, it returns a relational product of the two tables.

JOIN USING
SELECT *
FROM T1 JOIN T2 USING (C1)
Returns only the rows with matching values in the columns indicated in the USING clause. This syntax requires that the related column has the same name in each table.

JOIN ON
SELECT *
FROM T1 JOIN T2 ON T1.C1=T2.C1
Returns only the rows that meet the join condition
indicated in the ON clause. This is more useful when the column names do not match, but it requires table name qualification. (See discussion at the link above.)
OUTER (3 types)
LEFT JOIN
SELECT *
FROM T1 LEFT OUTER JOIN T2
ON T1.C1=T2.C1
Returns rows with matching values and includes
all rows from the left table (T1) with unmatched
values. This is what makes it a LEFT join.

RIGHT JOIN
SELECT *
FROM T1 RIGHT OUTER JOIN T2
ON T1.C1=T2.C1
Returns rows with matching values and includes
all rows from the right table (T2) with unmatched
values. This is what makes it a RIGHT join.

FULL JOIN
SELECT *
FROM T1 FULL OUTER JOIN T2
ON T1.C1=T2.C1
Returns rows with matching values and includes
all rows from both tables (T1 and T2) with
unmatched values. This is what makes it unsuitable for most purposes.
CROSS (2 syntaxes)
CROSS JOIN
SELECT *
FROM T1, T2
Returns the Cartesian product of T1 and T2.

CROSS JOIN
SELECT *
FROM T1 CROSS JOIN T2
Returns the Cartesian product of T1 and T2.
  • In INNER JOINS, we are looking for matches. The NATURAL JOIN assumes that there is a column on which to match values. JOIN USING requires us the name the column. JOIN ON allows us to specify the names of the columns, whether they are different or not, and it looks the same as the old style except for the use of the word JOIN in place of the comma between table names.
  • OUTER JOINS return rows that match, and also rows that do not match. The rows that do not match can come from one of the tables or both, depending on the version you choose.
  • CROSS JOINS produce a number of rows that is best described as every row in the first table matched with every row in the second table. The number of rows returned is the actual product of the rows in table 1 times the rows in table 2. Access does not support the second syntax, but does support the first.

Subqueries

The text returns to the idea of nested queries (queries that have subqueries) on page 349. The first point discussed is that sometimes you have to run a query to get the data you actually want to run a query about. In the example below, the nested query runs first, which is controlled by the parentheses around it.

SELECT V_CODE, V_NAME FROM VENDOR
WHERE V_CODE NOT IN (SELECT V_CODE FROM PRODUCT);

Remember: the line returns in the examples are ignored by the SQL engine. Each SQL command is really just one long (or short) line.

We need the nested query to run first because we want a list of all V_CODEs in the PRODUCT table. Then we want to know which vendors in the VENDOR table are not in that list. This query could have been run differently, but the principle in well illustrated. We first determine which vendors are being used, then check which known vendors are not on that list.

Subqueries are also called inner queries, because they are inside parentheses, inside the main query, which is also called the outer query. The text provides several examples of using subqueries to obtain information that in turn can be queried. Care should be taken to make sure that the outer query receives the kind of information it (you) are expecting from the inner query: one column from one row, one column from multiple rows (a list), or one table. It is best to troubleshoot this kind of situation by running only the inner query, to make sure it produces what you expect, then running the combined nested query. Sometimes your inner query will produce no result, and either your outer query will have to deal with that, or you will have to recompose the inner query.

  • The text points out that a subquery is often run on the right side of an assignment operator or a comparison operator as part of a WHERE clause. This is sensible, because such equations are always evaluated on their right side first. Do not depend on this priority if you are tempted to do so. Keep your inner queries inside parentheses and they will be easier to find and troubleshoot.
  • Sometimes you will want a subquery to return a list of data, such as a single column with multiple rows. When you do this, make sure to use the IN operator to the left of the subquery. It is the appropriate operator to search a list.
  • When using a GROUP BY command to create summarized data, consider using a HAVING clause to to restrict which rows are summarized. The text offers this as an example:

    SELECT P_CODE, SUM(LINE_UNITS)
    FROM LINE
    GROUP BY P_CODE
    HAVING SUM(LINE_UNITS) > (SELECT AVG(LINE_UNITS) FROM LINE);

    As you can see, there is a HAVING line which runs a subquery whose purpose is to calculate the average of LINE_UNITS, then select for lines having values greater than that average

  • The text points out that the IN operator looks for exact matches in a list, but you may want to look for things that do not match. The ANY and ALL operators are used to match a query against each of the results of a subquery, but in two different ways.

    If the comparison is like this:
    WHERE field < ANY (inner query)
    the outer query will match if the value being compared is less than any of the values in the subquery. It does not matter how many items it works for, as long as there is one that works.

    If the comparison is like this:
    WHERE field < ALL (inner query)
    the outer query will match if the value being compared is less than every value in the subquery. In this case, it has to compare to every value returned by the subquery, and the comparison has to be valid for all of them. If there is even one for which the comparison is not true, then the whole thing is not true.
  • Subqueries can also appear in the FROM clause of a SELECT command. That takes a bit of explaining. The purpose of the FROM clause is to specify tables from which to draw the columns and rows. It may be necessary to run another SELECT command first, whose output will be used as a table to draw from. This is fine, as long as you give it a name for reference. You also have to make sure that the inner query produces what the outer query has to have. The text provides an example:

    SELECT DISTINCT CUSTOMER.CUS_CODE, CUSTOMER.CUS_LNAME
    FROM CUSTOMER,
    (SELECT INVOICE.CUS_CODE FROM INVOICE NATURAL JOIN LINE
    WHERE P_CODE = '13-Q2/P2') CP1,
    (SELECT INVOICE.CUS_CODE FROM INVOICE NATURAL JOIN LINE
    WHERE P_CODE = '23109-HB') CP2
    WHERE CUSTOMER.CUS_CODE = CP1.CUS_CODE AND
    CP1.CUS_CODE = CP2.CUS_CODE;

    In this example, the FROM clause contains one actual table and two inner queries called CP1 and CP2 that produce what the text calls virtual tables. Note the addition of the name after the closing parenthesis for each query. This allows you to refer to these virtual tables in the WHERE clause of the outer query. Each of those subqueries contains its own SELECT, FROM, and WHERE clauses.
  • Another variation is to place a subquery in the SELECT clause of a query. Our book calls this an attribute list subquery, or an inline subquery. It still requires you to name the subquery, but you do it with a different syntax. The thing is, this time you are creating a synthetic attribute instead of a synthetic table. Here is an example:

    SELECT P_CODE, P_PRICE, (SELECT AVG(P_PRICE) FROM PRODUCT)
    AS AVGPRICE
    ,
    P_PRICE – (SELECT AVG(P_PRICE) FROM PRODUCT) AS DIFF
    FROM PRODUCT;

    The third attribute (shown in bold above) runs a subquery that calculates the average price from the PRODUCT table. Note that the subquery is contained inside parentheses, and that the name AVGPRICE is given to it with the AS operator before the next comma. AVGPRICE will be an aggregate value: it will be the same for every line in the table, regardless of the fact that it is calculated again for each line.

    The next attribute (shown in italic in the example above) runs the same subquery then subtracts that value from the P_PRICE value in each line, and names the result DIFF. This values will be different from one line to another, negative for those prices that are lower than the average. The subquery and subtraction will happen for each line in the table. (This was misstated in the last version of these notes. Correction made 2/22/2018)

The text continues its discussion with the topic of correlated subqueries, which are a more advanced concept. It points out that each of the subquery examples so far run independently of each other. This is not true for a correlated subquery. The text compares it to a loop that runs inside another loop, because the information in the outer loop is known by the inner loop. The subqueries we have seen so far do not have this feature.

Consider the first example in the text:

SELECT INV_NUMBER, P_CODE, LINE_UNITS
FROM LINE LS
WHERE LS.LINE_UNITS > (SELECT AVG(LINE_UNITS)
FROM LINE LA
WHERE LA.P_CODE = LS.P_CODE);

  • The outer query begins with a selection of fields from the LINE table.
  • At this moment, the query is reading one particular line of data from that table. Imagine that there is a record pointer that reads one line at a time from the table. That's how this is working.
  • Note the LS that appears after the table name in the FROM clause of the outer query. This is the first alias given to this table.
  • It is given another alias, LA, in the subquery.
  • Think of these as meaning two separate reads that load the same table into two separate parts of RAM.
  • The next line begins the WHERE clause of the outer query. The WHERE rule/condition/requirement references the value of the LINE_UNITS field in the current line in LS version of the LINE table, the one being read by the outer query.
  • This value is about to be used in the subquery. The subquery mostly uses the LA read of the LINE table.
  • It computes the average number of units sold for the product whose ID matches the P_CODE value of the line currently being read in the outer query. This is a critical feature: the inner query has to know what product the outer query is reading about.
  • That average value is handed to the WHERE clause of the outer query, which selects only lines whose number of sold units is greater than the average number sold for that product code.
  • This is fairly complex, so you should read through this part of the notes a few times to understand it.

The author runs a second version of this query that has a subquery in the SELECT clause whose purpose is to display the calculated average units sold for any lines that are actually displayed to the operator. Note, in the screen image of the runs of the two queries, the same rows are printed, but in a different order. This demonstrates that you need to use an ORDER BY command if you want consistent ordering of your lines of data.

Let's move on to page 361, where the text discusses SQL functions. A function is like a small program, in this case, a program built into the SQL language. The text talks about date and time functions first, reminding us that date and time data are saved in different formats in different databases. This link will take you to a web page that discusses 59 such functions in MySQL. The text lists seven on page 365. Note that the examples in the text are a little dangerous. When you use a function, you typically have to pass it an argument in parentheses. This helps you tell it apart from more ordinary usage of the name of that function. Doing both in the same command is meant to be helpful, but may actually be more confusing.

Function Usage
DATE_FORMAT() MySQL DATE_FORMAT() formats a date as specified in the argument. A list of format specifiers given bellow may be used to format a date.
YEAR() MySQL YEAR() returns the year for a given date.
MONTH() MySQL MONTH() returns the month for the date within a range of 1 to 12 ( January to December).
DAY() MySQL DAY() returns the day of the month for a specified date.
ADDDATE() MySQL ADDDATE() adds a time value with a date.
DATE_ADD() MySQL DATE_ADD() adds time values (as intervals) to a date value. The ADDDATE() is the synonym of DATE_ADD().
LAST_DAY() MySQL LAST_DAY() returns the last day of the corresponding month for a date or datetime value.

This link will take you to a list of numeric functions in MySQL in the online manual. The text cautions us not to confuse the aggregate functions we already know with these functions. Aggregate functions summarize a column (or a subset of a column) of data, but the functions in this chapter are related to individual rows. The text offers three numeric functions. The links around the function names below will take you to examples in the online manual.

Function Usage
ABS() Return the absolute value
ROUND(x), ROUND(x,d)
Rounds to the nearest integer, or unless a second argument specifies a number of decimal places .
CEIL()
CEILING()
Return the smallest integer value not less than the argument

The text discusses a few sample string functions. The link in this line will take you to the online manual page for string functions. Each function, like the ones linked to above, has its own section in manual's function examples, and there are many more than the text includes in its sample.

  • Concatenation - In MySQL, you can use the CONCAT() function to combine the contents of two strings. In the example in the text, the content of one field is combined with a short string containing a semicolon and a space in a nested function call. The result is combined again with the contents of another field, in the outer function call. The result is given the label NAME.
    SELECT CONCAT(CONCAT(EMP_LNAME, ', '),
    EMP_FNAME) AS NAME
    FROM EMPLOYEE;
    The example in the text is missing a closing parenthesis, which I have inserted in the text above.
  • Changing case - In MySQL, you can use either the UPPER() function or the UCASE() function to return a string in all upper case. You can use either the LOWER() function or the LCASE() function to return a string in all lower case. The examples in the text continue to show the wrong number of parentheses.
    mysql> SELECT UPPER('Hej');
    -> 'HEJ'
  • Substring retrieval - The text is not very clear about returning a portion of a string in MySQL. This section of the online manual is more enlightening.
    mysql> SELECT SUBSTRING('Quadratically',5);
    -> 'ratically'
    In the example above, the string is read, and the portion that starts at the fifth character is returned.
    mysql> SELECT SUBSTRING('foobarbar' FROM 4);
    -> 'barbar'
    In the example above, the string is read, and the portion that starts at the fourth character is returned.
    mysql> SELECT SUBSTRING('Quadratically',5,6);
    -> 'ratica'
    In the example above, the string is read, and the portion that starts at the fifth character and ends six character later is returned.
    mysql> SELECT SUBSTRING('Sakila', -3);
    -> 'ila'
    In the example above, the string is read, and the portion that starts at the third character from the end is returned.
  • Length - MySQL uses the LENGTH() function to return the length of a string. The example in the text is incorrect. It sets this objective:

    Lists all employee last names and the length of their names in
    descending order by last name length.

    It provides this code:

    SELECT EMP_LNAME, LENGTH(EMP_LNAME) AS NAMESIZE
    FROM EMPLOYEE;

    What would you have to add to this command to meet the stated objective?

It is sometimes necessary to convert a value in one data type to another data type. In MySQL, it is sometimes not necessary, as you see if you follow this link to a discussion of type conversion in the online manual. On the occasions in which a conversion is necessary, MySQL provides the CAST() function and the CONVERT() function. The text offers this example, moving data from numeric type to character type in a query:

SELECT P_CODE, CAST(P_PRICE AS CHAR(8)) AS PRICE,
CAST(P_INDATE AS CHAR(20)) AS INDATE,
CAST(P_DISCOUNT AS CHAR(4)) AS DISC
FROM PRODUCT;

The conversions in the example above were not necessary, but they demonstrate the use of the CAST() function for this purpose. In the example below, the conversions are necessary, if you want to do math with a value from a field that is stored as a character type:

SELECT CAST('-123.99' AS DECIMAL(8,2)),
CAST('-99.78' AS DECIMAL(8,2));

No math operation was performed here, but it could have been done once the conversion was completed. It should be obvious that the name of a field could have been used instead of a literal in each of the conversions above, just as was done in the previous example.

Let's move on to page 377, where the text begins a discussion of views. A view is essentially a filter that is designed for a user who needs to see only some or particular combinations of items in a database. This filtered view is described by the text as being a virtual table, as opposed to an actual table in the database. The text refers to the tables used to construct a view as that view's base tables.

As we have seen in the text before, a view is created with a command like this one: CREATE VIEW name_of_view AS query;
The author puts the word SELECT before the word query in the text. I think he is just trying to tell us that a view is usually the result of a query that begins with SELECT.

Once a view is created, its name can be used in any SQL statement that would otherwise use the name of a table. The text tells us that views are dynamic. Every time they are used, they draw their data fresh from the actual tables named in their query. This means that a view is always up to date, which would not be true if we had saved the results of the relevant query in a file. That file would be static, but stable. The view in question would be dynamic, but would not work if the tables needed were not available.

Views can often be used to update the files that were used to create them. The text presents a list of restrictions on such operations, and warns us that rules may vary from vendor to vendor. For a view to considered an updatable view:

  • Don't use GROUP BY or aggregate commands to create the view. The UPDATE process would have no way to know what records to update.
  • Don't use UNION, INTERSECT, or MINUS set operations to create the view.
  • Integrity of the key fields of the base tables must be maintained in the update. There can be no duplicate items in a key field after the update.

On page 382, the text discusses three methods of autoincrementing a primary key field, and introduces a fourth method.

  • MS Access uses a data type it calls AutoNumber. This type, when applied to a column in a table, will increment its value by one with each new record. Access will not let you edit the numbers it places in this column.
  • MS SQL allows you to put one Identity column in each table. It performs the same function as the AutoNumber column in Access.
  • MySQL allows you to assign AUTO_INCREMENT to a column when you create a table. Only one column in a table can have this feature. MySQL allows you to set the start value for the column that will AUTO_INCREMENT to a value other than 1.

The fourth method is the one used by Oracle in versions before 12c, which is to use a sequence. MS SQL Server now supports sequences, too. Its notation is a bit different. The discussion of sequences in this chapter is mostly about Oracle. This feature serves the same purpose as the features above, but it works differently. A list of characteristics of a sequence appears on page 383.

  • A sequence is a separate database object.
  • Sequences must be named.
  • A sequence can be used in any table.
  • Data in columns that are assigned to a sequence can still be modified.
  • Sequences can start with any number, and can increment by any amount.
  • You can use the DROP command to delete a sequence. This will not affect the data that it has already added to any table.

The basic format for creating a sequence looks like this:
CREATE SEQUENCE name [START WITH n] [INCREMENT BY n]
[CACHE | NOCACHE]

You can see by the notation above that setting the START WITH value is optional. So is setting the INCREMENT BY value. By default, each is set to 1 when the sequence is created.

On page 387, the text begins a discussion of programming techniques with SQL. The text starts out by telling us that standard SQL does not include commands that support selection (conditional branching) or looping. You can work around this by writing code in a regular programming language that also calls SQL commands. You may also be able to use standard programming techniques if your vendor supports them with extensions to SQL.

If SQL extensions exist for your database, you can create persistent stored modules (PSMs). Storing a module in memory can decrease the time required to use such a module. The major vendors discussed in the text offer version of SQL to do this.

  • MS SQL - provides Transact-SQL and .NET programming languages
  • Oracle - provides Procedural Language SQL (PL/SQL)
  • MySQL - provides a procedural version similar to Oracle's version

The text gives examples of writing PL/SQL programs in an Oracle system. A PL/SQL code block can be created, and users can be granted rights to execute it. Users can also not be granted rights, which allows restrictions on the use of a code block.

Code blocks that can be called by the system itself are called triggers.

  • They run when an anticipated event occurs in the database.
  • A table may have one or more triggers that run when changes are made to that table.
  • Triggers are commonly used to enforce integrity constraints.
  • Triggers may use DML commands, but may not be allowed to use DDL commands. The triggering event may be based on a DDL command.
  • Like other PL/SQL blocks, triggers can use variables and follow programming logic.

The text provides a generic example of the command to create a trigger:

CREATE OR REPLACE TRIGGER trigger_name
[BEFORE / AFTER] [DELETE / INSERT / UPDATE OF column_name] ON table_name
[FOR EACH ROW]
[DECLARE]
[variable_namedata type[:=initial_value] ]
BEGIN
PL/SQL instructions;

END;

The code above has lots of optional features. Note the language in the first line that says CREATE OR REPLACE, which will cause it to overwrite an existing trigger with the same name. That line is required, as is the associated table name, the BEGIN and END markers, and the PL/SQL commands that are bounded by BEGIN and END.

The text explains that the words FOR EACH ROW make the trigger a row-level trigger, which can run for each change to a different row in the associated table. If those words are not included in the creation command, the trigger will be a statement-level trigger, which is run once regardless of how many rows are changed in the associated table.

Note the two semicolons shown in the template above. The one at the end of the command is expected. The text warns us that each statement between BEGIN and END must also end in a semicolon. That is what the semicolon at the end of PL/SQL instructions; represents.

The text continues this discussion for several pages with several examples. On page 401, the text introduces stored procedures, which are like triggers as far as the code and commands that can be in them. The main difference seems to be that a stored procedure can be called by a user with rights to it, but it is not expected to run automatically. Several examples of this kind of object are shown in the text.

On page 407, the text points out a problem with PL/SQL objects. These programs are limited in that they can only return one value. Returning more than one value will cause an error. If you have to receive more than one value (such as a table), you need to declare a section of memory to receive it. For some reason, this holding area (which is in the server's memory, not your workstation's memory) is called a cursor. It doesn't make sense, but that's what the text calls it. At this discussion on another web site, the cursor is explained as being a pointer to the actual storage space in memory called a context area. The explanation in our text is a simpler model, and  it works as well as the more complicated one.

Cursors can be implicit or explicit. An implicit cursor is the little one that the program uses by default when it generates only one value. An explicit cursor must be created with a command like this one:

CURSOR cursor_name IS select-query;

In the command above, select-query stands for the actual command or program that is going to return the multivalued answer. The code below is an example from the web site in the link above:

DECLARE
   c_id customers.id%type;
   c_name customerS.No.ame%type;
   c_addr customers.address%type;
   CURSOR c_customers is
      SELECT id, name, address FROM customers;
BEGIN
   OPEN c_customers;
   LOOP
   FETCH c_customers into c_id, c_name, c_addr;
      EXIT WHEN c_customers%notfound;
      dbms_output.put_line(c_id || ' ' || c_name || ' ' || c_addr);
   END LOOP;
   CLOSE c_customers;
END;

Notice that the cursor is opened, fetched, and closed in the body of the PL part of the example. It was created before the BEGIN statement, along with three variables used in the program. The text shows a couple of examples of using a cursor. The link above will take you to a bit more.

In the code example above, the data in question is pulled by the SELECT query, and processed by the PL portion of the saved block. This is how we expect a stored procedure to work. On page 409, the text shows us an example of a code block that returns its payload in the PL section. That makes it a stored function. Other than that difference, the two types of code blocks are very similar.

The last topic in the chapter (huge sigh!) is embedded SQL. The topic is about putting SQL commands in a program written in a standard, high-level language. The text refers to the programming language used as the host language, since the embedded SQL will be in it. Not all DBMS systems allow embedded SQL. This list on Wikipedia (Thank you, Jimmy) lists lots of DBMS systems, and notes that many simply do not support this concept. The author warned us early in the chapter that there would be several examples of code from Oracle. This is a good reason that this is so. The basic format you may want to know is this:

EXEC SQL
SQL statement;
END-EXEC

The labels are markers for the start and end of the SQL code embedded in the program.

EXEC SQL
DELETE FROM EMPLOYEE WHERE EMP_NUM = :W_EMP_NUM;
END-EXEC

The code above demonstrates the use of a variable from the host code. W_EMP_NUM is preceded by a colon as a tag that tells SQL it is a variable from the program itself.