CIS 2210 - Database Management and Design

Chapter 7: Introduction to Structured Query Language (SQL)

Objectives:

This lesson discusses material from chapter 7. Objectives important to this lesson:

  1. Basic commands and functions
  2. Using SQL for data administration
  3. Using SQL for data manipulation
  4. Using SQL for queries
Concepts:

Chapter 7

Introduction

The chapter begins our actual discussion of SQL. The chapter begins with a statement about SQL having a variety of commands that provide for database management (usually done by administrators) as well as more normal database operations (usually done by users). Its commands can be sorted into two categories that address this division:

  • data definition language (DDL) - commands to create database and user objects, to apply constraints on the use of those objects, and to manage access rights to objects
  • data manipulation language (DML) - commands to insert, delete, change, and selectively retrieve data

As you can see, the divisions have to do with the structures and objects of the database itself (DDL), or with the data in the database (DML). Tables 7.1 and 7.2 show you several examples of commands that fall into each category. They won't mean much to you without a context for using them, so we will begin discussing them in the next section below.

A standard version of SQL has been adopted by the American National Standards Institute, and by ISO, an international standards group. The text cautions us that this version of SQL is usually a subset of the version being used in a vendor's DBMS product. There are often enhancements to the language in each version whose syntaxes vary from one vendor to another. This makes it unlikely that a company using one DBMS for complex work could move their code to another DBMS without recoding to match the new DBMS's syntax.

That being said, the text moves on to discuss standard SQL commands that fall under the data definition category.

Data Definition Commands

The text presents a small database with five tables in Figure 7.1, which is an Entity Relationship Diagram.

image of database described below

  • There is a Customer table with seven attributes. Each record is identified by a value in the CUS_CODE field (the primary key).
  • There is an Invoice table with three attributes. Each record is identified by a value in the INV_NUMBER field, and is linked to the Customer table by its CUS_CODE field values.
  • There is a Line table, which holds information about lines on invoices. It has five attributes, and its key is a combination of its INV_NUMBER and LINE_NUMBER fields.
  • There is a Product table with eight attributes. Its key field is P_CODE, which is a foreign key in the Line table.
  • There is a Vendor table with seven attributes. Its key is V_CODE, which is a foreign key in the Product table.

You should look at the relationships between these tables in the text. Realize that the instances in the Line table are actually parts of invoices. That table was created as a separate table from the Invoice table to resolve dependency problems.

For this discussion, the text uses mainly the Product and Vendor tables. The text presents a long discussion of the database which is probably pretty boring to all of us. Its purpose is to show you that the ERD and the business rules are needed to generate reasons for the tables to exist as they do. We can examine this data in class. For these notes, let's move ahead to section 7-2d of the chapter. This skips the creation of a database space, which has often not been possible with MySQL. We are planning to use MySQL for the platform in this class. If you want to download it and install it on your own Windows computer, you may find it useful to watch this video about doing so. I recommend that you watch it with the playback speed set to 1.5 times normal.

In section 7-2d, the author has already assumed a few rules about building tables that you need to consider yourself when creating your own. When downloading tables from our publisher, considerations like "rules about capitalization" will already have been made for you, and you need to inspect the table to learn what they are. When you create your own tables, you have to make rules for yourself. For example, will you capitalize the names of files, attributes, or any data element? The author of our text favors using all caps for table and attribute names, which I find a bit ugly. Issuing effective commands in SQL requires that you know the capitalization rules that are being used in the database you are trying to maintain or query. A command will fail if the name of an object in that command is not spelled or capitalized as it exists in your database. The chapter refers to Table 7.3 as a data dictionary, which is a useful summary of information about every table and every attribute in it. You will want to make notes in a data dictionary that includes this information about every table in your database as you make decisions that affect them. Why? It becomes a helpful reference for your database, assuming you create and update this file.

In this one-page data dictionary, we see two tables and each of their attributes. Each attribute's name is given, then a description of the data it will hold, and the data type that has been assigned to that field. Note that the range column in this reference table describes the allowed range of values for that field in this database, not the range actually allowed by the data type. The data type is typically more flexible than you will want your table to be.

  • Several attributes are VARCHAR fields. This data type is used for alphanumeric data. Its name means "variable length character data". Note that the data dictionary says the type of the P_CODE field is VARCHAR(10), meaning that the field can hold up to 10 characters. VARCHAR is less wasteful than the old CHAR type, which would always be as wide as it was allowed to be. VARCHAR fields only use as much storage space in the database as the actual data typed into them.
  • Most databases have several numeric data types, such as:
    • INTEGER fields can only hold whole numbers. Their data type may be called INT instead of INTEGER. INTEGERs can be signed or unsigned. In MySQL, the signed range is -2147483648 to 2147483647. The unsigned range is 0 to 4294967295.
    • If you are sure your integer values will be less than a much smaller value, your database may allow the SMALLINT data type. This is the same as the INTEGER data type but the MySQL signed range for it is -32768 to 32767. The unsigned range is 0 to 65535. It saves space in data files.
    • There are several numeric types that hold decimal values as well. The text mentions NUMBER and DECIMAL. MySQL uses DECIMAL, FLOAT, and DOUBLE. Take a look at the web page in the link above for a discussion of the ranges of those (and other) numeric types.
  • DATE and TIME types vary from one language to another. In MySQL, there are types for DATE, DATETIME, TIMESTAMP, TIME, and YEAR. DATETIME is a combination of date and time data. Look at this reference page in the MySQL online manual for a discussion of each of them.

At the text explains, you typically choose a numeric data type instead of a character data type when you know that you will have to do math with the data. This includes auto-incrementation of numbers, such as assigning record numbers when new instances are added to a table. It also includes fields that may be searched with comparison operators, such as looking for all records in which a field's value is less than a supplied value. On the other hand, when you know that you will never do math with data that looks like a string of digits, like ID numbers, you will want to make the field that holds such data a character type.

It is time that we saw some commands, so, assuming you are going to create the VENDOR table, this is a MySQL command that would create its structure. After this command is issued, the table will still be empty:

CREATE TABLE VENDOR (
V_CODE INTEGER NOT NULL UNIQUE,
V_NAME VARCHAR(35) NOT NULL,
V_CONTACT VARCHAR(15) NOT NULL,
V_AREACODE CHAR(3) NOT NULL,
V_PHONE CHAR(8) NOT NULL,
V_STATE CHAR(2) NOT NULL,
V_ORDER CHAR(1) NOT NULL,
PRIMARY KEY (V_CODE));

See what I mean about too many capitals? It's hard to read for people who have learned that capital letters mean "shouting". It is standard to use capital letters for key words in the SQL language, and key words make up most of this command.

  • The command begins with three words that you can think of as being in a verb-noun-adjective format.
  • CREATE TABLE VENDOR means to CREATE a TABLE called VENDOR.
  • This phrase is followed by an opening (left) parenthesis, and a line return for readability. The line return does not end the command.
  • The second line is still part of the same command, as is everything until the closing semicolon. That is what marks the end of a command.
  • Inside the parentheses, we have a series of phrases, separated by commas. Each phrase is about one of the attributes of the table.
  • The first word inside the parentheses has to be the name of a field/attribute/characteristic. The author called the first field V_CODE. The second word is the data type for the field. The author states that it is an INTEGER field. It is further described as having two constraints: NOT NULL and UNIQUE. We have discussed the meaning of these constraints. They are a strong clue that this is the primary key field for this table. The phrase ends with a comma and a line return.
  • The next two fields are declared to be VARCHAR, variable length character, fields. The numbers in parentheses after the keyword VARCHAR represent the maximum length of each field. Notice that every field in this table is declared to be NOT NULL. Each field for a vendor instance must be filled in.
  • The next four phrases declare four more fields, this time of type CHAR. This is an older data type that does not truncate the space it takes when it is not filled. In each of these cases, the author intends to use the full length declared for the field, so making them VARCHAR fields would not be useful.
  • The last phrase in this command says PRIMARY KEY (V_CODE));, which means that we are declaring the V_CODE field to be the primary key of the table. This is the way you do it in MySQL If the primary was a composite key, the other attributes would have been named inside the parentheses as well, with commas separating the attributes in the list.
  • Some readers will be confused by the three characters ending the phrase: ));. The first of these closing parentheses closes the set surrounding the name of the primary key field. The second closing parenthesis closes the set that opened back on line 1 of the command. The semicolon closes the command itself, as I warned you in the fourth bullet above.

That is a good example of a CREATE TABLE command. If you are used to thinking about functions being handed arguments in parentheses, you will understand that the CREATE command is a function that needs the list of attributes and their data types in its argument list. Likewise, each data type name is a function that needs a field width in its argument list. Finally, PRIMARY KEY is a function that needs the name of an attribute as its argument.

Page 257 has a large shaded note about the storage engines that are provided with MySQL. It, unfortunately, left out the text of the command that sets the storage engine. Take a look at this page in the online manual. You can see that the CREATE command allows you to add a phrase to specify the engine. The example from the manual looks like this:

CREATE TABLE t1 (i INT) ENGINE = INNODB;

In this example, the table is called t1, the attribute is called i, and the storage engine is InnoDB, which is the default engine as of MySQL version 5.5.5. This is the engine that produces transaction safe tables, which are preferred. The short version of this news is that you don't have to use the ENGINE command option if you want referential integrity. Since we have it, the note on page 259 is useful, telling us that we can apply referential integrity constraints when we create tables. An example from the text that would be in the CREATE command for the PRODUCT table:

FOREIGN KEY (V_CODE) REFERENCES VENDOR ON UPDATE CASCADE

This means that the V_CODE attribute, found in the PRODUCT table is a FOREIGN KEY from the VENDOR table, and that updates to the PRODUCT table should CASCADE, which means that they will be checked against values stored in that field in the VENDOR table. That is a lot of instructions in a small space. The bottom line is that you can't add a new instance to the PRODUCT table unless the value for V_CODE for that object is found in the VENDOR table. Think about that, and you will see that it is a good idea. It is also necessary to know this constraint exists. If we want to add a new line in the PRODUCT table that references a new vendor, we had better update the VENDOR table first. The note on page 260 explains that MySQL covers more options on the UPDATE command than the other databases mentioned in the text.

The text contains many notes about differing syntaxes in different versions of SQL. This adds to the bulk of the chapter without adding much of interest to those who only have one option available to them. Let me caution you to read the parts about the database we are using, but to check them against the online manual for MySQL. A small typo in the book will keep a command from running properly, and sometimes things change from one version to the next.

The MySQL code to create the PRODUCT table will look like this:

CREATE TABLE PRODUCT (
P_CODE VARCHAR(10) NOT NULL UNIQUE,
P_DESCRIPT VARCHAR(35) NOT NULL,
P_INDATE DATE NOT NULL,
P_QOH SMALLINT NOT NULL,
P_MIN SMALLINT NOT NULL,
P_PRICE NUMBER(8,2) NOT NULL,
P_DISCOUNT NUMBER(5,2) NOT NULL,
V_CODE INTEGER,
PRIMARY KEY (P_CODE),
FOREIGN KEY (V_CODE) REFERENCES VENDOR (V_CODE) ON UPDATE CASCADE);

This time we have an attribute whose data type is DATE, two whose types are SMALLINT, and two whose types are NUMBER. Those two represent money values, so they are given arguments for total field width, then a comma, then the number of decimal places. Example: P_PRICE NUMBER(8,2) NOT NULL. It is a numeric field, eight characters wide, and the last two characters follow a decimal point.

The text discusses the Oracle version of this table. Oracle supports constraints that specify a default value for a field, and that check to make sure a value is in a set of acceptable values. MySQL does not support these constraints in table creation. It is nice to know that Oracle supports these options, but the database we are using in class does not.

On page 263, the text discusses creating indexes for tables, which make searches for data faster when you search for something in an indexed field. The primary key for a table automatically has an index, but that is the only automatic one. If you want to create an index, you can use the syntax on this page to do so in MySQL. The first example on the page is this:

CREATE INDEX P_INDATEX ON PRODUCT(P_INDATE);

The CREATE command is followed by the word INDEX, which is followed by the name you are giving the index, which is followed by the word ON, which is followed by the name of a table, which is followed by a set of parentheses enclosing the list of columns on which to base the index. The index becomes a new database object associated with the named table. There are several ideas in that sentence, so you should review the syntax in it before creating such a command.

Data Manipulation Commands

The text moves ahead with the idea of manipulating data, which is is the more frequent use of SQL.

The most common use of the INSERT command is to insert a row (a new record) into a table. This is how SQL loads a table with data: with a series of INSERT commands. The text points out, as we have discussed above, that the rows of the VENDOR table must be entered before the rows of the PRODUCT table, since PRODUCT has a referential integrity constraint based on data in the VENDOR table. This is an example of a command to enter a row in the VENDOR table:

INSERT INTO VENDOR
VALUES (21225,'Bryson, Inc.','Smithson','615','223-3234','TN','Y');

Compare this to the structure of the table, listed above. The command begins with the reserved words INSERT INTO, followed by the name of the table. The line return shown in the example is not necessary, but it makes the command more readable, and makes it easier to edit a script of commands like it. The keyword VALUES comes next, followed by a set of parentheses around a comma delimited list of the values to be put in the attributes in that row. One of the rules about relations is that the order of the attributes is not important, but it is important to the INSERT command. The order of the data arguments must match the order in which the matching attributes exist in the table. Note that the integer value is entered without quotes, but each character value (VARCHAR and CHAR types) must be enclosed in single quotes. The last character in the command line, as usual, is a semi-colon.

After entering a couple of rows of data, the text illustrates a simple version of the SELECT command. The command is this:

SELECT * FROM VENDOR;

The command begins with the keyword SELECT, which means we are asking SQL to retrieve data. The next word is an asterisk, which is a wild card in SQL. Normally, the word SELECT is followed by a comma delimited list of the columns whose data you want to retrieve. The asterisk in this position means "all columns". The next word is the keyword FROM which is followed by the name of a table. This is a common command, easy to understand, and easy to use. Its result is to cause SQL to display the contents of the named table. There are various ways to dress up this command, some of which we will see shortly.

The text returns to the INSERT command to show examples of inserting lines into the PRODUCTS table. In this case, we have a difference in database operation. When entering the data for the P_INDATE field, which is in DATE format, we need to know how our database expects that format to look. The example in the text uses quoted strings that look like this: '12-MAR-2019'. We are told that this format is used by Access and Oracle. It is like the British format of day-month-year. MySQL prefers a string like this for the same date: '2019-03-12'. This is more like the official Swedish format of year-month-day. The advantage of this format when used in text databases is that this column, when indexed alphabetically, appears in chronological order.

When entering data for an empty field, such as an optional field, use the keyword NULL. It tells the reader and the database that the field was not left empty by accident.

The text offers a way to enter data only in specific attributes with the INSERT command. This is the text example:

INSERT INTO PRODUCT(P_CODE, P_DESCRIPT) VALUES ('BRT-345','Titanium drill bit');

Note that the provided values would be inserted only into the named fields.

Now for a cautionary tale. It is possible to work on a database for hours, lose power, and lose all changes you made in that time. How could this happen? The changes made in a work session are saved to the database files when any one of three events happens:

  • you close the database
  • you close the application you are using to edit the database
  • you use the COMMIT command

If you want to continue working, but save changes so far, the COMMIT command is the easy answer. Its syntax is very simple: COMMIT;. That's all, just the word COMMIT and a semicolon to end the command. This saves all changes made since the last COMMIT command was issued (or since the work session began if you have not used the command yet). It is advisable that you issue a COMMIT command before you start anything that you fear may go wrong. If it does, you can undo all changes since the last COMMIT with the command ROLLBACK. Think of them as save and revert commands. You can't revert past your last save, but you can undo almost everything after it. Almost? ROLLBACK will undo data manipulation commands, but not data definition commands. The text notes that Access does not support the ROLLBACK command.

Another kind of change to databases is handled by the UPDATE command, which is used to change data in existing records. This command has three phrases, illustrated by this example:

UPDATE PRODUCT
SET P_INDATE = '18-JAN-2016'
WHERE P_CODE = '13-Q2/P2';

In this example, the command UPDATE is followed by the name of a table. The keyword SET is followed by the name of a field, an equal sign, and the data to be placed in that field. The keyword WHERE is followed by the name of a field, an equal sign in this case, and data to identify the record in which the change is to be made. It's like saying, in this table, change this field to say x, in every record that matches this criterion. The SET phrase establishes the change to be made, and the WHERE phrase establishes the records in which to do it. The command usually uses two different fields in the SET and WHERE phrases. The text warns us that leaving out a WHERE phrase would result in the change taking place in all records in the table. This is rarely a good idea.

The DELETE command is not used to delete tables, it is used to delete rows already in tables. The text provides an example:

DELETE FROM PRODUCT
WHERE P_CODE = 'BRT-345';

This command would find rows in the PRODUCT table that match the rule in the WHERE statement, then delete those rows from the table. The text warns that leaving out the WHERE statement would cause all rows in the table to be deleted. MySQL has a safety feature that prevents it from running UPDATE and DELETE commands unless they have a WHERE statement that includes a value for the primary key. This means that you have to specify individual rows. This can be turned off, but it is not recommended.

The INSERT command can also be used to insert rows from one table into another table. The text does not offer an example, but it explains that we can make an INSERT command use a SELECT command as a nested query (which runs first) like this:

INSERT INTO PRODUCT
SELECT column names
FROM other_table;

This is all one command, so we only need one semicolon at the end. Logically, it is two commands. The SELECT command would run first, then its output would be processed as the input of the INSERT command.

SELECT Commands

The text expands its discussion of SELECT commands on page 273. This is the most used command in data retrieval operations. In the examples so far, the text has discussed using an equal sign in WHERE statements, such as WHERE P_CODE = 'BRT-345'. WHERE statements are more flexible when we use comparison operators, which allow them to select rows having ranges of acceptable values. Comparison operators work on text values as well as on numeric values because the comparison is based on the ASCII value of the text entry. The chapter offers an example:

SELECT P_CODE, P_DESCRIPT, P_QOH, P_MIN, P_PRICE
FROM PRODUCT
WHERE P_CODE < '1558-QW1';

In this example, the query would return five columns of data from the PRODUCT table, but only from rows in which the P_CODE value is less than the specified text, which means those that occur before the specified value in an ASCII-sorted list. When doing this with dates, the syntax depends on your database. The text offers the same command for three kinds of databases:
Oracle SELECT P_DESCRIPT, P_QOH, P_MIN, P_PRICE, P_INDATE
FROM PRODUCT
WHERE P_INDATE >= '20-Jan-2019';
Access SELECT P_DESCRIPT, P_QOH, P_MIN, P_PRICE, P_INDATE
FROM PRODUCT
WHERE P_INDATE >= #20-Jan-2019#;
MySQL SELECT P_DESCRIPT, P_QOH, P_MIN, P_PRICE, P_INDATE
FROM PRODUCT
WHERE P_INDATE >= '2019-01-20';

 

You can also do math with the values being compared in the WHERE clauses, such as looking for dates that are less than or equal to the current date minus 90 days. The text does not give you this code for MySQL. It should look like this:

WHERE P_INDATE <= CURDATE() - 90;

We can even throw in Boolean logic operators to combine more than one WHERE condition. The standard operators are AND, OR, and NOT. The text offers this example which is good in all addressed databases:

SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = 21344 OR V_CODE = 24288;

This query would return four columns of data from the PRODUCT table meeting either of the two conditions in the WHERE statement. (Rows meeting both conditions would be returned as well.)

On page 279, the text lists several "special operators", that can be used to look for situations not as easily covered by the examples so far:

Special Operator Purpose

BETWEEN

Used to check whether an attribute value is within a range

IS NULL


Used to check whether an attribute value is null

LIKE

Used to check whether an attribute value matches a given string pattern

IN

Used to check whether an attribute value matches any value within a value list

EXISTS

Used to check whether a subquery returns any rows

 

On page 283, the text bring up more data definition commands. Several begin with the words ALTER TABLE. These are for making changes to existing data tables.

To change the structure of a table, you can use the ALTER TABLE command. This is an example of adding a new column to a table:

ALTER TABLE table_name
ADD new_column data_type(width) ;

The syntax allows you to set a value for all rows in the table, if desired. If some rows should have different values, you can use UPDATE commands to change those rows.

ALTER TABLE table_name
ADD new_column data_type(width) INIT = 'value' ;


You can also remove a column from a table with the ALTER command:

ALTER TABLE table_name
DROP column_name;


Another variation is to change the type or width of a column with the MODIFY clause:

ALTER TABLE table_name
MODIFY column_name data_type(new_width) ;

MODIFY can also be used to change a column to not accept nulls. You should be aware that some systems will allow you to increase, but not decrease the width of a column. (We might lose data.) Some systems will not allow changes in data types. If your system does not allow changes to tables, you can still create a new table, and copy data with the INSERT INTO command shown above.


As noted later in the chapter, you can remove a table from your database with the DROP TABLE command. The syntax is simple:

DROP TABLE table_name;


Ordering Your Output

The output of most commands, SELECT commands in particular, can be sorted as you wish with some simple additions to the command in question. You know that the WHERE clause in a SELECT command is often optional. Another clause that is optional is the ORDER BY clause. This is an example of a SELECT command that uses it:

SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE
FROM PRODUCT
ORDER BY P_PRICE;

In this example, the command is to retrieve data from four fields in the PRODUCT table, from all rows (since there is no WHERE clause this time), then arrange the rows according to the values in the P_PRICE column. Unless told otherwise, SQL will read a numeric field like P_PRICE and sort the rows in ascending order. To put the rows in descending order, we would have to add the keyword DESC to the command, like this:

SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE
FROM PRODUCT
ORDER BY P_PRICE DESC;

When ordering rows of data, it is sometimes necessary to have primary, secondary, and tertiary sorts. The text gives us an example of this with names, which often have first, middle, and last components. Your text refers to this as a cascading order sequence.

SELECT EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_AREACODE,
EMP_PHONE
FROM EMPLOYEE
ORDER BY EMP_LNAME, EMP_FNAME, EMP_INITIAL;

In this case, the first sort is by last name, the second sort is by first name, and the third sort is by middle initial. This works because each subsequent sort is done on data that is already sorted by the previous sort, and the results of the previous sort are maintained. The examples above are provided without WHERE clauses only for simplicity. A WHERE clause, if needed, should precede the ORDER BY clause. This should make sense. Sorting your output can't happen until you have output.

On page 292, the text shows another way to use SELECT. Using the last query as an illustration, you could enter a command like this:

SELECT EMP_LNAME FROM EMPLOYEE;

This would give you a list of the last names in each record in the table. If you only wanted to see a list of the last names with no repeats, that is, a list of all the different last names in the table, you might be able to do this:

SELECT DISTINCT EMP_LNAME FROM EMPLOYEE;

The keyword DISTINCT is meant to show each new name only once. It won't tell you how many people have each name, but it will tell you all the different names that people have. Obviously, this could be done with any attribute.

If you are interested in counts, totals and other statistical data, SQL has a set if aggregate functions, functions that summarize data in tables.

Function Usage
AVG The arithmetic mean (average) for a specified column:
SELECT AVG(P_PRICE) FROM PRODUCT;
COUNT The number of rows containing non-null values:
SELECT COUNT(P_PRICE) FROM PRODUCT;
MAX

Maximum value encountered in a given column:
SELECT MAX(P_PRICE) FROM PRODUCT;

MIN Minimum value encountered in a given column:
SELECT MIN(P_PRICE) FROM PRODUCT;
SUM Total of all values in a given column:
SELECT SUM(P_PRICE) FROM PRODUCT;

The text leads us to more interesting applications of such functions with the note that tells us we can substitute a nested query for the name of a table. We can change the example about distinct names above to get a count of those names:

SELECT COUNT(DISTINCT EMP_LNAME) FROM EMPLOYEE;

So, we can summarize particular columns in a table with the functions above. You can also summarize sections of a table by using ORDER BY clauses. Assume that you have data for several different offices, states, or regions in a table. You can tell it to give you an aggregate answer for each such grouping by telling it to ORDER BY the field that holds the designator for the separate groups. The text offers an example from the PRODUCT table. Each product has an associated vendor identified by a V_CODE entry. Can you tell what this command would do?

SELECT V_CODE, COUNT(P_CODE)
FROM PRODUCT
GROUP BY V_CODE
ORDER BY V_CODE;

The command says to retrieve the V_CODE column from the PRODUCT table, GROUP BY the vendor number, calculate how many products there are for each group, and to output the data sorted by vendor number. The tricky part is the count for each group. Without the GROUP BY phrase, you would not get the desired result. In the example in the text on page 299, the output looks a little odd, because there are two rows in the table that have no V_CODE data. Other than that, the query returns a count of the products we carry from each of the vendors.

As if that were not complex enough, the GROUP BY command has an optional clause called HAVING. HAVING uses the same syntax as a WHERE clause in a regular SELECT query. HAVING is linked to the GROUP BY section of the query, which may help you to keep HAVING and WHERE separated in your mind. So, what does it do? It allows you to select only rows having particular features for your aggregation. Yes, that is probably not clear. An example that contains lots of features:

SELECT V_CODE, SUM(P_QOH * P_PRICE) AS TOTCOST
FROM PRODUCT
GROUP BY V_CODE
HAVING (SUM(P_QOH * P_PRICE) > 500)
ORDER BY SUM(P_QOH * P_PRICE) DESC;

Once again, we are using the PRODUCT table. We want the V_CODE column, and a SUM of the "quantity ordered" times the "price of the part", for each V_CODE group, and we will call that calculate column TOTCOST in the output. That's in the first three lines, and it produces an output. On line 4, we are limiting the row selection from that output to V_CODEs whose SUM of quantity orders times price per part is greater than 500. Finally, we are ordering the output by the amount spent with the vendor.

The chapter ends with a return to an idea we have seen before, the JOIN query. A JOIN is any query that pulls data from two or more tables. The logic required to do so is sometimes different from what you type in a query on one table. The text describes a query on column in the PRODUCT and VENDOR tables. The two tables are related on the V_CODE field which is found in both of them. That brings up a new syntax problem.  Let's examine the query example:

SELECT P_DESCRIPT, P_PRICE, V_NAME, V_CONTACT, V_AREACODE, V_PHONE
FROM PRODUCT, VENDOR
WHERE PRODUCT.V_CODE = VENDOR.V_CODE;

  • The SELECT statement asks for the columns to be placed in the output, as usual.
  • The FROM statement asks for both tables.
  • The WHERE statement wants to pull data from rows that match, that have the same V_CODE in each table. In order to do this, the we have to tell the SQL engine to look for records in which the V_CODE in the PRODUCT table is same as the V_CODE in the VENDOR table. That's what the WHERE statement means this time. The new part of the notation is to qualify each of the two attributes that must match, to state which table we want to look at. Why do we care? Because the two attributes have the same name.
    WHERE PRODUCT.V_CODE = VENDOR.V_CODE
    This means where the V_CODE from the PRODUCT table equals the V_CODE from the VENDOR table. PRODUCT.V_CODE uses table.attribute notation to make it clear which two items must match. In SQL, whenever there is room for doubt about which item from what table you are specifying, use table.attribute notation to be specific.

The text also recommends using the ORDER BY clause in this kind of query, because the order of the records in the output can be random, making it harder to find the data you want if you do not apply a specific order.

We will ignore the short section on joining tables using an alias for a table name, since the author has not told us how to establish the alias. Likewise, let's skip the section on recursive joins, which MySQL does not support.