The chapter begins our actual discussion of SQL. The chapter begins with a statement about SQL having a variety of commands that provide for database management (usually done by administrators) as well as more normal database operations (usually done by users). Its commands can be sorted into two categories that address this division:
As you can see, the divisions have to do with the structures
and objects of the database itself (DDL), or with the data in the database (DML).
Tables 7.1 and 7.2 show you several examples of commands that fall into
each category. They won't mean much to you without a context for using
them, so we will begin discussing them in the next section below.
A standard version of SQL has been adopted by the American National Standards Institute, and by ISO, an international standards group. The text cautions us that this version of SQL is usually a subset of the version being used in a vendor's DBMS product. There are often enhancements to the language in each version whose syntaxes vary from one vendor to another. This makes it unlikely that a company using one DBMS for complex work could move their code to another DBMS without recoding to match the new DBMS's syntax.
That being said, the text moves on to discuss standard SQL commands that fall under the data definition category.
Data Definition Commands
The text presents a small database with five tables in Figure
7.1, which is an Entity Relationship Diagram.
You should look at the relationships between these tables in the text. Realize that the instances in the Line table are actually parts of invoices. That table was created as a separate table from the Invoice table to resolve dependency problems.
For this discussion, the text uses mainly the Product and Vendor tables. The text presents a long discussion of the database which is probably pretty boring to all of us. Its purpose is to show you that the ERD and the business rules are needed to generate reasons for the tables to exist as they do. We can examine this data in class. For these notes, let's move ahead to section 7-2d of the chapter. This skips the creation of a database space, which has often not been possible with MySQL. We are planning to use MySQL for the platform in this class. If you want to download it and install it on your own Windows computer, you may find it useful to watch this video about doing so. I recommend that you watch it with the playback speed set to 1.5 times normal.
In section 7-2d, the author has already assumed a
few rules about building tables that you need to consider yourself
when creating your own. When downloading tables from our publisher,
considerations like "rules about capitalization" will already have been made
for you, and you need to inspect the table to learn what they are. When
you create your own tables, you have to make rules for yourself. For
example, will you capitalize the names of files, attributes, or any
data element? The author of our text favors using all caps for table
and attribute names, which I find a bit ugly. Issuing effective
commands in SQL requires that you know the capitalization rules that
are being used in the database you are trying to maintain or query. A
command will fail if the name of an object in that command is not
spelled or capitalized as it exists in your database. The chapter
refers to Table 7.3 as a data dictionary, which is a useful
summary of information about every table and every attribute in it. You
will want to make notes in a data dictionary that includes this
information about every table in your database as you make decisions
that affect them. Why? It becomes a helpful reference for your database, assuming you create and update this file.
In this one-page data dictionary, we see two tables and each of
their attributes. Each attribute's name is given, then a description of
the data it will hold, and the data type that has been assigned to that
field. Note that the range column in this reference table
describes the allowed range of values for that field in this database,
not the range actually allowed by the data type. The data type is typically more flexible than you will want your table to be.
At the text explains, you typically choose a numeric data type instead of a character data type when you know that you will have to do math with the data. This includes auto-incrementation of numbers, such as assigning record numbers when new instances are added to a table. It also includes fields that may be searched with comparison operators, such as looking for all records in which a field's value is less than a supplied value. On the other hand, when you know that you will never do math with data that looks like a string of digits, like ID numbers, you will want to make the field that holds such data a character type.
It is time that we saw some commands, so, assuming you are going to create the VENDOR table, this is a MySQL command that would create its structure. After this command is issued, the table will still be empty:
CREATE TABLE VENDOR (
See what I mean about too many capitals? It's hard to read for people who have learned that capital letters mean "shouting". It is standard to use capital letters for key words in the SQL language, and key words make up most of this command.
That is a good example of a CREATE TABLE command. If you are used to thinking about functions being handed arguments in parentheses, you will understand that the CREATE command is a function that needs the list of attributes and their data types in its argument list. Likewise, each data type name is a function that needs a field width in its argument list. Finally, PRIMARY KEY is a function that needs the name of an attribute as its argument.
Page 257 has a large shaded note about the storage engines that are provided with MySQL. It, unfortunately, left out the text of the command that sets the storage engine. Take a look at this page in the online manual. You can see that the CREATE command allows you to add a phrase to specify the engine. The example from the manual looks like this:
CREATE TABLE t1 (i INT) ENGINE = INNODB;
In this example, the table is called t1, the attribute is called i, and the storage engine is InnoDB, which is the default engine as of MySQL version 5.5.5. This is the engine that produces transaction safe tables, which are preferred. The short version of this news is that you don't have to use the ENGINE command option if you want referential integrity. Since we have it, the note on page 259 is useful, telling us that we can apply referential integrity constraints when we create tables. An example from the text that would be in the CREATE command for the PRODUCT table:
FOREIGN KEY (V_CODE) REFERENCES VENDOR ON UPDATE CASCADE
This means that the V_CODE attribute, found in the PRODUCT table is a FOREIGN KEY from the VENDOR table, and that updates to the PRODUCT table should CASCADE, which means that they will be checked against values stored in that field in the VENDOR table. That is a lot of instructions in a small space. The bottom line is that you can't add a new instance to the PRODUCT table unless the value for V_CODE for that object is found in the VENDOR table. Think about that, and you will see that it is a good idea. It is also necessary to know this constraint exists. If we want to add a new line in the PRODUCT table that references a new vendor, we had better update the VENDOR table first. The note on page 260 explains that MySQL covers more options on the UPDATE command than the other databases mentioned in the text.
The text contains many notes about differing syntaxes in different versions of SQL. This adds to the bulk of the chapter without adding much of interest to those who only have one option available to them. Let me caution you to read the parts about the database we are using, but to check them against the online manual for MySQL. A small typo in the book will keep a command from running properly, and sometimes things change from one version to the next.
The MySQL code to create the PRODUCT table will look like this:
CREATE TABLE PRODUCT (
This time we have an attribute whose data type is DATE, two whose types are SMALLINT, and two whose types are NUMBER. Those two represent money values, so they are given arguments for total field width, then a comma, then the number of decimal places. Example: P_PRICE NUMBER(8,2) NOT NULL. It is a numeric field, eight characters wide, and the last two characters follow a decimal point.
The text discusses the Oracle version of this table. Oracle supports constraints that specify a default value for a field, and that check to make sure a value is in a set of acceptable values. MySQL does not support these constraints in table creation. It is nice to know that Oracle supports these options, but the database we are using in class does not.
On page 263, the text discusses creating indexes for tables, which make searches for data faster when you search for something in an indexed field. The primary key for a table automatically has an index, but that is the only automatic one. If you want to create an index, you can use the syntax on this page to do so in MySQL. The first example on the page is this:
CREATE INDEX P_INDATEX ON PRODUCT(P_INDATE);
The CREATE command is followed by the word INDEX,
which is followed by the name you are giving the index, which
is followed by the word ON, which is followed by the name of a table,
which is followed by a set of parentheses enclosing the list of
on which to base the index. The index becomes a
new database object associated with the named table. There are several
ideas in that sentence, so you should review the syntax in it before
creating such a command.
Data Manipulation Commands
The text moves ahead with the idea of manipulating data, which is is the more frequent use of SQL.
The most common use of the INSERT command is to insert a row (a new record) into a table. This is how SQL loads a table with data: with a series of INSERT commands. The text points out, as we have discussed above, that the rows of the VENDOR table must be entered before the rows of the PRODUCT table, since PRODUCT has a referential integrity constraint based on data in the VENDOR table. This is an example of a command to enter a row in the VENDOR table:
INSERT INTO VENDOR
Compare this to the structure of the table, listed above. The
command begins with the reserved words INSERT INTO, followed by
the name of the table. The line return shown in the example is
not necessary, but it makes the command more readable, and makes it
easier to edit a script of commands like it. The keyword VALUES
comes next, followed by a set of parentheses around a comma
delimited list of the values to be put in the attributes in that
row. One of the rules about relations is that the order of the
attributes is not important, but it is important to the INSERT
command. The order of the data arguments must match the order in which
the matching attributes exist in the table. Note that the integer
value is entered without quotes, but each character
value (VARCHAR and CHAR types) must be enclosed in single
quotes. The last character in the command line, as usual, is a semi-colon.
After entering a couple of rows of data, the text illustrates a simple version of the SELECT command. The command is this:
SELECT * FROM VENDOR;
The command begins with the keyword SELECT, which means we are asking SQL to retrieve data. The next word is an asterisk, which is a wild card in SQL. Normally, the word SELECT is followed by a comma delimited list of the columns whose data you want to retrieve. The asterisk in this position means "all columns". The next word is the keyword FROM which is followed by the name of a table. This is a common command, easy to understand, and easy to use. Its result is to cause SQL to display the contents of the named table. There are various ways to dress up this command, some of which we will see shortly.
The text returns to the INSERT command to show examples of inserting lines into the PRODUCTS table. In this case, we have a difference in database operation. When entering the data for the P_INDATE field, which is in DATE format, we need to know how our database expects that format to look. The example in the text uses quoted strings that look like this: '12-MAR-2019'. We are told that this format is used by Access and Oracle. It is like the British format of day-month-year. MySQL prefers a string like this for the same date: '2019-03-12'. This is more like the official Swedish format of year-month-day. The advantage of this format when used in text databases is that this column, when indexed alphabetically, appears in chronological order.
When entering data for an empty field, such as an optional
field, use the keyword NULL. It tells the reader and the database that the field was not left empty by accident.
The text offers a way to enter data only in specific attributes with the INSERT command. This is the text example:
INSERT INTO PRODUCT(P_CODE, P_DESCRIPT) VALUES ('BRT-345','Titanium drill bit');
Note that the provided values would be inserted only into the named fields.
Now for a cautionary tale. It is possible to work on a database for hours, lose power, and lose all changes you made in that time. How could this happen? The changes made in a work session are saved to the database files when any one of three events happens:
If you want to continue working, but save changes so far, the COMMIT command is the easy answer. Its syntax is very simple: COMMIT;. That's all, just the word COMMIT and a semicolon to end the command. This saves all changes made since the last COMMIT command was issued (or since the work session began if you have not used the command yet). It is advisable that you issue a COMMIT command before you start anything that you fear may go wrong. If it does, you can undo all changes since the last COMMIT with the command ROLLBACK. Think of them as save and revert commands. You can't revert past your last save, but you can undo almost everything after it. Almost? ROLLBACK will undo data manipulation commands, but not data definition commands. The text notes that Access does not support the ROLLBACK command.
Another kind of change to databases is handled by the UPDATE command, which is used to change data in existing records. This command has three phrases, illustrated by this example:
In this example, the command UPDATE is followed by the name of a table. The keyword SET is followed by the name of a field, an equal sign, and the data to be placed in that field. The keyword WHERE is followed by the name of a field, an equal sign in this case, and data to identify the record in which the change is to be made. It's like saying, in this table, change this field to say x, in every record that matches this criterion. The SET phrase establishes the change to be made, and the WHERE phrase establishes the records in which to do it. The command usually uses two different fields in the SET and WHERE phrases. The text warns us that leaving out a WHERE phrase would result in the change taking place in all records in the table. This is rarely a good idea.
The DELETE command is not used to delete tables, it is used to delete rows already in tables. The text provides an example:
DELETE FROM PRODUCT
This command would find rows in the PRODUCT table that match the rule in the WHERE statement, then delete those rows from the table. The text warns that leaving out the WHERE statement would cause all rows in the table to be deleted. MySQL has a safety feature that prevents it from running UPDATE and DELETE commands unless they have a WHERE statement that includes a value for the primary key. This means that you have to specify individual rows. This can be turned off, but it is not recommended.
The INSERT command can also be used to insert rows from one table into another table. The text does not offer an example, but it explains that we can make an INSERT command use a SELECT command as a nested query (which runs first) like this:
INSERT INTO PRODUCT
is all one command, so we only need one semicolon at the
end. Logically, it is two commands. The SELECT command would run
first, then its output would be processed as the input of the INSERT
The text expands its discussion of SELECT commands on page 273. This is the most used command in data retrieval operations. In the examples so far, the text has discussed using an equal sign in WHERE statements, such as WHERE P_CODE = 'BRT-345'. WHERE statements are more flexible when we use comparison operators, which allow them to select rows having ranges of acceptable values. Comparison operators work on text values as well as on numeric values because the comparison is based on the ASCII value of the text entry. The chapter offers an example:
SELECT P_CODE, P_DESCRIPT, P_QOH, P_MIN, P_PRICE
You can also do math with the values being compared in the WHERE clauses, such as looking for dates that are less than or equal to the current date minus 90 days. The text does not give you this code for MySQL. It should look like this:
WHERE P_INDATE <= CURDATE() - 90;
We can even throw in Boolean logic operators to combine more than one WHERE condition. The standard operators are AND, OR, and NOT. The text offers this example which is good in all addressed databases:
SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
This query would return four columns of data from the PRODUCT
table meeting either of the two conditions in the WHERE statement. (Rows meeting both conditions would be returned as well.)
On page 279, the text lists several "special operators", that can be used to look for situations not as easily covered by the examples so far:
On page 283, the text bring up more data definition commands. Several begin with the words ALTER TABLE. These are for making changes to existing data tables.
To change the structure of a table, you can use the ALTER TABLE command. This is an example of adding a new column to a table:
ALTER TABLE table_name
The syntax allows you to set a value for all rows in the table, if desired. If some rows should have different values, you can use UPDATE commands to change those rows.
ALTER TABLE table_name
ALTER TABLE table_name
ALTER TABLE table_name
MODIFY can also be used to change a column to not accept nulls. You should be aware that some systems will allow you to increase, but not decrease the width of a column. (We might lose data.) Some systems will not allow changes in data types. If your system does not allow changes to tables, you can still create a new table, and copy data with the INSERT INTO command shown above.
DROP TABLE table_name;
The output of most commands, SELECT commands in particular, can be sorted as you wish with some simple additions to the command in question. You know that the WHERE clause in a SELECT command is often optional. Another clause that is optional is the ORDER BY clause. This is an example of a SELECT command that uses it:
SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE
In this example, the command is to retrieve data from four fields in the PRODUCT table, from all rows (since there is no WHERE clause this time), then arrange the rows according to the values in the P_PRICE column. Unless told otherwise, SQL will read a numeric field like P_PRICE and sort the rows in ascending order. To put the rows in descending order, we would have to add the keyword DESC to the command, like this:
SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE
When ordering rows of data, it is sometimes necessary to have primary, secondary, and tertiary sorts. The text gives us an example of this with names, which often have first, middle, and last components. Your text refers to this as a cascading order sequence.
SELECT EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_AREACODE,
In this case, the first sort is by last name, the second sort
is by first name, and the third sort is by middle initial. This works
because each subsequent sort is done on data that is already sorted by
the previous sort, and the results of the previous sort are maintained. The examples above are provided without WHERE
clauses only for simplicity. A WHERE clause, if needed, should precede
the ORDER BY clause. This should make sense. Sorting your output can't happen until you have output.
On page 292, the text shows another way to use SELECT. Using the last query as an illustration, you could enter a command like this:
SELECT EMP_LNAME FROM EMPLOYEE;
This would give you a list of the last names in each record in the table. If you only wanted to see a list of the last names with no repeats, that is, a list of all the different last names in the table, you might be able to do this:
SELECT DISTINCT EMP_LNAME FROM EMPLOYEE;
The keyword DISTINCT is meant to show each new name only once. It won't tell you how many people have each name, but it will tell you all the different names that people have. Obviously, this could be done with any attribute.
If you are interested in counts, totals and other statistical data, SQL has a set if aggregate functions, functions that summarize data in tables.
The text leads us to more interesting applications of such functions with the note that tells us we can substitute a nested query for the name of a table. We can change the example about distinct names above to get a count of those names:
SELECT COUNT(DISTINCT EMP_LNAME) FROM EMPLOYEE;
So, we can summarize particular columns in a table with the functions above. You can also summarize sections of a table by using ORDER BY clauses. Assume that you have data for several different offices, states, or regions in a table. You can tell it to give you an aggregate answer for each such grouping by telling it to ORDER BY the field that holds the designator for the separate groups. The text offers an example from the PRODUCT table. Each product has an associated vendor identified by a V_CODE entry. Can you tell what this command would do?
SELECT V_CODE, COUNT(P_CODE)
The command says to retrieve the V_CODE column from the PRODUCT table, GROUP BY the vendor number, calculate how many products there are for each group, and to output the data sorted by vendor number. The tricky part is the count for each group. Without the GROUP BY phrase, you would not get the desired result. In the example in the text on page 299, the output looks a little odd, because there are two rows in the table that have no V_CODE data. Other than that, the query returns a count of the products we carry from each of the vendors.
As if that were not complex enough, the GROUP BY command has an optional clause called HAVING. HAVING uses the same syntax as a WHERE clause in a regular SELECT query. HAVING is linked to the GROUP BY section of the query, which may help you to keep HAVING and WHERE separated in your mind. So, what does it do? It allows you to select only rows having particular features for your aggregation. Yes, that is probably not clear. An example that contains lots of features:
SELECT V_CODE, SUM(P_QOH * P_PRICE) AS TOTCOST
Once again, we are using the PRODUCT table. We want the V_CODE column, and a SUM of the "quantity ordered" times the "price of the part", for each V_CODE group, and we will call that calculate column TOTCOST in the output. That's in the first three lines, and it produces an output. On line 4, we are limiting the row selection from that output to V_CODEs whose SUM of quantity orders times price per part is greater than 500. Finally, we are ordering the output by the amount spent with the vendor.
The chapter ends with a return to an idea we have seen before, the JOIN query. A JOIN is any query that pulls data from two or more tables. The logic required to do so is sometimes different from what you type in a query on one table. The text describes a query on column in the PRODUCT and VENDOR tables. The two tables are related on the V_CODE field which is found in both of them. That brings up a new syntax problem. Let's examine the query example:
SELECT P_DESCRIPT, P_PRICE, V_NAME, V_CONTACT, V_AREACODE,
The text also recommends using the ORDER BY clause in this kind of query, because the order of the records in the output can be random, making it harder to find the data you want if you do not apply a specific order.
We will ignore the short section on joining tables using an alias for a table name, since the author has not told us how to establish the alias. Likewise, let's skip the section on recursive joins, which MySQL does not support.