The chapter begins our actual discussion of SQL. The chapter begins with a statement about SQL having a variety of commands that provide for database management (usually done by administrators) as well as more normal database operations (usually done by users). Its command can be sorted into two categories that address this division:
As you can see, the divisions have to do with the structures and objects
of the database itself, or with the data in the database. Tables 7.1 and
7.2 show you several examples of commands that fall into each category.
They won't mean much to you without a context for using them, so we will
begin discussing then in the next section below.
A standard version of SQL has been adopted by the American National Standards Institute, and by ISO, an international standards group. The text cautions us that this version of SQL is usually a subset of the version being used in a vendor's DBMS product. There are often enhancements to the language in each version whose syntaxes vary from one vendor to another. This makes it unlikely that a company using one DBMS for complex work could move their code to another DBMS without recoding to match the new DBMS's syntax.
That being said, the text moves on to discuss standard SQL commands that fall under the data definition category.
Data Definition Commands
The text presents a small database with five tables in Figure 7.1, which is an Entity Relationship Diagram.
You should look at the relationships between these tables in the text. Realize that the instances in the Line table are actually parts of invoices. That table was created as a separate table from the Invoice table to resolve dependency problems.
For this discussion, the text uses mainly the Product and Vendor tables. The text presents a long discussion of the database which is probably pretty boring to you. Its purpose is to show you that the ERD and the business rules are needed to generate reasons for the tables to exist as they do. We can examine this data in class. For these notes, let's move ahead to section 7-2d of the chapter. This skips the creation of a database space, which has often not been possible with MySQL. We are planning to use MySQL for the platform in this class. If you want to download it and install it on your own Windows computer, you may find it useful to watch this video about doing so. I recommend that you watch it with the playback speed set to 1.5 times normal.
In section 7-2d, the author, the author has already assumed a few rules about building tables that you need to decide for yourself when creating your own. When downloading tables from our publisher, such things as rules about capitalization will already have been made for you, and you need to inspect the table to learn what they are. When you create your own tables, you have to make rules for yourself. For example, will you capitalize the names of files, attributes, or any data element? The author of our text favors using all caps for table and attribute names, which I find a bit ugly. Issuing effective commands in SQL requires that you know the capitalization rules that are being used in the database you are trying to maintain or query. A command will fail if the name of an object in that command is not spelled or capitalized as it exists in your database. The chapter refers to Table 7.3 as a data dictionary, which is a useful summary of information about every table and every attribute in it. You will want to make notes in a data dictionary that includes this information about every table in your database as you make decisions that affect them.
In this one-page data dictionary, we see two tables each of their attributes. Each attribute's name is given, then a description of the data it will hold, and the data type that has been assigned to that field. Note that the range column in this reference table describes the allowed range of values for that field in this database, not the range actually allowed by the data type.
At the text explains, you typically choose a numeric data type instead of a character data type when you know that you will have to do math with the data. This includes auto-incrementation of numbers, such as assigning record numbers when new instances are added to a table. It also includes fields that may be searched with comparison operators, such as looking for all records in which a field's value is less than a supplied value. On the other hand, when you know that you will never do math with data that looks like a string of digits, like ID numbers, you will want to make the field that holds such data a character type.
It is past time that we saw some commands, so, assuming you are going to create the VENDOR table, this is a MySQL command that would create its structure. After this command is issued, the table will still be empty:
CREATE TABLE VENDOR (
See what I mean about too many capitals? It's hard to read for people who have learned that capital letters mean "shouting". It is standard to use capital letters for key words in the SQL language, and key words make up most of this command.
That is a good example of a CREATE TABLE command. If you are used to thinking about functions being handed arguments in parentheses, you will understand that the CREATE command is a function that needs the list of attributes and their data types in its argument list. Likewise, each data type name is a function that needs a field width in its argument list. Finally, PRIMARY KEY is a function that needs the name of an attribute as its argument.
Page 257 has a large shaded note about the storage engines that are provided with MySQL. It, unfortunately, left out the text of the command that sets the storage engine. Take a look at this page in the online manual. You can see that the CREATE command allows you to add a phrase to specify the engine. The example from the manual looks like this:
CREATE TABLE t1 (i INT) ENGINE = INNODB;
In this example, the table is called t1, the attribute is called i, and the storage engine is InnoDB, which is the default engine as of MySQL version 5.5.5. This is the engine that produces transaction safe tables, which are preferred. The short version of this news is that you don't have to use the ENGINE command option if you want referential integrity. Since we have it, the note on page 259 is useful, telling us that we can apply referential integrity constraints when we create tables. An example from the text that would be in the CREATE command for the PRODUCT table:
FOREIGN KEY (V_CODE) REFERENCES VENDOR ON UPDATE CASCADE
This means that the V_CODE attribute, found in the PRODUCT table is a FOREIGN KEY from the VENDOR table, and that updates to the PRODUCT table should CASCADE, which means that they will be checked against values stored in that field in the VENDOR table. That is a lot of logic. The bottom line is that you can't add a new instance to the PRODUCT table unless the value for V_CODE for that object is found in the VENDOR table. Think about that, and you will see that it is a good idea. It is also necessary to know this constraint exists. If we add a new line in the PRODUCT table that references a new vendor, we had better update the VENDOR table first. The note on page 260 explains that MySQL covers more option on the UPDATE command than the other databases mentioned in the text.
The text contains many notes about differing syntaxes in different versions of SQL. This adds to the bulk of the chapter without adding much of interest to those who only have one option available to them. Let me caution you to read the parts about the database we are using, but to check them against the online manual for MySQL. A small typo in the book will keep a command from running properly, and sometimes things change from one version to the next.
The MySQL code to create the PRODUCT table will look like this:
CREATE TABLE PRODUCT (
This time we have an attribute whose data type is DATE, two whose type is SMALLINT, and two whose types are NUMBER. Those two represent money value, so they are given arguments for total field width, then a comma, then the number of decimal places. Example: P_PRICE NUMBER(8,2) NOT NULL. It is a numeric field, eight characters wide, and the last two characters follow a decimal point.
The text discusses the Oracle version of this table. Oracle supports constraints that specify a default value for a field, and that check to make sure a value is in a set of acceptable values. MySQL does not support these constraints in table creation. It is nice to know that Oracle supports these options, but the database we are using in class does not.
On page 263, the text discusses creating indexes for tables, which make searches for data faster when you search for something in an indexed field. The primary key for a table automatically has an index, but that is the only automatic one. If you want to create an index you can use the syntax on this page to do so in MySQL. The first example on the page is this:
CREATE INDEX P_INDATEX ON PRODUCT(P_INDATE);
The CREATE command is followed by the word INDEX, which is followed by the name you are giving the index, which is followed by the word ON, which is followed by the name of a table, which is followed by a set of parentheses enclosing the list of columns on which to base the index. The index becomes a new database object associated with the named table.
Data Manipulation Commands
The text moves ahead with the idea of manipulating data, which is the more frequent use of SQL.
The most common use of the INSERT command is to insert a row into a table. This is how SQL loads a table with data: with a series of INSERT commands. The text points out, as we have discussed above, that the rows of the VENDOR table must be entered before the rows of the PRODUCT table, since PRODUCT has a referential integrity constraint based on data in the VENDOR table. This is an example of a command to enter a row in the VENDOR table:
INSERT INTO VENDOR
Compare this to the structure of the table, listed above. The command begins with the reserved words INSERT INTO, followed by the name of the table. The line return shown in the example is not necessary, but it makes the command more readable, and makes it easier to edit a script of commands like it. The keyword VALUES comes next, followed by a set of parentheses around a comma delimited list of the values to be put in the attributes in that row. One of the rules about relations is that the order of the attributes is not important, but it is important to the INSERT command. The order of the data arguments must match the order in which the matching attributes exist in the table. Note that the integer value is entered without quotes, but each character value (VARCHAR and CHAR types) must be enclosed in single quotes.
After entering a couple of rows of data, the text illustrates the SELECT command, in a simple form. The command is this:
SELECT * FROM VENDOR;
The command begins with the keyword SELECT, which means we are asking SQL to retrieve data. The next word is an asterisk, which is a wild card in SQL. Normally, the word SELECT is followed by a comma delimited list of the columns whose data you want to retrieve. The asterisk in this position means "all columns". The next word is the keyword FROM which is followed by the name of a table. This is a common command, easy to understand, and easy to use. Its result is to cause SQL to display the contents of the named table. There are other ways to dress up this command, which we will see shortly.
The text returns to the INSERT command to show examples of inserting lines into the PRODUCTS table. In this case, we have a difference in database operation. When entering the data for the P_INDATE field, which is in DATE format, we need to know how our database expects that format to look. The example in the text uses quoted strings that look like this: '12-MAR-2018'. We are told that this format is used by Access and Oracle. It is like the British format of day-month-year. MySQL prefers a string like this for the same date: '2018-03-12'. This is more like the Swedish format of year-month-day. The advantage of this format, when used in text databases, is that this column, when indexed alphabetically, appears in chronological order.
When entering data for an empty field, such as an optional field, use the keyword NULL.
The text offers a way to enter data only in specific attributes with the INSERT command. This is the text example:
INSERT INTO PRODUCT(P_CODE, P_DESCRIPT) VALUES ('BRT-345','Titanium drill bit');
Note that the provided values would be inserted only into the named fields.
Now for a cautionary tale. It is possible to work on a database for hours, lose power, and lose all changes you made in that time. How could this happen. The changes made in a work session are saved to the database files when one of three events happen:
If you want to continue working, but save changes so far, the COMMIT command is the easy answer. Its syntax is very simple: COMMIT;. That's all, just the word COMMIT and a semicolon to end the command. This saves all changes made since the last COMMIT command was issued (or since the work session began). It is advisable that you issue a COMMIT command before you start anything that you fear may go wrong. If it does, you can undo all changes since the last COMMIT with the command ROLLBACK. Think of theme as save and undo commands. You can't undo past your last save, but you can undo almost everything after it. Almost? ROLLBACK will undo data manipulation commands, but not data definition commands. The text notes that Access does not support the ROLLBACK command.
Another kind of change to databases is handled by the UPDATE command, which is used to change data in existing records. This command has three phrases, illustrated by this example:
In this example, the command UPDATE is followed by the name of a table. The keyword SET is followed by the name of a field, an equal sign, and the data to be placed in that field. The keyword WHERE is followed by the name of a field, an equal sign, and data to identify the record in which the change is to be made. It's like saying, in this table, change this field to say x, in every record that matches this criterion. The SET phrase establishes the change to be made, and the WHERE phrase establishes the records in which to do it. The command usually uses two different fields in the SET and WHERE phrases. The text warns that leaving out a WHERE phrase would result in the change taking place in all records in the table. This is rarely a good idea.
The DELETE command is not used to delete tables, it is used to delete rows already in tables. The text provides an example:
DELETE FROM PRODUCT
This command would find rows in the PRODUCT table that match the rule in the WHERE statement, then delete those rows from the table. The text warns that leaving out the WHERE statement would cause all rows in the table to be deleted. MySQL has a safety feature that prevents it from running UPDATE and DELETE commands unless they have a WHERE statement that includes a value for the primary key. This means that you have to specify individual rows. This can be turned off, but it is not recommended.
The INSERT command can also be used to insert rows from one table into another table. The text does not offer an example, but it explains that we can combine make an INSERT command use a SELECT command as a nested query (which runs first) like this:
INSERT INTO PRODUCT
This is all one command, so we only need one semicolon at the end.
The text expands its discussion of SELECT command on page 273. This is the most used command in data retrieval operations. In the examples so far, the text has discussed using an equal sign in WHERE statements, such as WHERE P_CODE = 'BRT-345'. WHERE statements are more flexible when we use comparison operators, which allows them to select rows having ranges of acceptable values. Comparison operators work on text values as well as on numeric values because the comparison is based on the ASCII value of the text entry. The chapter offers an example:
SELECT P_CODE, P_DESCRIPT, P_QOH, P_MIN, P_PRICE
In this example, the query would return five columns of data from the PRODUCT table, but only from rows in which the P_CODE value is less than the specified text, which means those that occur before the specified value in an ASCII-sorted list. When doing this with dates, the syntax depends on your database. The text offers the same command for three databases:
You can also do math with the values being compared in the WHERE clauses, such as looking for dates that are less than or equal to the current date minus 90 days. The text does not give you this code for MySQL. It should look like this:
WHERE P_INDATE <= CURDATE() - 90;
We can even throw in Boolean logic operators to combine more than one WHERE condition. The standard operators are AND, OR, and NOT. The text offers this example which is good in all addressed database:
SELECT P_DESCRIPT, P_INDATE, P_PRICE, V_CODE
This query would return four columns of data from the PRODUCT table meeting either of the two conditions in the WHERE statement.
On page 279, the text lists several "special operators", which can be used to look for situations not as easily covered by the condition examples so far:
On page 283, the text bring up more data definition commands. Several begin with the words ALTER TABLE. These are for making changes to existing data tables.
To change the structure of a table, you can use the ALTER TABLE command. This is an example of adding a new column to a table:
ALTER TABLE table_name
The syntax allows you to set a value for all rows in the table, if desired. If some rows should have different values, you can use UPDATE commands to change those rows.
ALTER TABLE table_name
You can also remove a column from a table with the ALTER command:
ALTER TABLE table_name
Another variation is to change the type or width of a column with the MODIFY clause:
ALTER TABLE table_name
MODIFY can also be used to change a column to not accept nulls. You should be aware that some systems will allow you to increase, but not decrease the width of a column. Some systems will not allow changes in data types. If your system does not allow changes to tables, you can still create a new table, and copy data with the INSERT INTO command shown above.
As noted later in the chapter, you can remove a table from your database with the DROP TABLE command. The syntax is simple:
DROP TABLE table_name;
Ordering Your Output
The output of most commands, SELECT commands in particular, can be sorted as you wish with some simple additions to the command in question. You know that the WHERE clause in a SELECT command is often optional. Another clause that is optional is the ORDER BY clause. This is an example of a SELECT command that uses it:
SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE
In this example, the command is to retrieve data from four fields in the PRODUCT table, from all rows (since there is no WHERE clause this time), then arrange the rows according to the values in the P_PRICE column. Unless told otherwise, SQL will read a numeric field like P_PRICE and sort the rows in ascending order. To put the rows in descending order, we would have to add the keyword DESC to the command, like this:
SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE
When ordering rows of data, is it sometimes necessary to have primary, secondary, and tertiary sorts. The text gives us an example of this with names, which typically have first, middle, and last components. Your text refers to this as a cascading order sequence.
SELECT EMP_LNAME, EMP_FNAME, EMP_INITIAL, EMP_AREACODE,
In this case, the first sort is by last name, the second sort is by first name, and the third sort is by middle initial. This works because each subsequent sort is done on data that is already sorted by the previous sort. The examples above are provided without WHERE clauses only for simplicity. A WHERE clause, if needed, should precede the ORDER BY clause.
On page 292, the text shows another way to use SELECT. Using the last query as an illustration, you could enter a command like this:
SELECT EMP_LNAME FROM EMPLOYEE;
This would give you a list of the last names in each record in the table. If you only wanted to see a list of the last names with no repeats, that is, a list of all the different last names in the table, you might be able to do this:
SELECT DISTINCT EMP_LNAME FROM EMPLOYEE;
The keyword DISTINCT is meant to show each new name only once. It won't tell you how many people have each name, but it will tell you all the different names that people have. Obviously, this could be done with any attribute.
If you are interested in counts, totals and other statistical data, SQL has a set if aggregate functions, functions that summarize data in tables.
The text leads us to more interesting applications of such functions with the note that tells us we can substitute a nested query for the name of a table. We can change the example about distinct names above to get a count of those names:
SELECT COUNT(DISTINCT EMP_LNAME) FROM EMPLOYEE;
So, we can summarized particular columns in a table with the functions above. You can also summarize sections of a table by using ORDER BY clauses. Assume that you have data for several different offices, states, or regions in a table. You can tell it to give you an aggregate answer for each such grouping by telling it to ORDER BY the field that holds the designator for the separate groups. The text offers an example from the PRODUCT table. Each product has an associated vendor identified by a V_CODE entry. Can you tell what this command would do?
SELECT V_CODE, COUNT(P_CODE)
The command says to retrieve the V_CODE column from the PRODUCT table, GROUP BY the vendor number, calculate how many products there are for each group, and to output the data sorted by vendor number. In the example in the text on page 299, the output looks a little odd, because there are two rows in the table that have no V_CODE data. Other than that, the query returns a count of the products we carry from each of the vendors.
As if that were not complex enough, the GROUP BY command has an optional clause called HAVING. HAVING uses the same syntax as a WHERE clause in a regular SELECT query. HAVING is linked to the GROUP BY section of the query, which may help you to keep HAVING and WHERE separated in your mind. So, what does it do? It allows you to select only rows having particular features for your aggregation. Yes, that is probably not clear. An example that contains lots of features:
SELECT V_CODE, SUM(P_QOH * P_PRICE) AS TOTCOST
Once again, we are using the PRODUCT table. We want the V_CODE column, and a SUM of the quantity ordered times the price of the part, for each V_CODE group, and we will call that column TOTCOST in the output. That's in the first three lines, and it produces an output. On line 4, we are limiting the row selection from that output to V_CODEs whose SUM of quantity orders times price per part is greater than 500. Finally, we are ordering the output by the amount spent with the vendor.
The chapter ends with a return to an idea we have seen before, the JOIN query. A JOIN is any query that pulls data from more than one table. The logic required to do so is sometimes different from what you type in a query on one table. The text describes a query on column in the PRODUCT and VENDOR tables. The two tables are related on the V_CODE field which is found in both of them. Let's examine the query example:
SELECT P_DESCRIPT, P_PRICE, V_NAME, V_CONTACT, V_AREACODE, V_PHONE
The text also recommends using the ORDER BY clause in this kind of query, because the order of the records in the output can be random, making it harder to find the data you want, if you do not apply a specific order.
We will ignore the short section on joining tables using an alias for a table name, since the author has not told us how to establish the alias. Likewise, the section on recursive joins, which MySQL does not support.