MCTS Exam Cram: Implementing Database Objects
Terms you'll need to understand:
- SELECT command
- INSERT command
- UPDATE command
- DELETE command
- COMMIT statement
- ROLLBACK statement
- FROM clause and WHERE clause
- GROUP BY clause and HAVING clause
- LEFT JOIN or RIGHT JOIN
- INNER JOIN
- OUTER JOIN
- BETWEEN keyword
- ISNULL statement
- NULL value
- TABLESAMPLE clause
- CONTAINS command and CONTAINSTABLE command
- FREETEXT command and FREETEXTTABLE command
Techniques you'll need to master:
- Performing queries to extract data and produce reports
- Inserting data, updating data, and disabling objects during insertion
- Writing procedures that utilize transactions
- Setting up and using full-text queries
Although it is important to get tables and other objects in place, all that is really just the beginning of a database. With database applications, you put raw data in and attempt to gain information in return. Although every system is different, there are five basic functions that every system must support:
- Adding new data to the system
- Changing the existing data
- Removing obsolete or unwanted data
- Allowing user viewing of queries and data
- Reporting the data in a meaningful and useful manner
The server side does not necessarily perform the implementation of all these processes. In fact, most front-end applications provide the basics of these operations. The 70-431 exam does not focus a great deal on the actual writing of complete processes; its focus is more on individual situations where, as an administrator, you might perform the activities using some form of scripting.
This chapter is organized into individual scenarios that you may see on the exam and the operations you would perform to accommodate each situation.
Data Querying and Reporting
After data is in a database, it is likely to need to be accessed, changed, and reported on. To perform these basic operations, you need to apply the programming constructs of SQL, specifically Microsoft's implementation, referred to as Transact-SQL (T-SQL). Traditional applications can be completely centered on the four basic SQL commands: SELECT, INSERT, UPDATE, and DELETE. Essentially, these statements handle every operation that needs to be performed against the data. The most common of these constructs—the SELECT statement—is the basis for getting data out of the system.
SELECT statements can be complex and can include the use of options that can join many tables together and functions that can calculate and summarize data at the same time. However, a SELECT statement can be as simple as one line of code that retrieves the requested data. The complete SELECT syntax is involved, with many optional portions. You can find the complete syntax reference in SQL Server Books Online, under "SELECT, SELECT (described)." Many of the options are used only under special circumstances.
Listing the Contents of a Table
You will often be retrieving all the data from a particular table. Even if the final query is not intended to get all the data, you can often begin the data analysis by examining all the rows and columns of data in a particular table. The following example retrieves the customer data from the ONE database created in Chapter 2, "Creating Database Objects":
SELECT * FROM YUKONTWO.ONE.dbo.Customers
Note that the * is used to obtain all columns from the Customers table. It is also worth noting the use of the four-part name ( Server.Owner.Database.Object ). This name includes the server name YUKONTWO, database name ONE, the owner name dbo, and the name of the table itself, Customers.
Four-part names are used to perform queries when the one-part name of a table does not sufficiently qualify the table being queried. If you are executing a query within the scope of the server itself with the ONE database in use, and you are the owner or the owner is dbo, the four-part name is not necessary. There are therefore several valid variations on queries for the Customers table. Each of the following will produce the same results:
SELECT * FROM Customers SELECT * FROM dbo.Customers SELECT * FROM ONE.dbo.Customers SELECT * FROM ONE..Customers
Although queries often go after all the data in a table, there are a considerable number of options available for a query statement. You can choose some of the columns of a table, provide record-level conditions to limit the number of rows returned, put the output into groups, provide group-level conditions to selectively choose the groups, put the output into sorted order, or produce calculated results. You can also get into some complex queries through the use of JOIN, UNION, and subquery operations.
You do not need to formulate complex queries for the 70-431 exam, so this chapter covers only the basic theory of the use of the queries, which is what the exam focuses on.
You can optionally supply column headers to give a user-friendly listing of the data. By default, the column headers that are displayed in the result set are the same as the columns specified in the column select list, such as CUSTNMBR and CNTCPRSN.
Making a Report More Presentable
Why not change a result set's column header to something more readable? You can change the name of a result set column by specifying the keyword AS. (This is the traditional SQL-92 ANSI standard.) Changing the column name by adding an equals sign (=) or implied assignment is also an alternative syntax choice. (Of course, you would normally use only one of these three techniques, and the industry standard is SQL-92 ANSI.) The following example illustrates the use of column aliasing:
SELECT CUSTNMBR AS 'Customer Number', 'Customer Name' = CUSTNAME, CNTCPRSN 'Contact Person', FROM Employees SELECT CUSTNMBR AS 'Employee ID', CUSTNAME AS 'Customer Name', CNTCPRSN AS 'Contact Person', FROM Employees
Notice that the previous column aliases have been enclosed in single quotation marks. This enclosure is necessary when the column alias includes a space. The alias name must be enclosed within brackets when the alias is a reserved SQL Server keyword.
Sometimes you need to combine two columns together to show the two columns as one. When you do this, you are using a method called string concatenation. You can think of concatenation as joining strings together, just as you can combine words into phrases. The operator used to perform the concatenation is the plus sign (+).
Using TRIM to Remove White Space
To create a single name column, you combine the last name and first name values. If there are leading or training blanks within the data, you might want to polish the output a little bit, by using the functions LTRIM (left trim) and RTRIM (right trim). These functions remove leading spaces (LTRIM) or trailing spaces (RTRIM). The resulting code would then look like this:
SELECT LTRIM(RTRIM(Address1)) + ' ' + LTRIM(RTRIM(Address2)) + ' ' + LTRIM(RTRIM(Address3)) 'Full Address' FROM Customers
You can create your own function, which you could name TRIM, to eliminate both left and right spaces. This would be a handy feature to have in the product, and here is how it would look:
CREATE FUNCTION dbo.TRIM(@CHARSTRING NVARCHAR(255)) RETURNS NVARCHAR(255) AS BEGIN RETURN (RTRIM(LTRIM(@CHARSTRING))) END
Returning TOP Rows
The TOP clause limits the number of rows returned in a result set to a specified number or percentage at the top of a sorted range. Here are two examples:
- SELECT TOP 50 returns the top 50 rows.
- SELECT TOP 50 PERCENT returns the top 50% of the rows.
As an alternative to TOP, you can also limit the number of rows to return by using SET ROWCOUNT N . The difference between this keyword and TOP is that the TOP keyword applies to the single SELECT statement in which it is specified. For example, SET ROWCOUNT stays in effect until another SET ROWCOUNT statement is executed (for example, SET ROWCOUNT 0 to turn off the option).
You can optionally specify that the TOP keyword is to use the WITH TIES option. In this case, any number of records can possibly be displayed. WITH TIES displays all records that are equivalent to the last matching element. If you are looking for the top 10 employees and two employees tie for 10th, 11 or more records are displayed. If the tie is for 9th or a higher position, only 10 records are listed.
Of course, after you begin placing data in the desired order, you then need to group the output and perform calculations based on the groups. As discussed in the following section, grouping allows the production of subtotals and also provides more usable output in applications that require grouped output.
Displaying Groups in Output
You can use the GROUP BY clause of the SELECT statement to create groups within data. You can then use these groups to display data in a more orderly fashion or produce more meaningful results through the use of aggregate functions.
The GROUP BY clause specifies the groups into which output is to be shown and, if aggregate functions are included, calculations of summary values are performed for each group. When GROUP BY is specified, either each column in any non-aggregate expression in the select list should be included in the GROUP BY list, or the GROUP BY expression must match exactly the select list expression.
It is important to note that if the ORDER BY clause is not specified, groups returned using the GROUP BY clause are not in any particular order. It is recommended that you always use the ORDER BY clause to specify a particular ordering of data. Data will still be collected into groups. See the following example:
SELECT Country, Count(DISTINCT City) AS 'Number of Cities' FROM Customers GROUP BY Country
In this example, countries are collected together and are placed in the order chosen by SQL Server (usually ascending). The number of unique cities is counted and displayed beside the related country. By supplying the ORDER BY clause, as in the following example, you sort data into descending sequence, placing the country with the greatest number of unique cities at the top:
SELECT Country, Count(DISTINCT City) AS 'Number of Cities' FROM Customers GROUP BY Country ORDER BY Count(DISTINCT City) DESC
You might not want all groups to be included in the output. To exclude groups from the recordset, you can utilize the HAVING clause, which operates against the groups of data in the same way that the WHERE clause acts against the individual rows. In the example shown in Figure 3.1, the listing has been narrowed down through the elimination of countries with fewer than three unique cities.
Figure 3.1 An example of GROUP, HAVING, and ORDER used together.
The HAVING clause is similar to the WHERE clause. In a SELECT statement, these clauses control the rows from the source tables that are used to build the result set. WHERE and HAVING are filters: They specify a series of search conditions, and only those rows that meet the terms of the search conditions are used to build the result set. To address how these clauses are used, you must understand the conditions that can be applied within these clauses.
Querying a Sampling of the Data Stored
In some cases, you don't want to run a query against all the data in a table. In such a situation, you can use the new TABLESAMPLE clause to limit the number of rows that any query processes. Unlike TOP, which returns only the first rows from a result set, TABLESAMPLE returns rows selected randomly by the system from throughout the set of rows processed by the query.
You can specify the conditions of the random selection by percentage or by number of rows, as in the two following examples:
SELECT * FROM Customers TABLESAMPLE(10 PERCENT) SELECT * FROM Customers TABLESAMPLE(10 ROWS)
Selecting Rows Based on NULL Values
A NULL value is a value given to a field that that has no value. Many people confuse NULL values with zero-length strings or the value zero, but they are not the same. NULL is basically a fancy word for a value that is unknown. In SQL Server, you can select the desired NULL values or reject them by using ISNULL, as shown in the following query:
SELECT * FROM ONE.dbo.Customers WHERE ISNULL(StreetAddress, '#') = '#'
The ISNULL function operates on the basis of substitution. In the previous example, if a value for StreetAddress is not known for a particular customer, then within the query, it is replaced by and treated as the '#' character, which matches the condition of the WHERE clause and returns customers with NULL addresses. NULL values frequently show up as the result of JOIN operations that formulate derived tables.
Relating Data from Multiple Tables
Joins and derived tables figure prominently in the 70-431 exam. Joins are the backbone of relational databases; they actually put the "relation" in relational databases. They can be used in all the main SQL statements (SELECT, INSERT, UPDATE, and DELETE). They are important. Also, derived tables tend to be overlooked, and they're perceived as complicated even though they're not; so they are also likely to show up on the 70-431 exam.
Joining tables is a natural occurrence in a relational database system. Many of the queries performed on a regular basis involve multiple tables. Whenever you query data from two tables, you need to find some way to relate the two tables. Connecting one table to another requires a common element of the tables. You would use a JOIN operation in this manner whenever you want to see a result set that includes columns from several tables.
You can use three basic join types, as well as a union operation, to connect tables:
- Inner join— An inner join shows results only where there are matches between the elements. An inner join leaves out all the records that don't have matches.
- Outer join —An outer join can show all the records from one side of a relationship, records that match where they are available, and NULL values for records that do not have matches. An outer join shows all the same records as an inner join, plus all the records that don't match.
- Cross join— The cross join is less often used than the other two types of joins because it returns all possible combinations of rows between the two sides of the join. The number of records in the result set is equal to the number of records on one side of the join multiplied by the number of records on the other side of the join. No correlation is attempted between the two records; all the records from both sides are returned.
Again, cross joins are less frequently used than inner and outer joins. With an outer join, you are guaranteed to have all records returned from one side or the other. With inner joins, the only rows returned are those that match in the joined tables. It is easier to contemplate the overall processes if you consider join operations first. What you put in the WHERE clause and other clauses is applied after the joins are processed. So bear in mind that when a join returns a specified set of records, the SQL statement may or may not return all those records, depending on what you have specified in the WHERE clause.
Now let's look at each of the different join operators: INNER JOIN and OUTER JOIN.
Outputting Only Matches: INNER JOIN
The INNER JOIN statement is the easiest and most often used join operator. For this reason, when an inner join operation is performed, the INNER portion of the syntax is optional. A rowset is returned from the operation for the records that match up based on the criteria provided through the ON clause. A one-to-many relationship is handled inside an application with the inner join. To show all orders for a particular customer, you could use the following join operation:
SELECT Orders.* FROM Orders JOIN Customers ON Orders.CustomerID = Customers.CustomerID
The results from this query may appear a little on the unusual side because no sorting is performed. Typically, an order is specified or information is grouped to make the output more usable.
Returning Output Even When No Match Exists: OUTER JOIN
You can use an outer join when you want to return all of one entire list of rows from one side of the join. There are three types of outer joins: left, right, and full. The terms left and right are used based on the positioning of the tables within the query. The first table is the right; the second is the left. You may find it easiest to draw a picture to represent the table when you are first learning outer joins. A right outer join, often abbreviated as right join, returns all the rows belonging to the table on the right side and only the matching rows on the table on the left side. Conversely, a left outer join returns all the rows from the table on the left side. A full outer join returns all the rows from both sides that have correlations.
Left and right outer joins, for all intents and purposes, are the same operations; they are simply a matter of the position of the tables within the queries. Therefore, the following examples use only the left outer join syntax, which is the one typically used. In the current ANSI standard, RIGHT is the table name on the right side of the JOIN keyword, and LEFT is the table being joined. The following query produces a listing of all customers and their orders:
SELECT * FROM Customers AS C LEFT JOIN Orders AS O ON C.CustomerID = O.CustomerID
Customers that have never placed an order would still be in the listing, accompanied by NULL for the OrderID.
A cross join, also known as a Cartesian join, is rarely used because it connects every element in one table with every other element of another table. This has some bearing in statistical operations but is not relevant in most business processing.
Applying Conditional Data Filtering
You apply filtering to data to determine the data to be selected based on conditional requirements. Essentially, all conditions come down to one of three possible outcomes. If two values are compared, the result is positive, negative, or equal (greater than, less than, or equal to). Actually, filters always evaluate to a Boolean result, either True or False. BETWEEN 10 and 20 is either True or False. Even numeric tests, such as month > 0 and Price > 10.00, are either True or False.
BETWEEN, IN, and LIKE
With the help of the BETWEEN keyword, you can specify ranges when using the WHERE clause. Simply put, BETWEEN provides a range of values within which the data should lie; otherwise, the data does not meet the condition. BETWEEN is inclusive, meaning that the range includes the lower value specified and the upper value specified. For example, the following query would have the values 10 and 20 as a possibility in the results:
SELECT * FROM Products WHERE UnitPrice BETWEEN 10 AND 20
If the intent is to exclude the value 20, the query would be written like this:
SELECT * FROM Products WHERE UnitPrice BETWEEN 10.00 AND 19.99
You can also incorporate something known as a list when using the WHERE clause. Essentially, a list specifies the exact values a column may or may not take. If the record does not contain the value for the column specified in the IN list, it is not selected. IN determines whether a given value matches a set of values listed. Here is an example:
SELECT * FROM Customers WHERE Country IN ('UK', 'USA')
This example limits the values of Country to only UK and USA. Customers who live in the countries mentioned in the IN list are the only ones listed.
You can retrieve rows that are based on portions of character strings by using the LIKE predicate. The LIKE predicate determines whether a given character string matches a specified pattern. The data types a LIKE statement can work with are char, varchar, nvarchar, nchar, datetime, smalldatetime, and text. A pattern specified in the LIKE predicate can include regular characters and wildcard characters. During pattern matching, regular characters must exactly match the characters specified in the column value. Wildcard characters can be matched with any character or set of characters, according to the wildcard character used, as shown in Table 3.1.
Table 3.1. Wildcard Characters Allowed in T-SQL
Any single character within the specified range (for example, [f-j]) or set (for example, [fghij])
Any single character
Any number of zero or more characters
[ ^ ]
Any single character not in the specified range or set
If an application repeatedly calls the LIKE predicate and performs numerous wildcard searches, you should consider using the MS Search facility if it is installed and in use on the server. Consider the value of the response time over the storage resources that the MS Search service and full-text search capabilities require. MS Search service is required to use a full-text search. Full-text searching enables a variety of powerful wildcard searches. You should avoid LIKE searches that have a % wildcard at both the beginning and the end. The following example shows how the LIKE clause uses the % wildcard to select all customers whose CustomerID begins with the letter A:
SELECT CustomerID, ContactName FROM Customers WHERE CustomerID LIKE 'A%'
You can also use the NOT keyword with the LIKE predicate to simply retrieve a query that does not contain records matching the specified elements in the LIKE clause. With character matching, it is sometimes more efficient to exclude characters by using NOT.