Thursday, January 26, 2012
perkthim 5
Relational Model: Data Integrity
While the Relational Model: Data Structure section discusses the form for representing data, the data integrity portion of the relation model defines mechanisms for ensuring that stored data is valid. At a minimum this requires the following:
1. Attribute Integrity
This means that values for attributes (columns) should come only from their respective underlying domains. For example, an integer field cannot have double values and a Currency field cannot have varchar (string). Note that the relational model also permits an attribute to contain NULL which is a marker to indicate a missing or unknown value. NULL is not a value, it is only a placeholder.
2. Entity Integrity
This is a straightforward concept. Every row in a table represents an entity (i.e., an instance of a table type), and each entity must be uniquely identifiable. In other words, no two rows can have the same exact set of values. From this property of entity uniqueness comes the principle that in every table there exists a set of columns whose values are never duplicated and can therefore be used as the table's primary key. More than one possible set of columns may always contain unique values - each of these sets is referred to as a candidate primary key and one is usually picked randomly.
A primary key is a minimal set of columns whose values uniquely identify a row in a table. The primary key is the only way to identify a specific row in a specific table. A consequence of the primary key description is that a primary key can never ever contain NULL. A similar argument holds for composite primary keys (keys consisting of two or more columns) - columns that are part of a primary key can never every contain NULL.
3. Referential Integrity
Simply stated, referential integrity requires that rows that exist in separate tables but are inter-related, be unambiguously interrelated by corresponding column values.
perkthim 4
Third Normal Form - 3NF
Recall that an entity type is 3NF when it is 2NF and all its attributes are directly dependent on its primary key. A better way to word this rule is that the attributes (columns) of an entity type (table) must depend on all parts of the primary key. Therefore, 3NF is only an issue for tables with composite primary keys:
In table [OrderPayment2NF], the payment-type description (i.e., Cash, Visa, etc) depends only on the payment type and not on the combination or Order ID and payment type attributes. To resolve this problem, [PaymentType3NF] was introduced as shown above. Note now that in [PaymentType3NF], PaymentDescription fully depends on PayementType.
perkthim 3
Second Normal Form - 2NF
The schema presented in First Normal Form 1NF can be further improved to the second normal form 2NF as shown below:
Recall that an entity type is 2NF when it is 1NF and all of its non-key attributes (columns) are fully dependent on its primary key. This is definitely not the case with OrderItem1NF - item information in OrderItem1NF does not depend on an order for that item. For example, if customer A orders 1 computer and customer B orders 2 printers, the fact that the items are called "computer" and "printer" and that the unit prices are $999 and $399 does not depend on the order and are constant. Item name and price depends on the concept of the item and not the concept of an order, and therefore should not be stored in the OrderItem2F table but rather in the new table Item2F. OrderItem2F retains the TotalPrice as this is a calculated column.
perkthim 2
Normal Forms
In database design there are a number of forms (or properties, or constraints) that a table scheme may possess. A table may assume a specific form to achieve certain design goals such as minimizing dependency. These forms are called normal forms. The following normal forms are discussed:
• First Normal Form 1NF
• Second Normal Form - 2NF
• Third Normal Form - 3NF
Each of these forms includes its predecessor. For example, a 2NF table is also a 1NF table, and a 3NF table is also a 2NF table, which is also a 1NF table. By definition, a table is 1NF. While it is generally desirable for database tables to have a high normal form, the situation is not as simple as it seems. Forcing all table schemes to be in a particular normal form may require some compromises. The following table summarizes the most common normalization rules that describe how to put entity types into a series of increasing levels of normalization.
The important thing to remember is that you want to store data in one place and one place only. With respect to terminology, a database schema is at the level of normalization of its least normalizes entity types. For example, if all entity types are at 2NF or higher, then the database schema is at 2NF
First Normal Form 1NF
In table Order0NF you can see that there are several repeating attributes - the ordered item information repeats nine times, and the contact information is repeated twice (once for shipping and once for billing). But what happens if an order has more than nine items? Do you create additional records for them? What about the majority of orders that have one or two items? Do you want to write redundant code to process the nine copies of order information?
The following figure represents a worked data schema where the [Order] schema is put into first normal form:
Table [OrderItem1NF] allows us to have as many order items as we like. This increases the flexibility of the schema while reducing storage requirements for small orders. Table [ContactInformation1NF] also offers a similar benefit when the billing and shipping addresses are the same. Table [OrderPayment1NF] was introduced to allow customers to make 0 or more payments against an order from [Order1NF] as table [Order0NF] presented previously could not accept more than two payments.
An important thing to notice here is the application of primary and foreign keys in the new schema. When a new table is introduced into the schema as the result of first normalization efforts, it is common to use the primary key of the original table as part of the primary key of the new table. For example, [OrderItem1NF] also includes the OrderID column within its schema. But because, many order items can have the same OrderID, ItemSequence was added to form a composite primary key for table [OrderItem1NF].
A different approach was taken with table [ContactInformation1NF]. The column ContactID, a surrogate column that has no business meaning, was made the primary key, while OrderID was needed as a foreign key to maintain the relationship back to table [Order1NF].
A good rule of thumb is that if two tables are strongly related (i.e., [Order1NF] and [OrderItem1NF]) then it makes sense to include the primary key of the original table as part of the primary key of the new table. If the two tables are not as strongly related (i.e., [Order1NF] and [ContactInformation1NF]), then a surrogate key makes more sense.
perkthim1
With this table schema, all data relevant to an order is stored in one row (assuming orders of up to nine items only - an unreasonable restriction). Because all data is in one row, data can be very easily and quickly accessed. For example, it is very easy to calculate the total amount by adding all ItemXPrice values. Now consider an alternative schema shown below:
Note on reading notation:
The notation used to link the tables together is UML notation, although other notation types can be used. Irrespective of what notation type is used, reading the diagram is the same. To read the notation: start with the table name, then any symbols attached to the connecting line or any description attached to the connecting line, then any multiplicity attached to the end of the connecting line, then the end table name. In other words: [StartTableName] + Symbol | Description + Multiplicity + [EndTableName]. The following examples illustrate:
• [Order2NF] and [OrderItem2NF]: An order has 1 or more order items.
• [OrderItem2NF] and [Item2NF]: An order item relates to only item only
• [Order2NF] and [ContactInformation]: An order is billed to contact information.
• [Order2NF] and [ContactInformation]: An order is shipped to contact information.
• [Order2NF] and [OrderPayment]: An order is made against zero or more payment.
• etc.
To calculate the total amount you need to read data from a row in the [Order] table, data from all the rows in the [OrderItem] table and data from the corresponding rows in the [Item] table for each order item. For this kind of query the previous schema provides better performance.
Note on redundancy
It is important to be clear about the meaning of redundancy. The redundancy that should be avoided is the repeated representation of the same fact. For example, in the following table which is 1NF, the fact that warehouse 1 is located in Stree1 is repeated twice (attributes in green are keys). This means that any change to a warehouse address must be made to all records that store the address:
Note: formally, the concept of facts is knows as functional dependence. If a table has two columns A and B, then B is functionally dependent on A if values of A uniquely determine values of B (in other words, the same value of A necessarily means the same value of B).
Let's look at another example that is not so obvious. Consider the [OrderItem] table which represents items for any given order:
The redundancy here is the price for each item. In many businesses, an item will always have one price and the preceding table contains redundant item prices. But what if item prices were negotiated on each order or there were special promotions? Then it would be possible for the relation above to have the following contents:
For the business model that this relation represents, there is no fact redundancy. The price of an item must be explicitly stored with each item.
This example presents a crucial point in understanding normal forms: If you want to unambiguously represent various facts in a database, you must structure the relations so that they have no redundant representations of facts. The obvious consequence is that there is no mechanical method of normalizing relations. You must first know which facts you want to represent and then define the relations accordingly.
Subscribe to:
Comments (Atom)







