Normalization is a Dubious Concept
Database normalization is a dubious concept for 2 reasons:
- The cost-benefit value of any normal form can only be determined by an information owner, not by a database expert, and
- The normal form of most tables can only be determined by a subject matter expert (SME).
As an example of reason 1, the table below contains an error which you can see in the last row: an address in Grand Junction, Tennessee has the same ZIP code as the address above it in Grand Junction, Colorado. This is clearly a mistake, but without investigating further we cannot know whether the error is with the State or the Zip.
Breaking City and State off into two separate tables as shown here eliminates any potential for this kind of inconsistency, however the advantage might come at a cost because performance can suffer when applications and reports have to query 2 joined tables instead of 1.
So what is more important, better performance or lower risk of error? Who should decide? Only information owners have the appropriate perspective and incentive to make that kind of decision wisely (see Sophotaxis). Architects and engineers can provide valuable cost-benefit advisement, but final decisions should be made by owning stake holders.
The example above is pretty simple, but large business databases can have thousands of similar cost-benefit scenarios that are far more complex. The more complex the situation, the greater the need for ownership perspective and expertise in the specific business issues at hand.
As an example of reason 2, consider a simple table containing only one column with phone numbers. Many engineers would say that this table is in first normal form. In most situations it would work fine just as it is, or as a column in a larger table. But for a company such as a telephone service provider, where users might want to group or sort phone numbers by area code or exchange, this table would not be in first normal form. In other words, it would not satisfy even the minimum theoretical standard for use in a modern database system. Instead, in this particular case, the numbers should be broken out into each meaningful component as shown in the lower image.
This shows that even in very simple cases the normal form of a table can only be determined by someone who understands the full-range of potential uses for the information. Business databases are filled with situations like this, but unfortunately most decisions about organizing information are made by technologists rather than business experts. That is why businesses struggle with databases and create systems that cannot adapt well to changing requirements.
Information should be organized in a way that makes the most sense to its owner, not according to some predetermined normal form which cannot be reliably or purposely achieved. The normal forms are usually seen as guidelines to improve performance and protect consistency. But any decision made for performance can potentially degrade the usefulness and value of the information, and the range of decisions needed to protect consistency can only be determined by a business expert.
A far more useful and comprehensive tool for thinking about the way information should be organized is the principle of sophotaxis, which I explain in The Ancient Secrets of Information Management.
For related discussion see:
Post comment as a guest