On Information

Concepts in Information Management – blog by Ronald Fuller

Normalization is a Dubious Concept

Database normalization is a dubious concept for 2 reasons:

  1. The cost-benefit value of any normal form can only be determined by an information owner
  2. Only a subject matter expert (SME) can identify the normal form of a set of tables

 

As an example of reason 1, the table below contains an error which you can see in the last row:  an address in Grand Junction, Tennessee has the same ZIP code as the address above it in Grand Junction, Colorado.  This is clearly a mistake, but without investigating further we cannot know whether the error is with the State or the Zip.

Address1.JPG

 

Breaking City and State off into two separate tables as shown below eliminates any potential for this kind of inconsistency,  however the advantage might come at a cost because performance can suffer when applications and reports have to work with 2 tables instead of 1.

Address2

So what is more important, better performance or lower risk of error?  Who should decide?  Only information owners have the appropriate perspective and incentive to make that kind of decision wisely (see Sophotaxis).  Architects and engineers can provide valuable cost-benefit advisement, but final decisions should be made by owners.

 

The example above is pretty simple, but large business databases can have thousands of similar cost-benefit scenarios that can be far more complex.  The more complex the situation, the greater the need for ownership perspective and expertise in the specific business issues at hand.

 

phone.JPG

For an example of reason 2, consider a simple table containing only one column with phone numbers.  Many engineers would agree that this table is in first normal form.  In most situations it would work fine just as it is, or as a column in a larger table.  But for a company such as a telephone service provider, where users would want to group or sort phone numbers by area code or exchange, this table would not be in first normal form.  In other words, it would not satisfy even the minimum essential standard for use in a modern database system.  Instead, in this particular case, the numbers should be broken out into each meaningful component as shown in the lower image.

 

This shows that even in very simple cases the normal form of a table can only be determined by someone who understands the full-range of potential uses for the information.  Business databases are filled with situations like this, but unfortunately most decisions about organizing information are made by technologists rather than business experts.  That is why businesses struggle with databases that cannot adapt well to changing requirements.

 

Information should be organized in a way that makes the most sense to its owner, not according to some predetermined normal form which cannot even be reliabely achieved.  The normal forms are guidelines to improve performance and protect consistency.  But any decision made for performance can potentially degrade the usefulness of the information, and the range of decisions needed to protect consistency can only be determined by a business expert.

 

A far more useful and comprehensive tool for thinking about the way information should be organized is the principle of sophotaxis, which I explain in The Ancient Secrets of Information Management.

 

For related discussion see:

Add comment

People in this conversation

Comments (1)

Subscribe:
  • Guest - Robert

    Hello, the information you publish here is very interesting. Accepting standardization is still a pain for people, and it is doubtful what will happen next. Take a look sur le site de CM.com, see what they call normal, their way of seeing, and compare. Thank you