Big data can provide powerful insights into large data sets. Some scholars and practitioners have even suggested that big data tools and techniques might replace relational databases for ordinary business use, but this claim offers only false hope for organizations that struggle with relational databases. Here's why:
Big data systems work with information organized into small 2-part chunks known as key-value pairs (or some related format). For example: Last Name = Fuller; City = Redmond; Car = Honda Accord; Order status = complete.
Organizing information this way is great for things like analyzing trends and detecting patterns. But big data formats cannot be used for ordinary business reporting unless each record is tagged with additional information to tell which other records it is related to. For example: this address belongs to that person; this item goes with that order, and so forth. Applying these kinds of tags to information in a big data format requires exactly the same kind of discipline and pre-planning as it would if it were organized for a relational database. Big data offers nothing new in this regard.
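As a minimal sketch of the tagging described above, the following Python compares bare key-value pairs with the same pairs tagged by the entity they belong to. The identifiers (`person:1`, `order:17`), the `Placed by` key, and the helper functions are hypothetical and invented for illustration; they do not describe any particular big data product.

```python
# Hypothetical records, invented for illustration only.
# Untagged key-value pairs: each fact stands alone, so nothing says
# which person placed which order.
untagged = [
    ("Last Name", "Fuller"),
    ("City", "Redmond"),
    ("Car", "Honda Accord"),
    ("Order status", "complete"),
]

# The same pairs, each tagged with the entity it describes. The identifiers
# and the "Placed by" link are assumptions that someone must plan and assign.
tagged = [
    ("person:1", "Last Name", "Fuller"),
    ("person:1", "City", "Redmond"),
    ("person:1", "Car", "Honda Accord"),
    ("order:17", "Order status", "complete"),
    ("order:17", "Placed by", "person:1"),
]


def describe(entity_id, records):
    """Collect every key-value pair tagged with the given entity."""
    return {key: value for tag, key, value in records if tag == entity_id}


def order_report(records):
    """Build an ordinary business report: one row per order with its customer."""
    rows = []
    for tag, key, value in records:
        if key == "Order status":
            order = describe(tag, records)
            customer = describe(order["Placed by"], records)
            rows.append((tag, customer["Last Name"], customer["City"], value))
    return rows


report = order_report(tagged)
print(report)
```

The report can only be produced from `tagged`. Deciding that orders point to people, and assigning stable identifiers to both, is the same design work a relational schema would require.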
Even when a big data record set includes complete information about the relationships between each pair, big data technologies do not offer anywhere near the flexibility of relational databases for reporting purposes. So any claim that big data presents a plausible alternative to relational databases for general business use is uninformed and false.
Here are 3 ways to lower costs and improve outcomes in any BI or analytics project, or for that matter any information management effort. They require varying degrees of commitment ranging from easy, free, and doable right now, to a need for significant change in organization and culture:
- Have each information owner (meaning the business person who decides the requirements for an information resource) actually look at the proposed way the information will be organized; in other words, the actual tables and relationships with sample data (very important). Some might think this sounds like asking them to look at programming code. Not so! Programming code is purely technical in nature. Deciding how information is organized is purely business in nature; the only input that should be needed from IT is cost-benefit advice on issues like performance. The way information is organized determines how it can be used. The operational capabilities of any organization are literally determined by the way its information is organized. A business owner cannot possibly make a complete list of every possible use case for an information resource, but when they look directly at a proposed format and see how it is organized they can easily determine whether that resource will meet their foreseeable needs, even before they try to express any requirement. Of course, what an information owner considers foreseeable can change over time, but it will always change more slowly than the constantly changing list that must be maintained when an architect or engineer is in charge of determining requirements from their own perspective. The time a business expert spends doing this will be returned many times over. It will reduce scope creep, save development cycles and produce better outcomes every time, guaranteed.
- Assign an information manager to every business unit. Information managers can be drawn from the same talent pool as business analysts. They are business-oriented professionals, often trained at university business schools, who manage information and determine how it should be organized. There is no reason these people should work in any IT department except as information managers for business units within IT. Properly placed information managers will eliminate the need for business analysts and will be 3x to 10x more effective and productive. Information managers should report to the same office as business managers, usually a GM or VP. An information manager should be responsible for deciding how information produced by that business unit will be organized according to the priorities and requirements of the business. When information needs to be organized across multiple systems and business units, information managers should coordinate with their cross-department peers and respond to policies set by senior information managers who report directly to the COO or CFO. Information managers for large business units may require a staff, as will those for some smaller organizations, depending on the rate of change and complexity of the information they manage.
- Have everyone in your organization take a course in fundamental logic. This is not too much to ask – logic was once a core focus of classical education. In fact it was one of the main reasons universities were invented in the first place. Logic remained a central pillar of university curricula for hundreds of years until around the 1940s, but since then it has been severely de-emphasized, to the great detriment of the discipline of information management. Today a person can earn a PhD in nearly any subject, including business administration or computer science, without taking a single introductory course in formal logic. Almost any person at any level of a modern organization can create new information resources, so logic education and logic-aware management are absolutely essential for any organization that wants to build an effective culture and capacity to manage information.
The Fundamentals of Information Management is the title of a presentation I gave to various audiences at Microsoft Corp. in 2014 and 2015.
Not every discipline is based on fundamental principles. When someone talks about the fundamentals of marketing, or the fundamentals of business communications, for example, they are usually talking about introductory concepts or things you should know if you want to succeed in those endeavors. Other disciplines, however, such as chemistry and music, are deeply rooted in fundamental principles or laws of nature. Achieving a level of professional competence in that kind of field requires not just mastery of concepts and tasks, but a sound understanding of the underlying principles.
When fundamental principles are discovered within some discipline, or when techniques based on those principles are developed, the result is often significant advancement and rapid growth of that discipline. For example, in the field of chemistry the existence of chemical compounds and reactions was known long before the structure of the atom and its fundamental properties were understood, but alchemy was a primitive and unpredictable craft. The publication of the periodic table laid the foundation for the scientific discipline of chemistry and allowed it to advance very quickly.
Music is also based on fundamental principles, and people have created music since prehistoric times. The ancient Greeks understood that music had mathematical properties, but early notation systems did not fully reflect that understanding. As a result, early written music could be used to help musicians remember a tune they had already heard but was not a reliable way for someone to learn one they had never actually heard. Modern music notation brought significant improvement because it allows the mathematical elements of a composition to be described in precise detail, and allows even the expressive elements to be described in some detail. The standardization of modern notation led to an enormous and rapid expansion of creative output during the golden age of classical music which has continued ever since.
The field of information management is not generally recognized as having a basis in truly fundamental principles. Sources using the phrase "fundamentals of information management" usually include content ranging from tips on data governance, to best practices for business intelligence, to lists of terms and definitions for various information technology topics. Regardless of the merit of these sources or how important such topics may be, none of them actually reflects any connection to truly fundamental principles.
The most persuasive evidence I have seen pointing to a fundamental basis for information management comes from the profession of accounting, which is the discipline of managing financial information. Modern accounting has evolved from the practice of double-entry bookkeeping, which has been used since at least 1299 AD and was first described in a published work in 1494 by the Franciscan friar Luca Pacioli, who was the most influential mathematician of the Renaissance and a friend to Leonardo da Vinci.
The preface to Ancient Double-Entry Bookkeeping (1914), which is the earliest modern English translation of Pacioli's original treatise, indicates that the author recognized a basis in fundamental principles for double-entry bookkeeping. He wrote:
It is a significant fact that the rules and principles elucidated by Pacioli are contained in a book given over to mathematics. One cannot help but believe that the derivation of double-entry bookkeeping is an explanation of the algebraic equation used with such skill by the ancient Greek mathematicians, applied practically to the scientific recording of business transaction.
The algebraic equation he refers to is the logic of Aristotle and Euclid. As a database professional, this struck me as significant because the theory underlying modern database technology, known as the relational model, is based on principles of mathematical logic discovered in the late 19th and early 20th centuries, which also have roots in the logic of Aristotle and Euclid.
With this extraordinary historical information I decided to look further, and in January 2014 I discovered something remarkable that has enormous positive implications for business and the future of commerce: the system of interconnected books used in double-entry bookkeeping (which I'll refer to as relational bookkeeping) is precisely consistent with the relational model. This means every object and process can be described in terms of the relational model and can be operated upon using a relational language:
- Journals and inventories are relations (transaction tables)
- The repertory or finding key is a relation (reference table)
- Book entries are n-tuples (rows)
- Columns are domains (attributes)
- Ledger accounts are filtered views of journals, therefore they are also relations
- Journal entries refer to ledger accounts by way of page reference numbers, which are foreign keys to each account
- Balance sheets and other financial statements are filtered or aggregated views of the ledger, capital and cash accounts
- The rules of double-entry bookkeeping provide consistency and referential integrity
- A Chart of accounts is a system catalog (metadata dictionary)
In other words, Italian merchants in the Middle Ages invented fully-functional, manually-operated relational database management systems nearly 700 years before the mathematical foundations of such systems were to be formally described.
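The mapping listed above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the account names, account numbers, amounts, and table and view names are hypothetical, and the schema is a simplified modern rendering rather than a reconstruction of any historical set of books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce references to accounts

conn.executescript("""
    -- Chart of accounts: the reference table that journal lines point to.
    CREATE TABLE account (
        account_no INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );

    -- Journal: each row (tuple) is one debit or credit line of an entry.
    CREATE TABLE journal (
        entry_no   INTEGER NOT NULL,
        account_no INTEGER NOT NULL REFERENCES account(account_no),
        debit      INTEGER NOT NULL DEFAULT 0,
        credit     INTEGER NOT NULL DEFAULT 0
    );

    -- Ledger balances: an aggregated view of the journal, itself a relation.
    CREATE VIEW ledger_balance AS
        SELECT a.name AS name, SUM(j.debit) - SUM(j.credit) AS balance
        FROM journal AS j
        JOIN account AS a ON a.account_no = j.account_no
        GROUP BY a.account_no, a.name;
""")

# Hypothetical accounts and entries, in whole currency units.
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(100, "Cash"), (200, "Inventory"), (300, "Capital")])
conn.executemany(
    "INSERT INTO journal (entry_no, account_no, debit, credit) VALUES (?, ?, ?, ?)",
    [(1, 100, 500, 0), (1, 300, 0, 500),    # owner contributes capital
     (2, 200, 200, 0), (2, 100, 0, 200)])   # inventory bought for cash
conn.commit()

# Ledger accounts as a query over the journal.
balances = conn.execute(
    "SELECT name, balance FROM ledger_balance ORDER BY name").fetchall()

# Double-entry rule: within every entry, total debits equal total credits.
unbalanced = conn.execute("""
    SELECT entry_no FROM journal
    GROUP BY entry_no
    HAVING SUM(debit) <> SUM(credit)
""").fetchall()

# Referential integrity: a line that cites a nonexistent account is rejected.
try:
    conn.execute("INSERT INTO journal (entry_no, account_no, debit, credit) "
                 "VALUES (3, 999, 10, 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(balances, unbalanced, rejected)
```

Here the foreign key plays the role of the page reference number, the view plays the role of the ledger, and the `HAVING` query checks the balancing rule that a bookkeeper would verify by hand.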
This astonishing fact stands as evidence that computers and technology have not fundamentally changed the basic nature of information management, as has been assumed since the 1970s. It reestablishes that information management is a business discipline, rather than a technology or even a hybrid discipline, and that the business discipline of information management is, indeed, based on fundamental principles. And not just any fundamental principles, but principles considered by some to be the most fundamental of all. Kurt Gödel, who was Albert Einstein's best friend and was regarded by their colleagues as his intellectual equal, described mathematical logic as:
"a science prior to all others, which contains the ideas and principles underlying all sciences" (1945)
Compare this language to the following statements about double-entry bookkeeping by economist Werner Sombart in 1919:
Double-entry bookkeeping came from the same spirit which produced the systems of Galileo and Newton, and the subject matter of modern physics and chemistry.
Double-entry bookkeeping is based on ... the basic principle of quantification which has delivered up to us all the wonders of nature, and which appeared here for the first time in human history in all its clarity.
Sombart and others were sharply criticized for supposedly overstating the nature and significance of double-entry bookkeeping. But that was before anyone recognized its congruence with modern mathematical logic. Sombart himself did not reveal that he recognized any such connection, which is understandable, since relational bookkeeping was practiced for more than half a millennium before mathematical logic was developed. Even so, it is very clear that Gödel and Sombart were talking about the same thing. With this new understanding it is hardly possible to overstate the significance of relational bookkeeping. It represents the ultimate intersection of theory and practice: the ne plus ultra of practical application emerging from the pre-primordial fabric of the cosmos, with obedience to the same principles of natural logic underlying not only the behavior of matter and energy, but our ability to contemplate and manipulate them.
Sombart goes even further and suggests a spiritual connection:
"...we cannot regard double-entry bookkeeping without wonder and astonishment, as being one of the most artistic representations of the fantastic spiritual richness of European man."
Along those lines the following excerpt shows that people have recognized the significance of logic in our relationship with the universe for a very long time:
John 1:1 In the beginning was the Word, and the Word was with God, and the Word was God
Orig. Greek En archē ēn ho Lógos, kai ho Lógos ēn pros ton Theón, kai Theós ēn ho Lógos
Logos is the ancient philosophical concept of divine reason, which Heraclitus described as both the source and fundamental order of the Cosmos. It is also the word Aristotle used to describe his system of formal reasoning which became the basis of modern mathematical logic. Christians believe Logos in the passage above refers to Jesus Christ, "in whose name", Luca Pacioli wrote, "our transactions must always be made". Muslim scholars embraced classical logic during the Golden Age of Islam when it was the most scientifically advanced civilization on earth. Hindu clerics also independently described principles of formal logic in religious texts known as Sutras.
The purpose of my course and lectures on the Fundamentals of Information Management (FoIM) is to explain how logic is not just a tool for technical specialists to query information in databases, but must also be recognized as a tool for information owners to express requirements for information resources. This will allow business organizations to finally gain effective management control over their information and achieve new capabilities that have been impossible without a solid understanding of the underlying principles.
A business resource is anything that brings value to a business. Classical economists described business resources in terms of factors of production. Land, labor and capital are the primary factors because they do not become part of any finished product and are not consumed or significantly changed by the production process. Resources such as raw materials and energy are secondary because they are derived from the primary factors. From the classical perspective even things like entrepreneurship, intellectual property and the time value of money are derived from labor and capital, so they too are considered secondary formulations of the primary factors.
So where does information fit in? Information is obviously an important business resource, but is it a primary factor or secondary? Or is it something else?
Information is consumed in the production process, but not in the sense that it is depleted or reduced; in fact new information is created by every act of production and commerce. Further, information is non-fungible, which means one unit cannot be substituted for another the way a kilowatt-hour of electricity, an ounce of gold, or a computer can be. Information cannot be replaced the way a building or an executive can be. No business resource can be effectively utilized without information.
For these reasons information must be acknowledged as superior to every other business resource. It is more primary than the primary factors. Information is the sine qua non of all commerce – a status not even money can claim. Money is, after all, a form of information.
So why did the classical economists not have anything to say about information as a factor of production? My guess is that information is so essential to every aspect of commerce that until the mid-20th century it was not even recognized as a distinct resource class. In 1963 a professor of management noted: "As late as 1946 there were in the combined professional, technical and scientific press of the United States only seven articles on the subject of information."
Information is like the air we breathe – nothing can happen without it, but it is easy to ignore until you have reason to notice.
Managing information is the most difficult and costly operational challenge facing most businesses. At the root of the problem is a failure to recognize the distinction between information resources and technology resources. To their detriment, businesses treat them as the same thing. My evidence for this claim is that no distinction is ever made in the requirements expressed for either, or in the way each is managed. They are delivered and maintained by the same people and no distinction is recognized at any point in the lifecycle processes of either type of resource. Information resources are mistakenly treated as components of automated systems.
As a result, some of the most important management decisions at every level of enterprise organizations are unwittingly delegated to technical specialists instead of business experts. Efforts to address the resulting problems without addressing the root cause only make the problems worse. It is a vicious circle that creates thick layers of artificial complexity in the form of initiatives, roles and processes which lead to additional costs and complexity. The only way to solve the problem is to recognize that information resources are not the same thing as the technology-based tools used to access and maintain them. Businesses must develop a capacity to determine and express requirements for information resources separately from those of automated systems.
An information resource is information organized for some purpose. It can take the form of anything from a memorized telephone number to the Library of Congress or the entire internet. The following table lists various types of information resources, how they are organized, and what they are useful for:
| Information resource | Organized by | Useful for |
|---|---|---|
| File cabinet | Drawers with alpha or numeric sorting | Manual document retrieval |
| Novel | Sentences, paragraphs, chapters | Entertainment, relaxation |
| Library | Subject, author | Finding publications |
| Relational database | Tables, columns, rules, relationships | Flexible storage, retrieval and analysis |
| XML file | Tags, nested hierarchies | Transporting and sharing data |
| Big data | Key-value pairs | High-volume capture and processing |
| Semantic ontology | Triples (subject, predicate, object) | Making information discoverable |
The way information is organized determines how it can be used, so decisions about the organization of information should be carefully considered by the owner and managers of the resource. Unfortunately, owners and managers usually only provide high-level guidance, and the actual decisions about the way information gets organized are instead delegated to an architect or technical specialist. This is a costly mistake with long-term consequences. The outcome is almost always an information resource that cannot be used the way its owners intend without being modified for every newly desired use.
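To make the differences between these forms concrete, here is a small sketch that writes one hypothetical fact set about a single order in four of the ways listed in the table. The order number, column names, and key format are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# One hypothetical order, invented for illustration.
# Relational-style row: identity comes from the table and its key column.
row = {"order_id": 17, "customer": "Fuller", "status": "complete"}

subject = f"order:{row['order_id']}"
facts = [(col, val) for col, val in row.items() if col != "order_id"]

# Key-value pairs: identity has to be packed into each key.
key_values = {f"{subject}:{col}": val for col, val in facts}

# Triples (subject, predicate, object), as in a semantic ontology.
triples = [(subject, col, val) for col, val in facts]

# XML: tags arranged in a nested hierarchy.
order = ET.Element("order", id=str(row["order_id"]))
for col, val in facts:
    ET.SubElement(order, col).text = str(val)
xml_text = ET.tostring(order, encoding="unicode")

print(key_values)
print(triples)
print(xml_text)
```

The content is identical in every case. What changes is which questions are easy to ask of it, which is why the choice of organization is a business decision and not only a technical one.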