Modeling Information: Part 2 - Logical leap

Tuesday, January 18th, 2011 at 11:28 pm

I love alliteration!

In my previous post I talked about the common programmer’s sin of pushing the logical model into the physical model. In this post, I want to argue that, because of this, a physical database is most of the time no different from what we traditionally call “unstructured data.” Further, what we call the physical to logical divide is actually a continuous spectrum. This will probably be awfully obvious, but I think it is a crucial leap to make in order to understand a general model for information.

The Mythical Unstructured Data

As software developers, we often think of everything that sits outside a database as unstructured data: a ton of e-mails, for example, or a document repository. Even sound and image data are considered unstructured.

We say they are unstructured because we can’t easily program something to glean any truly useful information from them. Although we can see the bytes and raw data, there’s no way a program could understand what the words or pictures mean.

Now back to our original example.  I present our larger-than-life data model:

We say they are unstructured because we can’t easily program something to glean any truly useful information from them. Since we don’t understand what ContactKind really is, it is impossible to write something to interpret it. There are all kinds of unstructured data hiding in there, actually. We don’t know what Group is or what Contact is. Again, it might as well be:

I think the term unstructured data is thrown around too much.  Programmer’s sin #2!  Unstructured data is everywhere, especially in our databases.
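To make the ContactKind point a little more concrete, here is a rough sketch using Python's built-in sqlite3 module. The table and column definitions are my own assumptions for illustration (only the names Contact, ContactKind and Group come from the model above), and T1, C1 and friends are made-up opaque names. The sketch compares what a generic program can actually see in each schema once the names are set aside.

```python
import sqlite3


def build(ddl):
    """Create a throwaway in-memory database from the given DDL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    return conn


def structural_view(conn):
    """What a generic program can see without knowing what the names mean:
    for each table, just the declared column types, in order."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY rowid")]
    view = []
    for table in tables:
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        # Each row is (cid, name, type, notnull, dflt_value, pk); keep the type.
        view.append(tuple(col[2] for col in columns))
    return view


# Hypothetical schema using names that mean something to a human reader.
named = build("""
CREATE TABLE Contact (Id INTEGER, Name TEXT, ContactKind INTEGER);
CREATE TABLE "Group" (Id INTEGER, Name TEXT);
""")

# The same shape with names that mean nothing to anyone.
opaque = build("""
CREATE TABLE T1 (C1 INTEGER, C2 TEXT, C3 INTEGER);
CREATE TABLE T2 (C1 INTEGER, C2 TEXT);
""")

# True when the two schemas look identical to a program that ignores names.
same = structural_view(named) == structural_view(opaque)
print(same)
```

To a program that has no idea what a ContactKind is, the two schemas are the same thing. The meaning lives in our heads, not in the database.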

The Mythical Physical Logical Divide

Since unstructured data is everywhere, even in our databases, this should dispel the myth of the physical logical divide.

I think the physical logical relationship can be summed up at a very high level:

There’s a whole lot missing from this diagram. But the idea is that the physical model can be as obscure as our unstructured model depending on context and point of view.

In the next post, we’ll look closer at this “spectrum” and what it means for data types.