Modeling Information: Part 2 - Logical leap

Tuesday, January 18th, 2011 at 11:28 pm

I love alliteration!

In my previous post I talked about the common programmer’s sin of pushing the logical model into the physical model. In this post, I want to argue that, because of this, a physical database is most of the time no different from what we traditionally call “unstructured data.” Further, what we call the physical to logical divide is actually a continuous spectrum. This will probably be awfully obvious, but I think it is a crucial leap toward understanding a general model for information.

The Mythical Unstructured Data

As software developers, we often think of everything that sits outside a database as unstructured data.  For example, a ton of e-mails, or a document repository.  Even sound and image data is considered unstructured.

We say they are unstructured because we can’t easily program something to glean any real useful information from them. Although we can see the bytes and raw data, there’s no way a program could understand what the words or pictures mean.

Now back to our original example.  I present our larger-than-life data model:

We say they are unstructured because we can’t easily program something to glean any real useful information from them. Since we don’t understand what ContactKind really is, it is impossible to write something to interpret it. There are all kinds of unstructured data hiding in there, actually. We don’t know what Group is or what Contact is. Again, it might as well be:

I think the term unstructured data is thrown around too much.  Programmer’s sin #2!  Unstructured data is everywhere, especially in our databases.
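
To make the ContactKind point concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the row layout, the numeric codes, the `describe` helper and the `KIND_MEANINGS` table are all hypothetical and are not taken from the actual schema.

```python
# Hypothetical rows from a Contact table: (ContactId, Name, ContactKind).
# The numeric codes are invented for illustration.
rows = [
    (1, "Ada", 2),
    (2, "Grace", 5),
]


def describe(row, kind_meanings=None):
    """Return a readable description, or None if ContactKind cannot be interpreted."""
    _, name, kind = row
    if kind_meanings is None or kind not in kind_meanings:
        # The stored value is perfectly readable, but its meaning is not.
        return None
    return f"{name} is a {kind_meanings[kind]}"


# With only the physical data, a program has nothing to interpret.
print([describe(r) for r in rows])

# The meaning lives outside the table (an assumed mapping, not the real one).
KIND_MEANINGS = {2: "customer", 5: "supplier"}
print([describe(r, KIND_MEANINGS) for r in rows])
```

The first call has the same data as the second. The only difference is whether the logical meaning was supplied from somewhere outside the table.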

The Mythical Physical Logical Divide

Since unstructured data is everywhere, even in our databases, this should dispel the myth of the physical logical divide.

I think the physical logical relationship can be summed up at a very high level:

There’s a whole lot missing from this diagram. But the idea is that the physical model can be as obscure as our unstructured model depending on context and point of view.

In the next post, we’ll look closer at this “spectrum” and what it means for data types.

Modeling Information: Part 1 - The programmer’s trick

Sunday, December 28th, 2008 at 8:37 pm

My latest project at work has led me down an unexpected road of wrapping my head around modeling information from a very high level. It’s made me realize a lot of my own past mistakes of overlooking the obvious. As a self-taught programmer, I’ve picked up some very fundamental bad habits. Since I’m a programmer, I’m going to start there with a simple concrete problem. Through the next parts in this series, we’ll hopefully work our way toward an ultimate generic model for information.

(more…)

Visual Studio Macro for Sentence Case

Wednesday, May 7th, 2008 at 4:10 pm

I can’t believe I couldn’t find this, so I wrote one:

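    Sub SentenceCase()
        ' Upper-cases the first letter of the current selection and
        ' lower-cases every letter after it. Non-letters are left alone.
        Dim textSelection As EnvDTE.TextSelection
        Dim newString As String = ""
        ' True once the first letter has been found and capitalized.
        Dim firstCase As Boolean = False

        textSelection = CType(DTE.ActiveDocument.Selection(), EnvDTE.TextSelection)
        For Each ch As Char In textSelection.Text.ToCharArray()
            If Char.IsLetter(ch) Then
                If firstCase Then
                    ch = Char.ToLower(ch)
                Else
                    ch = Char.ToUpper(ch)
                    firstCase = True
                End If
            End If
            ' & is the string concatenation operator; + can be ambiguous.
            newString &= ch
        Next

        ' Replace the selected text with the re-cased version.
        textSelection.Text = newString
    End Sub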

New Blag

Monday, April 14th, 2008 at 12:45 am

Previously I had Serendipity, but I decided to go mainstream with WordPress. The interface is a lot more fun to use out of the box, which hopefully means I’ll update this more. It was ridiculously easy to switch, since WordPress will import blogs via RSS feed.

I signed up for Akismet since in the first thirty minutes of publishing the blog, I already had a spam post. So far so good.

Not only that, I’m trying out Twitter. Talk about mainstream. I realize that since I don’t have a lot of time to blog, putting up a stream of out-of-context short thoughts could be worth it.

SubSonic Benchmarks

Thursday, March 20th, 2008 at 11:18 pm

I’ve started using an open source project at work called SubSonic. It’s a combination ORM and code generation tool for creating your DAL in .NET. It looks at your database and creates classes with CRUD functions for every table, out of the box with no configuration. The best part is that they are implemented as partial classes, so you can add your own business logic to them very easily. Finally, it doesn’t put the hate on stored procedures: all of them are made available via the generated DAL.

In the past, I had stored procedures for everything. After writing 50 stored procedures I started to worry. I decided to rewrite a small web project to use SubSonic for its DAL. I share with you the results in numbers.

(more…)