Apr 10, 2011

The Case For Modeling Part 1

The regular readers of this blog know that my database modeling book was cancelled when Microsoft pulled the rug out from under M and the repository. I did a lot of good writing about modeling in general, so I wanted to put some of it up here on the blog, since the cancellation is official now. This is the first part of Chapter 1.

While I don’t want to spend a lot of time on a history lesson, some background information is necessary to make sure we are all on the same page. If you, the reader, have spent a lot of time watching the world of modeling pass us all by, then you might be able to safely skip the rest of this chapter.

What led to this

The rest of you: stick around. It’s a story of wonder and woe. You’ll laugh, you’ll cry. It’s better than Cats. You’ll read it again and again.

In all seriousness, the path that led to Microsoft’s latest software modeling effort is very interesting. It cuts to the core of business computer programming - how do I model my business process in code?

It is a problem that developers have faced since we were punching tape. In the end, it is all zeros and ones, so how do we make tools that make it easier to describe moving a box from point A to point B? That’s the heart of software modeling.

To break it down, let’s start with a description of the actual problem we face - moving from the model to the code. Then we will inventory some of the tools that have been tried over the years - many of which have led to SQL Modeling. Finally, we’ll look over CASE and the 800 pound gorilla: UML.

A description of the problem

Imagine you are the new CIO of a distribution company. Your company, let’s call it M Logistics, has no software. Yes, that’s what I said. They have no software at all. They use this weird stuff made out of ground up trees. They call it paper. You call it inefficient.

You are going to need to get some software in there, and you are going to need to do it fast. There are no off-the-shelf solutions available (more on that later) so you are going to have to build it.

Get out the whiteboard. Starting with a design is the best way, most will admit. There are questions that need to be asked.

What goes into a distribution company, information wise? The truck shows up with boxes. You unload the boxes, sort then in different ways perhaps. Then you put them in different trucks and send them off to a store.

“Fair enough,” you might think. “We will start with a truck containing boxes.” The drawing probably looks like Figure 1-1.

Figure 1-1 The whiteboard - truck contains boxes.

Great! Good start. Over the next few hours you map out the rest of the relationships between objects in your scope - or your domain model - and the whiteboard will be filled with goodness.

Oh, but wait. You can’t ship the whiteboard to the central office, so you will probably need to make a document. A word processor for the text will solve that problem, and some diagramming software for the model, like Figure 1-2.

Figure 1-2: The document - truck contains boxes

Nice work! Now you can package it up and send it to Central Office. All we have to do now is wait for approval.

Moving to code

The day has come and approval from Central Office has arrived (along with your new business cards, sweet!) It is time to start coding. Fire up Visual Studio, reference the carefully reviewed document, and start sketching out the domain model like the following listing.

namespace Warehouse
{
    public class Truck
    {
        public Box Boxes { get; set; }
    }
    public class Box
    {
    }
}

That’s a good start. We can work on the rest later. Now we all know that the C# class isn’t everything, we need a database to back it up. It is time to fire up the RDBMS of choice and model up a database. Depending on your tool, it should look like Figure 1-3

Figure 1-3: The database - trucks contain boxes

We are certainly getting somewhere. We have a domain model. We have a database. We … need some way to hook them together. Alright, enter your ORM of choice, or maybe roll up your sleeves and roll your own.

Eventually the decision is to use Entity Framework. Fine - still you have quite a few lines of code, mapping the database to the domain model:

  public partial class Truck : EntityObject
    {
        public static Truck CreateTruck(global::System.Int32 id)
        {
            Truck truck = new Truck();
            truck.Id = id;
            return truck;
        }

        [EdmScalarPropertyAttribute(EntityKeyProperty=true, IsNullable=false)]
        [DataMemberAttribute()]
        public global::System.Int32 Id
        {
            get
            {
                return _Id;
            }
            set
            {
                if (_Id != value)
                {
                    OnIdChanging(value);
                    ReportPropertyChanging("Id");
                    _Id = StructuralObject.SetValidValue(value);
                    ReportPropertyChanged("Id");
                    OnIdChanged();
                }
            }
        }
        private global::System.Int32 _Id;
        partial void OnIdChanging(global::System.Int32 value);
        partial void OnIdChanged();
    
        [EdmRelationshipNavigationPropertyAttribute("Model1", "TruckBox", "Box")]
        public EntityCollection<Box> Boxes
        {
            get
            {
                return ((IEntityWithRelationships)this).RelationshipManager.
GetRelatedCollection<Box>("Model1.TruckBox", "Box");
            }
            set
            {
                if ((value != null))
                {
                    ((IEntityWithRelationships)this).RelationshipManager.
InitializeRelatedCollection<Box>("Model1.TruckBox", "Box", value);
                }
            }
        }
    }

And now, finally, we can get around to actually writing the software. That is a lot of steps. Something is rotten in Denmark, methinks.

The gap is wider than you think

There are two things that are clearly wrong here.

First, you had to concern yourself with object/relational mapping. Aren’t we past that? Isn’t the concept well enough understood that we don’t need to put the code in books anymore?

Second, you touched the basic model five times. That’s far too many. Why can’t we use the model we start with as the domain model, the database and the ORM?

Mapping code to models

The first of these problems, object / relational mapping or O/RM, has been solved many times. We saw above how it is a problem that has to be solved every single time we build a data driven piece of software.

The solutions range from open source products with a wide community following to expensive rarely used commercial products that solve very specific issues.

The core problem is that gap

The reason that there are so many different solutions is because of the range of that gap between the objects and the database. The example at M Logistics is a fairly straightforward one, but there are numerous more sophisticated examples.

In a many to many situation, should the ORM model the joining table? What if it has data associated with it?
What happens if there are several different databases?
Also, sometimes some of the data silos in an entity model aren’t databases at all.
We haven’t even talked about the pattern of the domain model. Repository? Lazy loading?

The more issues in an O/RM the more sophisticated the software doing the mapping must be. The more sophisticated the software is the fewer people will want to use it. This leads to fragmentation in the software market. It also leads to confused users.

Various solutions

There are all kinds of object / relational mappers out there. Eric Nelson has a great inventory in his entity framework talk, which points out the following items:

NHibernate - a rewrite of Hibernate, which is originally an open source Java tool. Well used and mature.
EntitySpaces - a commercial tool that moves a lot of SQL functionality into the C# code.
LLBLGenPro - another commercial tool that generates code for you to do the mapping.
DevForce - a newer O/RM mostly for the web space.
XPO - a DevExpress product, eXpress Persistant Objects. It is a complete abstraction layer.
Lightspeed - a Mindscape product that has a stunning visual interface.
Open Access - the Telerik entry that depends on convention to avoid reflection.

This list is just in the Microsoft development space, too. There are lots of O/RMs for Java and other platforms. NHibernate, in fact, is a port of Hibernate, the well-known Java O/RM.

Microsoft’s tries

One would imagine that since all of these products are developed to fix a weakness in the Microsoft development space that eventually Microsoft would step in and fix it themselves. Well, they have tried. In fact, they have been trying all along, but let’s just start with the .NET era.

Typed datasets were the first .NET implementation of an ORM back in 2002, and they are still pretty cool for small projects. They effectively represented an in-memory copy of the data that would be committed to the RDB on demand. They became unwieldy for large projects, though.

ObjectSpaces was going to be the ‘next big thing’ as part of .NET 1.1 and 2.0 but never saw the light of day. I am not sure what happened there. Maybe we can get the scoop sometime.

The Microsoft Business Framework was, I think part of what would be the Dynamics group and grew out of the SharePoint models. I think. Anyway, it doesn’t matter because it didn’t ship either.

Then there was WinFS. I you hadn’t heard about that I’ll tell you over a beer sometime. Never shipped.

Finally, in 2007, lambda calculus was added to the CLR and LINQ was introduced. LINQ did actually ship, and it is currently the best way to query a model of a database in your code for the whole Microsoft stack. This book isn’t about LINQ, but if you don’t know it you should learn it. It is a very powerful technology.

The nice thing about LINQ is that it is datasource agnostic – exactly what we needed. It can talk directly to a SQL Server database, yes, but it can also talk to another object model (like a normalized Data Access Layer) or an Entity Model built in EDMX.

So that is where we leave the history lesson on O/RM. The next significant move by Microsoft is SQL Modeling, which is the topic of the rest of the book. First, though, let’s talk about the second topic – modeling the software.

Modeling with CASE and the gorilla in the room

A lot of people have tried to solve the modeling problem too. Remember how the model got redone 5 times? Most people agree that that needs to get better.

The Unified Modeling Language

The 800 pound gorilla in the room is the UML. A gang of four people who have forgotten more about software design than I will ever know got together one day. They left their egos at the door and took their four separate design patterns and built this system for diagramming software.

This system is the Unified Modeling Language or UML, and it is the most complete, most comprehensive possible way to describe how a piece of software works. I can’t even do it justice here, so I will just direct you to Martin Fowler’s great books on the subject – especially UML Distilled. I don’t leave home without it.

But the UML has two problems. First, it isn’t code. Since it is so very comprehensive, it can be converted to code, but you have to completely fill out the model, which usually takes longer than just writing the code! (As an aside, I often write the domain model in the language of choice, and then convert it to the UML.)

The other problem with the UML is that it doesn’t inherently have any knowledge of the database. The project still has the O/RM problem from the last section. Though the UML is expressive enough to infer a schema, it is sometimes too expressive, and the schema is only as good as the tools.

Many tools have attempted to solve this, from the simple Microsoft Visio to the mighty IBM Rational Rose. None of them do it well, because that is not really what the UML is designed for – it is not an O/RM. You are forced to write yet another model, like the venerable Object Role Model, or even just an Entity Model, and then manually do the mapping.

There is clearly room for a simpler yet better modeling pattern here. We may have created a system in the UML that has become too unwieldy to use. Yet just writing on whiteboards prevents remote team access, and is tough to distribute for review.

<whatever> Driven Design

A current trend is to use part of the architectural patterns as the design medium. For instance, Test Driven Design, or TDD, is basically the use of unit tests to describe expected outcomes of user stories, and then code until the tests pass, and then refactor.

TDD is a great pattern, and I use it a lot. However, it is tough to review, and nigh impossible to generate documentation from for a formal document. It almost has to be used in conjunction with something else.

Another alternative is Design by Contract, sometimes called Contract Driven Design or CDD. It works well in large teams, can be used with TDD, and can generate the UML if you need it to. It has a steep learning curve though, and is often overkill for large projects.

There are a host of others; you probably already have your favorite. Few of them do the O/RM well, and most of them are too tough to use for 10,000 foot view types of modeling.

Especially for forms over data types of applications (which constitutes well over half of the projects we all develop) there needs to be a new solution. Somewhere there is a solution that is easy to learn, distribute, and review, and that translates well to code and the database. SQL Modeling is that solution.