Friday, September 07, 2007

Using O/R Mapping in a software architecture - Batch processing

With a pretty good idea of how the data model of my little Learnbox application should look, I guess I can start with the implementation. Right? Hm... no, better to start with the architecture first. Better to make a plan of what I need in terms of structures and relationships. Then the implementation should be a piece of cake.

Features

I want to build the Learnbox application incrementally in several releases. So let´s start with a feature subset. Which features would be easy to implement first and still worthwhile for my friend Vera? I think it would be nice if she could start setting up card sets as soon as possible. For each card set (e.g. for learning French) she needs to find the questions and answers to put on index cards. Then, later, when she has completed her index card collection, she wants to learn the stuff and actually put the card set in a Learnbox.

But first she needs a means to set up card sets at all. A small card set editor. Its features have been described in my first posting on the Learnbox:

  • I think Vera wants to start with managing card sets. She needs to be able to create a new card set, list all card sets she has already created, delete a card set, and edit a card set. The usual CRUD functionality.
  • Editing a card set means altering its properties (e.g. its name) and its content. The content of a card set is its index cards. So she needs to be able to add, delete, and alter index cards and list all index cards in a card set. Again the usual CRUD functionality.

Now, implementing a card set editor will take some time. Should Vera have to wait before she can start collecting index cards? No, she should be able to start right away. So in addition to the card set editor I´ll define a simple file format for card sets which can later be imported from within the editor.

Card set file format specification:

  • Each card set is stored in a text file with Windows default encoding.
  • The file name is the name of the card set.
  • Each text line contains a question-answer pair, separated by a tab character.

This kind of file can easily be edited with Excel and exported as a tab-delimited .txt file. Vera can start entering her index cards right away; she does not need to wait for me to finish the card set editor.
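For illustration, a card set file might look like this (hypothetical content; <TAB> stands for a single tab character separating question and answer):

    File "French.txt":

    the dog<TAB>le chien
    the house<TAB>la maison
    good morning<TAB>bonjour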

And I won´t start with the editor but with a small import program. That way Vera can not only compile her card sets in Excel, but also have them imported into a database right away. As my "customer" she thus sees pretty soon that I´m really making progress with the implementation.

With this slight change of plan the feature table of the Learnbox software looks like this:

Feature#   Feature description
F1         Import card set text file using a small standalone console application.
F2         Card set editor, card set CRUD
F2.1       List all card sets
F2.2       Import card set
F2.3       Edit card set properties and store changes
F2.4       Create new card set
F2.5       Delete card set
F3         Index card CRUD
F3.1       List all index cards in a card set
F3.2       Change index card and store changes
F3.3       Add new index card to card set
F3.4       Remove index card from card set

I´ll implement these features in two or three releases:

  • Rel 1: F1
  • Rel 2: F2 .. F2.5
  • Rel 3: F3 .. F3.4

This might look like a bit of overkill for such a simple application. But as I stated right at the beginning of my blog: I don´t just want to play with O/R Mapping in isolation. I want to see it in context, and that means using it in a fairly realistic scenario. Realistic not only in terms of a concrete topic like the Learnbox to apply O/R Mapping to, but also in terms of architecture and at least a bit of process.

Release 1

What needs to be planned for a release? It will surely need some kind of UI, a data model, and some business logic. All this should be driven by the release´s features.

Data model

Let the features drive the data model first. What should the persistent data look like?

[Image: data model diagram showing CardSet and its collection of IndexCard objects]

I think that´s easy and can readily be derived from the scenario´s description. Each card set is represented by a CardSet object. Each CardSet contains a collection of IndexCard objects. For the moment I set the index cards to be dependent on the card set as discussed in my previous posting.

    [OpenAccess.Depend]
    private IList<IndexCard> indexCards;

Why use an IList<T> as a container? Because Vanatec Open Access (VOA) does not support Microsoft´s generic collection classes directly. Since VOA needs to track changes to persistent collections, it has to replace any collection object with an enhanced version (see my previous posting on the enhancement process). For the old collection types like ArrayList this works by deriving an enhanced collection class and overriding their virtual members. The members of the new generic collections like List<T>, however, are not virtual, so no enhanced version can be derived from them. VOA therefore implements its own enhanced generic collections - which only "resemble" Microsoft´s by implementing the generic interfaces like IList<T>.

So wherever you need a collection of objects, use the generic collection interfaces for maximum type safety - and don´t worry about VOA´s magic. It will replace any non-persistent implementation you choose for such persistent fields with its own, either upon loading a persistent object or when you add it to an IObjectScope.

    [OpenAccess.Persistent]
    public class CardSet
    {
        protected CardSet() { }

        public CardSet(string name)
        {
            this.name = name;
            this.indexCards = new List<IndexCard>();
        }

That´s how my data model is dealing with the persistent collection:

There is a protected parameterless constructor for VOA to use upon loading an object from the database. The indexCards field is not initialized by this ctor, because VOA will assign it its own persistent IList<T> collection.

Whenever I create a persistent object myself, though, I assign a List<T> collection to the collection field, so any index cards I add find a home. However, VOA will replace this collection transparently as this code shows:

     1 using (IObjectScope os = Database.Get(...).GetObjectScope())
     2 {
     3     os.Transaction.Begin();
     4
     5     lb.datamodel.CardSet cs;
     6     cs = new lb.datamodel.CardSet("Peter");
     7     Console.WriteLine(cs.IndexCards.GetType().Name);
     8
     9     os.Add(cs);
    10     Console.WriteLine(cs.IndexCards.GetType().Name);

The type listed in line 7 is List`1, but the type listed in line 10 is VOA´s TrackedGenericIList`1. Fortunately this does not make a difference functionality-wise.
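For completeness, here is my guess at what the rest of the Data model might look like, based on how it is used later in this posting (the IndexCards property on CardSet and the Question/Answer properties on IndexCard are taken from the import code further down; the actual implementation may differ):

    [OpenAccess.Persistent]
    public class IndexCard
    {
        private string question;
        private string answer;

        public string Question
        {
            get { return this.question; }
            set { this.question = value; }
        }

        public string Answer
        {
            get { return this.answer; }
            set { this.answer = value; }
        }
    }

    // On CardSet: the persistent collection is exposed through the generic interface only
    public IList<IndexCard> IndexCards
    {
        get { return this.indexCards; }
    }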

User interface

The user interface for the first release is simple: Vera just needs a way to enter a filename and import the file. Just one function/event is necessary to accomplish this feat:

[Image: sketch of the single import function/event in the user interface]

Since this tiny program is only a first and temporary manifestation of the card set application, I think a .NET console application will serve the UI purpose.
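Such a console front end might look roughly like this (a sketch only; CardSetImport and its Import() method are placeholders for the Import module´s actual entry point, which is not shown here):

    using System;

    static class Program
    {
        static void Main(string[] args)
        {
            // Expect the card set text file as the only command line argument
            if (args.Length != 1)
            {
                Console.WriteLine("Usage: lbimport <card set file>");
                return;
            }

            // Hand the file name over to the Import module which does the real work
            CardSetImport.Import(args[0]);
            Console.WriteLine("Card set imported.");
        }
    }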

Architecture I - functional decomposition

No application is so small that it can do without an explicit and conscious architecture. So what are the building blocks of release 1? Data needs to be stored in a database, data needs to be read from text files, someone needs to coordinate the import, and some user interaction is necessary. The following picture shows the functional decomposition of release 1 into modules for each of these requirements:

[Image: functional decomposition of release 1 into UI, Import, TxtAdapter, Data model, and Persistent memory modules]

Distinct functionality has been identified and separated, and technical issues (e.g. text file access) have been separated from logical issues (e.g. coordinating the import).

The UI module is just there to let the user interact with the real "business logic" sitting in the Import module. The Import module uses the TxtAdapter to read index card data from the tab-delimited text files Vera saves with Excel, creates CardSet and IndexCard objects from them as defined in the Data model, and then stores them in the Persistent memory.

Architecture II - Relationships

Knowing the players in this game - the modules, the functional units - is but a first step towards an architecture. Without knowing their relationships, their dependencies, it´s hard to tell what exactly they should do. For the UI and the Data model I was able to determine their contract from the features the release is supposed to implement. But for the other modules that´s not really possible - or it would be like divination. I´m not a big fan of bottom-up design. Don´t try to define the functionality of, say, a data access layer by asking yourself "Hm, what could this module possibly do? What kind of functions would be nice?" That´s too unspecific, too fuzzy. You would probably end up implementing unnecessary functions.

Instead I always model the relationships between software artifacts first and then step through them in client-service pairs to see what each client artifact would like its service artifacts to look like. Only that way can I be sure to arrive at minimal contracts for each module.

The value stream for the single feature of release 1 looks like this:

[Image: value stream diagram for the import feature (F1) of release 1]

It describes explicitly and without any cycles which module relies on which other modules to accomplish its task. A value stream differs from a UML collaboration diagram in that service software artifacts are not reused if more than one client depends on them (see the Data model in the above value stream). That way possible bad cycles are detected right away while drawing the value stream, and when you step through the diagram it´s much easier to get the contracts right, because you can focus much better.

The contracts, i.e. the classes and functions each service module needs to provide to its clients, are also described in the value stream. That´s contract-first design (CfD), which makes it easier to work in parallel on the different modules. But CfD does not only increase productivity, it also makes testing easier. However, in this blog I don´t want to dwell on such issues of software modelling. For the Learnbox releases I won´t employ CfD fully; there will be no separate contracts and all modules will be developed in the same VS.NET solution.

At this point I find it much more important to have a feature-oriented cross-cut through the release to show how the different functional units collaborate to achieve a certain result: to fulfill a requirement.

Of special interest to me are the modules concerned with persistence: the Data model and the Persistent memory. The Data model defines the persistent classes, the domain information. Those classes are more or less "intelligent" and are the "common vocabulary" all/many other modules share. The Persistent memory on the other hand is the location where instances of Data model classes are stored. Very deliberately I call this module "Persistent memory" instead of the usual "Data access layer", because I think it should be as visible as regular main memory aka the heap. Instances of volatile classes are usually not managed by some low-level layer of an architecture. Any layer in a layered architecture can instantiate volatile classes and manage such objects. Heap memory is ubiquitous.

And I think persistent classes should not be treated differently just because they are non-volatile. Since persistent memory is not woven into the .NET Fx, though, it needs to be set up explicitly - that´s why there is a Persistent memory module. Nevertheless persistent memory - in general - should be as visible/accessible as volatile memory.

[Image: Persistent memory accessible from all layers of the architecture, just like ordinary heap memory]

As long as I used ADO.NET to store/retrieve data I did not think that way. What came out of some data access module did not look like my "ordinary objects", so it seemed right to treat it differently and confine access to it to a low-level layer. But O/R Mapping blurs the distinction between volatile and persistent objects. That´s why I suggest we rethink where we put persistent object handling.

Architecture III - Contracts for persistence

All contracts should be minimal with regard to the features which drove their definition. For the TxtAdapter this means it does not sport fancy random access to index cards, but just a simple contract following the reader pattern used in ADO.NET:

    using (CardSetFileReader r = new CardSetFileReader(filename))
    {
        CardSet cs = new CardSet(r.CardSetName);
        ...
        while (r.ReadQA())
        {
            IndexCard ixc = new IndexCard();
            ixc.Question = r.CurrentQuestion;
            ixc.Answer = r.CurrentAnswer;
            cs.IndexCards.Add(ixc);
            ...
        }
    }

The TxtAdapter does not know of any persistent objects. The Data model does, of course ;-) - but even it is ignorant of how persistent objects are actually persisted and retrieved. It´s a strict separation of concerns.
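Given that contract and the file format defined above, the CardSetFileReader could be implemented roughly like this (my sketch under those assumptions; the actual TxtAdapter code may look different, e.g. regarding error handling):

    using System;
    using System.IO;
    using System.Text;

    public class CardSetFileReader : IDisposable
    {
        private StreamReader reader;
        private string cardSetName;
        private string currentQuestion;
        private string currentAnswer;

        public CardSetFileReader(string filename)
        {
            // Spec: the file name is the name of the card set
            this.cardSetName = Path.GetFileNameWithoutExtension(filename);
            // Spec: text file with Windows default encoding
            this.reader = new StreamReader(filename, Encoding.Default);
        }

        public string CardSetName { get { return this.cardSetName; } }
        public string CurrentQuestion { get { return this.currentQuestion; } }
        public string CurrentAnswer { get { return this.currentAnswer; } }

        // Advance to the next question/answer pair; returns false at end of file
        public bool ReadQA()
        {
            string line = this.reader.ReadLine();
            if (line == null)
                return false;

            // Spec: question and answer are separated by a tab character
            string[] parts = line.Split('\t');
            this.currentQuestion = parts[0];
            this.currentAnswer = parts.Length > 1 ? parts[1] : string.Empty;
            return true;
        }

        public void Dispose()
        {
            this.reader.Dispose();
        }
    }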

But the Persistent memory knows how to manage persistent objects so that their state is kept in a non-volatile store. And one of my main questions is: What should the contract of a persistent memory look like? For an ADO.NET based data access layer this is easy: just publish a couple of routines to retrieve/store the data, which then is disconnected from the data access layer.

O/R Mapping is different, though. Its magic stems from (seemingly) being connected to a database all the time. And this fundamental difference - O/R Mapping being stateful vs ADO.NET being stateless - cannot be arbitrarily hidden, I think.

That´s why I defined an AddCardSet() method for the Persistent memory module. A developer should "feel" that persistence does not come for free. As much as I like abstractions, I think this is a point where a little less than perfect abstraction is in order. Any module can create a persistent object - but unless it´s registered with the Persistent memory it´s not guaranteed to really get persisted.

Or to be more precise: This is only true/necessary for root objects like CardSet instances. IndexCard instances are registered with CardSet objects and thus will be persisted even though they are not explicitly made known to the Persistent memory. That´s persistence by reachability.

What also needs to be made visible is when object state is actually made permanent. VOA uses transactions as brackets around code working on persistent objects in memory - and at the end of such a transaction those objects get persisted. You don´t need to save each object individually. But you need to open/close the transaction explicitly. So for each feature I need to think about how much work to do before I persist the results of that work. For the index card import I decided to wrap importing a whole text file in a single transaction:

     1 LearnboxMemory lbm = LearnboxMemory.Instance;
     2 try
     3 {
     4     lbm.BeginTransaction();
     5
     6     using (CardSetFileReader r = new CardSetFileReader(filename))
     7     {
     8         ...
    23     }
    24
    25     lbm.CommitTransaction();
    26 }
    27 catch
    28 {
    29     lbm.RollbackTransaction();
    30 }

For that kind of business logic I think that´s quite reasonable. But you need to get used to such thinking as you´ll see when we get to release 2.

I implemented the Persistent memory´s LearnboxMemory class as a singleton (see the Instance property in line 1 above), since I believe a singularity like a database should be matched by a single instance of its access service.
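The singleton plumbing itself is not shown anywhere in this posting; it might look roughly like this (a sketch of mine, not the actual code, and not thread-safe):

    // Inside LearnboxMemory: one lazily created instance for the whole application
    private static LearnboxMemory instance;

    public static LearnboxMemory Instance
    {
        get
        {
            if (instance == null)
                instance = new LearnboxMemory();
            return instance;
        }
    }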

Otherwise the Persistent memory is very, very straightforward:

    public class LearnboxMemory : IDisposable
    {
        ...
        private IObjectScope os;

        public LearnboxMemory()
        {
            this.os = OpenAccess.Database.Get("...").GetObjectScope();
        }

        #region IDisposable Members
        public void Dispose()
        {
            this.os.Dispose();
        }
        #endregion

        #region transactions
        public void BeginTransaction()
        {
            os.Transaction.Begin();
        }

        public void CommitTransaction()
        {
            os.Transaction.Commit();
        }

        public void RollbackTransaction()
        {
            os.Transaction.Rollback();
        }
        #endregion

        #region storing cardset objects
        public void AddCardSet(CardSet cs)
        {
            os.Add(cs);
        }
        ...

The methods pretty much pass on all the work to VOA. Their main reason to exist is to hide VOA´s specific API from the rest of the application - regardless of how simple that API is.

Compare this to a usual ADO.NET data access layer... It would have cost me at least 2 to 3 times more code, I´d say.
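To make that comparison a bit more concrete: just storing a new card set with plain ADO.NET could look roughly like the following sketch (table and column names are made up, a Name property on CardSet is assumed, and retrieval, change tracking and identity management are still missing):

    // Sketch only - requires using System.Data.SqlClient;
    // connectionString and the CardSet cs are assumed to be in scope
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            // Insert the card set row and retrieve its generated primary key
            SqlCommand insertSet = new SqlCommand(
                "INSERT INTO CardSets (Name) VALUES (@name); SELECT SCOPE_IDENTITY();",
                conn, tx);
            insertSet.Parameters.AddWithValue("@name", cs.Name);
            int cardSetId = Convert.ToInt32(insertSet.ExecuteScalar());

            // Insert one row per index card, linked to the card set by foreign key
            foreach (IndexCard ixc in cs.IndexCards)
            {
                SqlCommand insertCard = new SqlCommand(
                    "INSERT INTO IndexCards (CardSetId, Question, Answer) VALUES (@setId, @q, @a)",
                    conn, tx);
                insertCard.Parameters.AddWithValue("@setId", cardSetId);
                insertCard.Parameters.AddWithValue("@q", ixc.Question);
                insertCard.Parameters.AddWithValue("@a", ixc.Answer);
                insertCard.ExecuteNonQuery();
            }

            tx.Commit();
        }
    }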

So that´s the beauty of O/R Mapping:

  • The Data model shows barely any dependencies on O/R Mapping; except for the IList<T> and a couple of attributes, I designed it like an ordinary volatile object model.
  • There is only minimal persistence logic.

However, the O/R Mapper´s reliance on the availability of a database requires you to think about when to actually have your data persisted. There is no full automation possible - at least not if you want to retain good performance.

2 comments:

Anonymous said...

Doesn't the use of attributes like Persistent break SoC and SRP by including data access related information within the business object? This is not necessary with NHibernate.

Ralf Westphal - One Man Think Tank said...

@Anonymous: Well, you´re fundamentally right. But the question is: shouldn´t someone looking at a class know whether instances of it get persisted or not?

Maybe it´s a matter of taste. But I find it important to see right away if a class is persistent or if it´s used across the wire and whatnot.

I want to see it, but I don´t want such aspects to have an impact on the implementation. With an attribute this works just fine.