Friday, September 07, 2007

Using O/R Mapping in a software architecture - Batch processing

With a pretty good idea of how the data model of my little Learnbox application should look, I guess I can start with the implementation. Right? Hm... No, better to start with the architecture first. Better to make a plan of what I need in terms of structures and relationships. Then the implementation should be a piece of cake.

Features

I want to build the Learnbox application incrementally in several releases. So let´s start with a feature subset. Which features would be easy to implement first and still worthwhile for my friend Vera? I think it would be nice if she could start setting up card sets as soon as possible. For each card set (e.g. for learning French) she needs to find the questions and answers to put on index cards. Then, later, when she has completed her index card collection she wants to learn the stuff and actually put the card set in a Learnbox.

But first she needs a means to set up card sets at all. A small card set editor. Its features have been described in my first posting on the Learnbox:

  • I think, Vera wants to start with managing card sets. She needs to be able to create a new card set, list all card sets she already created, delete a card set, edit a card set. The usual CRUD functionality.
  • Editing a card set means altering its properties (e.g. its name) and its content. The content of a card set are its index cards. So she needs to be able to add, delete, alter index cards and list all index cards in the card set. Again the usual CRUD functionality.

Now, implementing a card set editor will take some time. Should Vera have to wait with collecting index cards? No, she should be able to start right away. So in addition to the card set editor I´ll define a simple file format for card sets which can later be imported from within the editor.

Card set file format specification:

  • Card sets are stored each in a text file with Windows default encoding.
  • The file name is the name of the card set.
  • Each text line contains a question-answer-pair separated by a tab character.

This kind of file can easily be edited with Excel and exported as a tab delimited .txt-file. Vera can start right away entering her index cards; she does not need to wait for me to finish the card set editor.
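For example, a file German.txt would define a card set named "German" and could contain lines like these (&lt;TAB&gt; stands for the tab character; the sample pairs are just illustrations):

    Haus, das<TAB>house
    Hund, der<TAB>dog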

And I won´t start with the editor but with a small import program. That way Vera can not only compile her card sets in Excel, but also have them imported into a database right away. As my "customer" she thus sees pretty soon that I´m really making progress with the implementation.

With this slight change of plan the feature table of the Learnbox software looks like this:

Feature# Feature description
F1 Import card set text file using a small standalone console application.
F2 Card set editor, card set CRUD
F2.1 List all card sets
F2.2 Import card set
F2.3 Edit card set properties and store changes
F2.4 Create new card set
F2.5 Delete card set
F3 Index card CRUD
F3.1 List all index cards in a card set
F3.2 Change index card and store changes
F3.3 Add new index card to card set
F3.4 Remove index card from card set

 

I´ll implement these features in two or three releases:

  • Rel 1: F1
  • Rel 2: F2 .. F2.5
  • Rel 3: F3 .. F3.4

This might look like a bit of overkill for such a simple application. But as I stated right at the beginning of my blog: I don´t just want to play with O/R Mapping in isolation. I want to see it in context, and that means I want to use it in a quite realistic scenario. That means not only a concrete topic like the Learnbox to apply O/R Mapping to, but also an architecture and at least a bit of process.

Release 1

What needs to be planned for a release? It sure will need some kind of UI, it sure needs a data model and some business logic. All this should be driven by the release´s features.

Data model

Let the features drive the data model first. What should the persistent data look like?

[image: release 1 data model with CardSet and its collection of IndexCard objects]

I think that´s easy and can readily be derived from the scenario´s description. Each card set is represented by a CardSet object. Each CardSet contains a collection of IndexCard objects. For the moment I set the index cards to be dependent on the card set as discussed in my previous posting.

    [OpenAccess.Depend]
    private IList<IndexCard> indexCards;

Why use an IList<T> as a container? Because Vanatec Open Access (VOA) does not support the generic collection classes directly. Since VOA needs to track changes to persistent collections, it needs to replace any collection object with an enhanced version (see my previous posting on the enhancement process). This works by deriving an enhanced collection class from one of the old collection types like ArrayList. But the new generic collections like List<T> don´t allow such interception by derivation, so no enhanced version can be derived from them. VOA thus implements its own enhanced generic collections - which only "resemble" Microsoft´s by implementing the generic interfaces like IList<T>.

So wherever you need a collection of objects, use the generic collection interfaces for maximum type safety - and don´t worry about VOA´s magic. It will replace any non-persistent implementation you choose for such persistent fields with its own upon loading a persistent object or when you add it to an IObjectScope.

    [OpenAccess.Persistent]
    public class CardSet
    {
        protected CardSet() { }

        public CardSet(string name)
        {
            this.name = name;
            this.indexCards = new List<IndexCard>();
        }

That´s how my data model is dealing with the persistent collection:

There is a protected parameterless constructor for VOA to use upon loading an object from the database. The indexCards field is not initialized by this ctor, because VOA will assign it its own persistent IList<T> collection.

Whenever I create a persistent object myself, though, I assign a List<T> collection to the collection field, so any index cards I add find a home. However, VOA will replace this collection transparently as this code shows:

     1 using (IObjectScope os = Database.Get(...).GetObjectScope())
     2 {
     3     os.Transaction.Begin();
     4 
     5     lb.datamodel.CardSet cs;
     6     cs = new lb.datamodel.CardSet("Peter");
     7     Console.WriteLine(cs.IndexCards.GetType().Name);
     8 
     9     os.Add(cs);
    10     Console.WriteLine(cs.IndexCards.GetType().Name);

The type listed in line 7 is List`1, but the type listed in line 10 is VOA´s TrackedGenericIList`1. Fortunately this does not make a difference functionality-wise.

User interface

The user interface for the first release is simple: Vera just needs a way to enter a filename and import the file. Just one function/event is necessary to accomplish this feat:

[image: the single import function of the release 1 user interface]

Since this tiny program is only a first and temporary manifestation of the card set application, I think a .NET console application will serve the UI purpose.
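Here´s a minimal sketch of what this console frontend could look like - Importer and its Import() method are placeholder names I´m making up here; the real module contracts follow from the architecture below:

    // Hypothetical sketch of the release 1 console frontend;
    // Importer/Import() are placeholder names, not the final contract.
    using System;

    namespace lb.ui
    {
        static class Importer
        {
            public static void Import(string filename)
            {
                // read the file via the TxtAdapter and store the card set
                // in the Persistent memory (see the Import module below)
            }
        }

        class Program
        {
            static void Main(string[] args)
            {
                if (args.Length != 1)
                {
                    Console.WriteLine("usage: lbimport <card set file>");
                    return;
                }

                Console.WriteLine("Importing {0}...", args[0]);
                Importer.Import(args[0]);
                Console.WriteLine("Done.");
            }
        }
    }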

Architecture I - functional decomposition

No application is too small for an explicit and conscious architecture. So what are the building blocks of release 1? Data needs to be stored in a database, data needs to be read from text files, someone needs to coordinate the import, some user interaction is necessary. The following picture shows the functional decomposition of release 1 into modules for each of these requirements:

[image: functional decomposition of release 1 into UI, Import, TxtAdapter, Data model and Persistent memory modules]

Distinct functionality has been identified and separated, and technical issues (e.g. text file access) have been separated from logical issues (e.g. coordinating the import).

The UI module is just there to let the user interact with the real "business logic" sitting in the Import module. The Import uses the TxtAdapter to read index card data from the tab delimited text files Vera saves with Excel, creates CardSet and IndexCard objects from them as defined in the Data model, and then stores them in the Persistent memory.

Architecture II - Relationships

Knowing the players in this game - the modules, the functional units - is but a first step towards an architecture. Without knowing their relationships, their dependencies, it´s hard to tell what exactly they should do. For the UI and the Data model I was able to determine their contract from the features the release is supposed to implement. But for the other modules that´s not really possible - or it would be like divination. I´m not a big fan of bottom up design. Don´t try to define the functionality of, say, a data access layer by asking yourself "Hm, what could this module possibly do? What kind of functions would be nice?" That´s too vague, too fuzzy. You would probably end up implementing unnecessary functions.

Instead I always model the relationships between software artifacts first and then step through them in client-service pairs to see what each client artifact would like its service artifacts to look like. Only that way can I be sure to arrive at minimal contracts for each module.

The value stream for the single feature of release 1 looks like this:

[image: value stream for the import feature of release 1]

It describes explicitly and without any cycles which module relies on which other modules to accomplish its task. A value stream differs from a UML collaboration diagram in that service software artifacts are not reused if more than one client depends on them (see Data model in the above value stream). That way possible bad cycles are detected right away while drawing the value stream, and when you step through the diagram it´s much easier to get the contracts right, because you can focus much better.

The contracts, i.e. the classes and functions each service module needs to provide to its clients, are also described in the value stream. That´s contract-first design (CfD), which makes it easier to work in parallel on the different modules. But CfD does not only increase productivity, it also makes testing easier. However, in this blog I don´t want to dwell on such issues of software modelling. For the Learnbox releases I won´t employ CfD fully; there will be no separate contracts and all modules will be developed in the same VS.NET solution.

At this point I find it much more important to have a feature-oriented cross-cut through the release to show how the different functional units collaborate to achieve a certain result: to fulfil a requirement.

Of special interest to me are the modules concerned with persistence: the Data model and the Persistent memory. The Data model defines the persistent classes, the domain information. Those classes are more or less "intelligent" and are the "common vocabulary" all/many other modules share. The Persistent memory on the other hand is the location where instances of Data model classes are stored. Very purposely I call this module "Persistent memory" instead of the usual "Data access layer", because I think it should be as visible as regular main memory aka the heap. Instances of volatile classes are usually not managed by some low level layer of an architecture. Any layer in a layered architecture can instantiate volatile classes and manage such objects. Heap memory is ubiquitous.

And I think persistent classes should not be treated differently just because they are non-volatile. Since persistent memory is not woven into the .NET Fx, though, it needs to be set up explicitly - that´s why there is a Persistent memory module. Nevertheless persistent memory - in general - should be as visible/accessible as volatile memory.

[image: Persistent memory depicted as ubiquitously accessible as the heap]

As long as I used ADO.NET to store/retrieve data I did not think that way. What came out of some data access module did not look like my "ordinary objects", so it seemed right to treat it differently and confine access to it to a low level layer. But O/R Mapping blurs the distinction between volatile and persistent objects. That´s why I suggest we rethink where we put persistent object handling.

Architecture III - Contracts for persistence

All contracts should be minimal with regard to the features which drove their definition. For the TxtAdapter this means it does not sport fancy random access to index cards, but just a simple contract following the reader pattern known from ADO.NET:

    using (CardSetFileReader r = new CardSetFileReader(filename))
    {
        CardSet cs = new CardSet(r.CardSetName);
        ...
        while (r.ReadQA())
        {
            IndexCard ixc = new IndexCard();
            ixc.Question = r.CurrentQuestion;
            ixc.Answer = r.CurrentAnswer;
            cs.IndexCards.Add(ixc);
            ...
        }
    }

The TxtAdapter does not know of any persistent objects. But the Data model does - of course ;-) But it too is ignorant regarding how persistent objects are actually persisted and retrieved. It´s a strict separation of concerns.
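Just to make the reader contract concrete: here´s a minimal sketch of how such a CardSetFileReader could be implemented against the file format specified above - the real TxtAdapter may well differ in its details:

    // Minimal sketch of the CardSetFileReader following the reader pattern;
    // assumes the card set file format defined earlier (Windows default
    // encoding, file name = card set name, one question<TAB>answer per line).
    using System;
    using System.IO;

    public class CardSetFileReader : IDisposable
    {
        private StreamReader reader;
        private string cardSetName;
        private string currentQuestion;
        private string currentAnswer;

        public CardSetFileReader(string filename)
        {
            // the file name (without extension) is the card set name
            this.cardSetName = Path.GetFileNameWithoutExtension(filename);
            this.reader = new StreamReader(filename, System.Text.Encoding.Default);
        }

        public string CardSetName { get { return this.cardSetName; } }
        public string CurrentQuestion { get { return this.currentQuestion; } }
        public string CurrentAnswer { get { return this.currentAnswer; } }

        // Advance to the next question-answer pair; returns false at end of file.
        public bool ReadQA()
        {
            string line = this.reader.ReadLine();
            if (line == null) return false;

            string[] parts = line.Split('\t');
            this.currentQuestion = parts[0];
            this.currentAnswer = parts.Length > 1 ? parts[1] : "";
            return true;
        }

        public void Dispose()
        {
            this.reader.Close();
        }
    }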

But the Persistent memory knows how to manage persistent objects so their state is kept in a non-volatile store. And one of my main questions is: What should the contract of a persistent memory look like? For an ADO.NET based data access layer this is easy: just publish a couple of routines to retrieve/store the data, which then is disconnected from the data access layer.

O/R Mapping is different, though. Its magic stems from (seemingly) being connected to a database all the time. And this fundamental difference - O/R Mapping being stateful vs ADO.NET being stateless - cannot be arbitrarily hidden, I think.

That´s why I defined an AddCardSet() method for the Persistent memory module. A developer should "feel" that persistence is not free. As much as I like abstractions, I think here is a point where a little less than perfect abstraction is in order. Any module can create a persistent object - but unless it´s registered with the Persistent memory it´s not guaranteed to really get persisted.

Or to be more precise: This is only true/necessary for root objects like CardSet instances. IndexCard instances are registered with CardSet objects and thus will be persisted even though they are not explicitly made known to the Persistent memory. That´s persistence by reachability.

What also needs to be made visible is when object state is actually made permanent. VOA uses transactions as brackets around code working on persistent objects in memory - at their end those objects get persisted. You don´t need to save each object individually. But you need to open/close the transaction explicitly. So for each feature I need to think about how much work to do before I persist its results. For the index card import I decided to wrap importing a whole text file in a single transaction:

     1 LearnboxMemory lbm = LearnboxMemory.Instance;
     2 try
     3 {
     4     lbm.BeginTransaction();
     5 
     6     using (CardSetFileReader r = new CardSetFileReader(filename))
     7     {
     8         ...
    23     }
    24 
    25     lbm.CommitTransaction();
    26 }
    27 catch
    28 {
    29     lbm.RollbackTransaction();
    30 }

For that kind of business logic I think that´s quite reasonable. But you need to get used to such thinking as you´ll see when we get to release 2.

I implemented the Persistent memory´s LearnboxMemory class as a singleton (see the Instance property in line 1 above) since I believe a singularity like a database should be matched by a single instance of its access service.
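The Instance property itself sits in the elided part of the listing below; a minimal sketch of what such an accessor might look like (a strict singleton would also make the constructor non-public):

    // Hypothetical sketch of the singleton accessor; the actual
    // implementation is elided in the listing below.
    private static LearnboxMemory instance;

    public static LearnboxMemory Instance
    {
        get
        {
            if (instance == null)
                instance = new LearnboxMemory();
            return instance;
        }
    }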

Otherwise the Persistent memory is very, very straightforward:

    public class LearnboxMemory : IDisposable
    {
        ...
        private IObjectScope os;

        public LearnboxMemory()
        {
            this.os = OpenAccess.Database.Get("...").GetObjectScope();
        }

        #region IDisposable Members
        public void Dispose()
        {
            this.os.Dispose();
        }
        #endregion

        #region transactions
        public void BeginTransaction()
        {
            os.Transaction.Begin();
        }

        public void CommitTransaction()
        {
            os.Transaction.Commit();
        }

        public void RollbackTransaction()
        {
            os.Transaction.Rollback();
        }
        #endregion

        #region storing cardset objects
        public void AddCardSet(CardSet cs)
        {
            os.Add(cs);
        }
        ...

The methods pretty much pass on all the work to VOA. Their main reason to exist is to hide VOA´s specific API from the rest of the application - regardless of how simple that API is.

Compare this to a usual ADO.NET data access layer... It would have cost me at least 2 to 3 times more code, I´d say.
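To give an impression of where that estimate comes from, here´s a rough sketch of what just the insert path of AddCardSet() might look like with plain ADO.NET - assuming the schema VOA generated (card_set, index_card, card_set_index_card) and assuming the objects carried explicit ids, which the O/R Mapper spares me:

    // Rough ADO.NET sketch for comparison only; the schema and the explicit
    // ids are assumptions, not part of the actual Learnbox code.
    using System;
    using System.Data.SqlClient;

    public class AdoCardSetStore
    {
        public void AddCardSet(SqlConnection cn, SqlTransaction tx, Guid csId, CardSet cs)
        {
            SqlCommand cmd = new SqlCommand(
                "insert into card_set (id, name) values (@id, @name)", cn, tx);
            cmd.Parameters.AddWithValue("@id", csId);
            cmd.Parameters.AddWithValue("@name", cs.Name);
            cmd.ExecuteNonQuery();

            int seq = 0;
            foreach (IndexCard ixc in cs.IndexCards)
            {
                Guid icId = Guid.NewGuid();

                // one insert per index card...
                cmd = new SqlCommand(
                    "insert into index_card (id, question, answer) values (@id, @q, @a)", cn, tx);
                cmd.Parameters.AddWithValue("@id", icId);
                cmd.Parameters.AddWithValue("@q", ixc.Question);
                cmd.Parameters.AddWithValue("@a", ixc.Answer);
                cmd.ExecuteNonQuery();

                // ...plus one insert into the join table to record the position
                cmd = new SqlCommand(
                    "insert into card_set_index_card (card_set_id, seq, index_card_id) values (@csid, @seq, @icid)", cn, tx);
                cmd.Parameters.AddWithValue("@csid", csId);
                cmd.Parameters.AddWithValue("@seq", seq++);
                cmd.Parameters.AddWithValue("@icid", icId);
                cmd.ExecuteNonQuery();
            }

            // and that´s just the INSERT path - UPDATE and DELETE would add more
        }
    }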

So that´s the beauty of O/R Mapping:

  • The Data model shows barely any dependencies on O/R Mapping; except for the IList<T> I designed it like an ordinary volatile object model.
  • There is only minimal persistence logic.

However, the O/R Mapper´s reliance on the availability of a database requires you to think about when to actually have your data persisted. There is no full automation possible - not if you want to retain good performance.

Friday, July 06, 2007

A first stab at a persistent data model for the sample application - Mapping a 1:n relationship

In my previous posting I introduced a scenario to which I want to apply Vanatec OpenAccess (VOA) as an example of an O/R Mapper. I want to see how it can help me reduce the effort to invest in a data access layer. And I want to see how persistent objects can be handled in business logic and frontend. Developing a solution for a concrete scenario will make it easier for me to exercise the different facets of OpenAccess, I think.

Mapping a 1:n relationship

Despite what I wrote at the end of that posting, though, I will not yet tackle the architecture of the Learnbox application. Rather I first want to explore what VOA offers to make 1:n relationships between domain model entities persistent. Take for example the relationship between a set of cards and its index cards:

[image: 1:n relationship between CardSet and IndexCard]

Thinking object oriented I would use some kind of collection to keep all the references to IndexCard objects in a CardSet object, e.g.

    public class CardSet
    {
        public List<IndexCard> indexcards;
    }

    public class IndexCard
    {
    }

However, the VOA documentation tells me I cannot use generic collection types directly. Instead I need to resort to their respective interfaces, e.g. IList<T> for List<T>. The reason for this: VOA will fill an IList<T> field with its own implementation of the interface, i.e. some kind of prefab persistent collection. I think that´s fine for me, for the moment at least. So my first shot at the above small data model is this:

[image: first shot at the data model using an IList<IndexCard> field]

Now I can write some code to check how easy retrieving objects along such relationships is. Here´s my initial VS solution:

[image: initial Visual Studio solution]

And here´s the database model that VOA generated from my two persistent classes:

[image: database model generated by VOA, with card_set, index_card and card_set_index_card tables]

As you can see, VOA generated a special join table (card_set_index_card) to map the relationship between CardSet and IndexCard objects which I expressed using an IList:

  • There are two foreign key columns, one for each table of the relationship.
  • And there is a sequence number field which keeps track of the position of an object in a relationship. Is it the first, second or last element in the IList<T>?
  • The parent table foreign key (card_set_id) and the sequence number together form the primary key for the join table. Thus the same object (represented by the child table foreign key (index_card_id)) can exist several times within the same IList<T>.

Interesting - but why did VOA generate such a table at all? Because a join table is the most general means to model relationships other than 1:1. Had I designed the database schema by hand, I would have put a foreign key into index_card referring to a card_set row directly. But that way each index card would have been bound to exactly one card set. That´s not what the object model says, though. The same IndexCard object can possibly belong to several CardSet parent objects. So the mapping VOA chose is not only the most general one, but also the most faithful. Well then, let´s stick with it at least for now.
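That last point is easy to illustrate in code: nothing stops me from adding the same IndexCard object twice to the same collection, and thanks to the seq column the join table can represent exactly that:

    lb.datamodel.IndexCard ic = new lb.datamodel.IndexCard();
    ic.Question = "Haus, das";
    ic.Answer = "house";

    cs.IndexCards.Add(ic);
    cs.IndexCards.Add(ic); // same object again: two join table rows with the
                           // same index_card_id but different seq values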

In my code of course this all does not concern me. Adding and retrieving objects is as easy as can be:

    using System;
    using System.Collections.Generic;
    using System.Text;

    namespace test_datamodel
    {
        class Program
        {
            static void Main(string[] args)
            {
                // filling the database
                using (OpenAccess.IObjectScope os = OpenAccess.Database.Get("LearnboxDBConnection").GetObjectScope())
                {
                    os.Transaction.Begin();

                    lb.datamodel.CardSet cs;
                    lb.datamodel.IndexCard ic;

                    cs = new lb.datamodel.CardSet();
                    cs.Name = "German";

                    ic = new lb.datamodel.IndexCard();
                    ic.Question = "Haus, das";
                    ic.Answer = "house";
                    cs.IndexCards.Add(ic);
                    ...
                    os.Add(cs);

                    cs = new lb.datamodel.CardSet();
                    cs.Name = "French";
                    ...
                    os.Transaction.Commit();
                }

                // retrieving data
                using (OpenAccess.IObjectScope os = OpenAccess.Database.Get("LearnboxDBConnection").GetObjectScope())
                {
                    OpenAccess.IQuery q = os.GetOqlQuery("select * from CardSetExtent");
                    OpenAccess.IQueryResult qr = q.Execute();
                    foreach (lb.datamodel.CardSet cs in qr)
                    {
                        Console.WriteLine("{0}", cs.Name);
                        foreach (lb.datamodel.IndexCard ic in cs.IndexCards)
                        {
                            Console.WriteLine("  {0} : {1}", ic.Question, ic.Answer);
                        }
                    }
                }
            }
        }
    }

 
Here´s the output:

[image: console output listing the card sets and their index cards]

And here´s the proof that VOA uses its own collection data type for the IList<T> field in CardSet: a TrackedGenericList.

[image: debugger view showing the indexCards field holding a TrackedGenericList]

Removing objects related to each other

Well, this all looks ok to me so far. But let´s exercise the relationship a little bit more: How to delete an IndexCard? How to reorder the IndexCards of a CardSet?

Deleting an IndexCard should be as easy as removing it from the list of its CardSet:

    using (OpenAccess.IObjectScope os = OpenAccess.Database.Get("LearnboxDBConnection").GetObjectScope())
    {
        os.Transaction.Begin();

        OpenAccess.IQuery q = os.GetOqlQuery("select * from CardSetExtent cs where cs.name='German'");
        OpenAccess.IQueryResult qr = q.Execute();

        lb.datamodel.CardSet cs = (lb.datamodel.CardSet)qr[0];
        cs.IndexCards.RemoveAt(0);

        os.Transaction.Commit();
    }

And indeed the IndexCard for "Haus, das" is no longer listed with the "German" card set:

[image: output without the "Haus, das" index card]

But is it also removed from the database? As long as I don´t retain a reference to the IndexCard elsewhere in my code, the object will be deleted by the garbage collector; it would be nice if the persistent memory behaved in the same way. So let´s have a look:

[image: the removed IndexCard still shows up in the IndexCardExtent query]

Oops, the query

    OpenAccess.IQuery q = os.GetOqlQuery("select * from IndexCardExtent");

still returns the IndexCard object I removed from the collection. So removing an object from a relationship does not mean its data is removed from the database, too. That must be done explicitly. No kind of garbage collection for persistent memory jumps in to help ;-)

Here´s the right way to do it: Get a reference to the object to delete, remove it from any relationships, then remove it from the database.

    lb.datamodel.IndexCard ic;
    ic = cs.IndexCards[0];
    cs.IndexCards.RemoveAt(0);
    os.Remove(ic);

That´s for dependent objects. But what about independent objects like the CardSet? It´s the parent of IndexCard objects, which should not exist independently of a CardSet. So when I delete a CardSet, all its IndexCards should be deleted too, right?

    OpenAccess.IQuery q = os.GetOqlQuery("select * from CardSetExtent cs where cs.name='German'");
    OpenAccess.IQueryResult qr = q.Execute();

    lb.datamodel.CardSet cs = (lb.datamodel.CardSet)qr[0];
    os.Remove(cs);

This code, however, just deletes the CardSet parent and not its children, as a query on IndexCard objects shows. But since I did not design the database schema myself, there are surely no referential integrity constraints defined. So how can I set up a "cascading delete" constraint? Do I have to get my hands dirty and touch the database schema? Or can I define such a constraint in the persistent object model?

As it turns out, VOA lets me define the IndexCard children of a CardSet as being dependent on it, i.e. they will be deleted when their parent is deleted. I just need to adorn the IList<T> field with the [OpenAccess.Depend()] attribute:

    [OpenAccess.Persistent]
    public class CardSet
    {
    ...
        [OpenAccess.Depend()]
        private IList<IndexCard> indexcards;

Great! But of course I need to be careful: If the same dependent object is referenced by more than one parent it will be deleted nevertheless. Hm...

Changing the order of objects in a list

The order of index cards in a compartment of a Learnbox is important. It mirrors my learning progress (or some priority I want to assign to certain index cards). That´s why I introduced an Item object in my data model to keep track of the position of index cards in compartments. But it seems I don´t need such a crutch. As the database schema shows, VOA keeps track of the position of list elements itself with the seq column in the join table.

But I want to quickly verify this. Here´s some code which changes the order of some index cards:

    OpenAccess.IQuery q = os.GetOqlQuery("select * from CardSetExtent cs where cs.name='German'");
    OpenAccess.IQueryResult qr = q.Execute();

    lb.datamodel.CardSet cs = (lb.datamodel.CardSet)qr[0];
    lb.datamodel.IndexCard ic;
    ic = cs.IndexCards[0];
    cs.IndexCards.RemoveAt(0);
    cs.IndexCards.Add(ic);

The first index card is removed and re-appended at the end of the list. And indeed the order is preserved by VOA as you can see:

[image: output showing "Haus, das" moved to the end of the list]

"Haus, das" has been the first index card and now is the last one. I think I can update my data model now and make it a little bit simpler:

[image: simplified data model without the Item class]

No need for an Item class anymore. IndexCard objects can be ordered in different ways in Compartments and CardSets without it. Also I removed the aggregation of IndexCards by CardSet. An IndexCard should not automatically be removed if its CardSet is deleted. If it´s used in one or more Learnboxes, why shouldn´t they retain it? Which brings me to business logic in general. I guess I need to think about which business logic rules I need and where to implement them. But more of that in my next posting.

Wednesday, June 06, 2007

Digging deeper into O/R Mapping needs a sample application

After a little pause I´m now back at exploring the world of O/R Mapping. However, when looking back at my first steps so far, I´m in doubt whether I want to continue in the same way. Up to now I approached O/R Mapping in general and OpenAccess in particular in a pretty generic way. I tried to get it up and running with just any simple enough code. My approach was pretty theoretical. But when I look ahead, I don´t know whether I can or even should continue trying to understand how to use O/R Mapping with my theoretical glasses on. It would mean picking some concept like polymorphism or inheritance or stateless business logic and trying to find out how O/R Mapping fits it. Just the ordinary textbook approach, isn´t it? One topic after another would be covered. But a blog is not a textbook.

So my idea is not to try to be comprehensive with regard to O/R Mapping, but to stay close to everyday practice. Less theory, more real life. Maybe then I won´t visit each and every detail of OpenAccess during my exploration of O/R Mapping, but whatever feature I stumble upon will be more relevant to me (and hopefully to you too). Learning works best when immediately applying new knowledge to some real world scenario. That´s why I want to continue my exploration by building a real program for a sample scenario. The problem domain is not difficult to understand, but my guess is it will provide for quite some adventures in "OpenAccess land".

The program will need to access a database, and I want to experience how best to do that using an O/R Mapper. The focus of my articles in this blog will thus of course be on those parts of the software having to do with data access or persistent objects. But other aspects of software development might also get a mention here or there. As stated in an earlier post, I´m very interested in learning what effect O/R Mapping has on software architecture, for example. I want to put architecture first and hope OpenAccess will fit in.

The sample scenario

As a sample scenario I´ve chosen a problem a friend of mine, Vera, approached me with recently. She has started psychotherapeutic training and now needs to learn a whole host of facts and definitions like "What´s the lifetime prevalence of an anxiety disorder?" or "Name the first rank symptoms of schizophrenia according to Schneider" or "What does depersonalisation mean?". In order to memorize all this she wants to use a Learnbox, which is a tool for spaced repetition learning. Here´s a picture of how such a Learnbox works:


[image: a Learnbox with five numbered compartments]
(Source: http://sprachen.sprachsignale.de/hausa/huaarbeitsmethodik.html)

  1. You write whatever you want to learn on index cards. A question on one side, the answer on the other side.
  2. You set up a Learnbox like depicted: It´s a box divided into 5 partitions (or compartments), each able to hold an increasing number of index cards (e.g. 20, 60, 180, 540, 1620).
  3. You start learning by filling partition #1 with index cards.
  4. Then take them out one after another and see if you know the answer to the question on the index card. If you know the answer, insert the card into partition #2. If you don´t know the answer, put it back into partition #1 after the last card.
  5. This way work your way through the index cards in partition #1 until none or maybe just 2 or 3 are left.
  6. Fill up partition #1 with new index cards and go back to step 4.
  7. Repeat steps 4 to 6 until partition #2 is full, then work on the index cards in partition #2.

If you replace the partition numbers in the above description with n and (n+1), then you get the general algorithm for working with a Learnbox. Try it for yourself with vocabulary on some subject you´re interested in! It works. It´s fun.
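To pin the general rule down, here´s a small sketch of the move logic in code - the Learnbox class and all its members are made up purely for illustration:

    // Illustration of the general Learnbox rule with compartments n and (n+1);
    // all names are made up, this is not part of the application yet.
    using System.Collections.Generic;

    public class Learnbox
    {
        // compartments[0] is partition #1, compartments[1] is partition #2, ...
        private List<Queue<IndexCard>> compartments = new List<Queue<IndexCard>>();

        public Learnbox(int numberOfCompartments)
        {
            for (int i = 0; i < numberOfCompartments; i++)
                compartments.Add(new Queue<IndexCard>());
        }

        // One learning step on compartment n: take the front card, promote it
        // to compartment n+1 if the answer was known, otherwise put it back
        // behind the last card of compartment n.
        public void Work(int n, bool answerWasKnown)
        {
            IndexCard card = compartments[n].Dequeue();

            if (!answerWasKnown)
                compartments[n].Enqueue(card);
            else if (n + 1 < compartments.Count)
                compartments[n + 1].Enqueue(card);
            // else: the card has left the last compartment - it´s learned
        }
    }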

My friend is perfectly happy with this technique - but she would like to share her index cards with fellow students. Also she would like to get rid of the necessity to carry around a physical Learnbox. Rather she would like to learn online or using a desktop program. That also would make it easier to exchange index cards with her fellow students: no rewriting needed, instead she could send them a file.

I think, that´s a great idea. Of course I could point her to a number of implementations (see section "Commercial Software" here for example) - but rather I´d like to help her myself. I think this scenario is realistic, not too difficult, not too simple, and allows for some interesting features. And of course it needs a database which I can access using OpenAccess.

A first shot at the requirements - the data model

Although I´m very excited to have found a practical/useful sample scenario for my O/R Mapping exploration and would like to start VS2005 right away, I think some thinking is in order first. What are the exact requirements? What´s the data model, what should the persistent classes look like?

Let´s start with the tangible stuff, the data. We are talking about Learnboxes with compartments filled with index cards. So how about an object data model like this:

I imagine there to be many LearnBox objects each containing a number of Compartment objects each containing a number of IndexCard objects. Sounds reasonable, doesn´t it? Since a LearnBox and its Compartments belong together their relationship is that of an aggregation; each Compartment, though, is just associated with its IndexCard objects.

IndexCards don´t exist on their own, though. I imagine them coming in sets: a set of index cards for learning French, another for learning psychotherapeutical terms, another one for math etc. Each LearnBox thus is an "instantiation" of such a set of index cards or is associated with a set, and the index cards in the compartments all belong to this set. This leads to a slightly more elaborate object data model:

A LearnBox knows its CardSet, but the CardSet does not know all LearnBoxes it´s associated with, I´d say. Also, when a CardSet is deleted, all its IndexCards are deleted, too. Hm... should a user be allowed to delete a CardSet while it´s in use by a LearnBox? Should she be able to delete an IndexCard from a CardSet while it´s associated with a Compartment? I don´t know yet. I guess it depends on whether a LearnBox directly references IndexCards of a CardSet or contains copies of them of its own. The UML drawing does not reveal such detail yet. I´ll have to think about this detail for a moment...

Functional requirements

The data model is just one side of the coin. The other side is the functional requirements. What´s the software supposed to do? What does Vera expect from the software?

  1. I think, she wants to start with managing card sets. She needs to be able to create a new card set, list all card sets she already created, delete a card set, edit a card set. The usual CRUD functionality.
  2. Editing a card set means altering its properties (e.g. its name) and its content. The content of a card set are its index cards. So she needs to be able to add, delete, alter index cards and list all index cards in the card set. Again the usual CRUD functionality.
  3. Once she has created and filled a card set she surely wants to memorize its content. So she needs to be able to "derive" a Learnbox from a card set. If she is into studying several topics she most likely wants to be able to manage several Learnboxes at the same time. So she needs to list them, delete them, and alter them.
  4. Altering a Learnbox, or specifying it upon creation, means specifying the number and sizes of the compartments and maybe also arranging the index cards. Maybe she wants to learn them in a certain order. Hm... that would necessitate an addition to the data model, I guess (see picture below).
  5. Finally Vera wants to just learn. She wants to pick up a Learnbox and be presented with index cards from some compartment. How index cards move through the Learnbox is controlled by some business logic behind the scenes; Vera should not be able to (easily) interfere with this.

In order to track the position of an IndexCard within a Compartment, an IndexCard is represented by an Item within the Compartment. Items of course "go down" with their Compartment.

Ok, that´s it, I guess. Reasonable requirements for a first release of my Learnbox software. A lot of CRUD functionality seems to be the right scenario for employing an O/R Mapper - and close to many real world applications. But before I start with the implementation, I need some larger picture of the code. I need some architectural framework to locate the data model in. Stay tuned for my next article - and maybe think about how you would model such an application.

Sunday, March 25, 2007

How the automatic persistence magic is woven - Part 2

I did not declare any static methods on my persistent classes (see previous posting), so someone else must have added them for them to appear in the compiled assembly. Look here, this is my persistent class as seen through Lutz Roeder´s Reflector:

     1 [Persistent]
     2 internal class Person : PersistenceCapable
     3 {
     4     // Fields
     5     public DateTime dob;
     6     public string firstname;
     7     [Depend]
     8     public Address homeAddress;
     9     public string lastname;
    10     private static readonly sbyte[] OpenAccessEnhancedFieldFlags;
    11     private static readonly string[] OpenAccessEnhancedFieldNames;
    12     private static readonly Type[] OpenAccessEnhancedFieldTypes;
    13     [NonSerialized]
    14     protected sbyte OpenAccessEnhancedFlags;
    15     private static readonly int OpenAccessEnhancedInheritedFieldCount;
    16     private static readonly Type OpenAccessEnhancedPersistenceCapableSuperclass;
    17     private static int OpenAccessEnhancedSlotCount;
    18     [NonSerialized]
    19     protected StateManager OpenAccessEnhancedStateManager;
    20     [Transient]
    21     public OnProgressDelegate progressing;
    22 
    23     // Methods
    24     static Person();
    25     private Person();
    26     public Person(string firstname, string lastname, DateTime dob, Address homeAddress);
    27     public override void OpenAccessEnhancedCopyField(int);
    28     public sealed override void OpenAccessEnhancedCopyFields(object, int[]);
    29     protected override void OpenAccessEnhancedCopyKeyFieldsFromObjectId(object);
    30     public override void OpenAccessEnhancedCopyKeyFieldsFromObjectId(PersistenceCapable.ObjectIdFieldConsumer, object);
    31     public override void OpenAccessEnhancedCopyKeyFieldsToObjectId(object);
    32     public override void OpenAccessEnhancedCopyKeyFieldsToObjectId(PersistenceCapable.ObjectIdFieldSupplier, object);
    33     public static DateTime OpenAccessEnhancedGetdob(Person);
    34     public static string OpenAccessEnhancedGetfirstname(Person);
    35     public static Address OpenAccessEnhancedGethomeAddress(Person);
    36     public static string OpenAccessEnhancedGetlastname(Person);
    37     public static int OpenAccessEnhancedGetManagedFieldCount();
    38     public sealed override object OpenAccessEnhancedGetObjectId();
    39     public sealed override PersistenceManager OpenAccessEnhancedGetPersistenceManager();
    40     public sealed override object OpenAccessEnhancedGetTransactionalObjectId();
    41     public sealed override bool OpenAccessEnhancedIsDeleted();
    42     public sealed override bool OpenAccessEnhancedIsDirty();
    43     public sealed override bool OpenAccessEnhancedIsNew();
    44     public sealed override bool OpenAccessEnhancedIsPersistent();
    45     public sealed override bool OpenAccessEnhancedIsTransactional();
    46     public sealed override void OpenAccessEnhancedMakeDirty(string);
    47     protected override object OpenAccessEnhancedMemberwiseClone();
    48     public override PersistenceCapable OpenAccessEnhancedNewInstance(StateManager);
    49     public override PersistenceCapable OpenAccessEnhancedNewInstance(StateManager, object);
    50     public override object OpenAccessEnhancedNewObjectIdInstance();
    51     public override object OpenAccessEnhancedNewObjectIdInstance(string);
    52     public sealed override void OpenAccessEnhancedPreSerialize();
    53     public override void OpenAccessEnhancedProvideField(int);
    54     public sealed override void OpenAccessEnhancedProvideFields(int[]);
    55     public override void OpenAccessEnhancedReplaceField(int);
    56     public sealed override void OpenAccessEnhancedReplaceFields(int[]);
    57     public override void OpenAccessEnhancedReplaceFlags();
    58     public override void OpenAccessEnhancedReplaceStateManager(StateManager);
    59     public static void OpenAccessEnhancedSetdob(Person, DateTime);
    60     public static void OpenAccessEnhancedSetfirstname(Person, string);
    61     public static void OpenAccessEnhancedSethomeAddress(Person, Address);
    62     public static void OpenAccessEnhancedSetlastname(Person, string);
    63 }

It´s considerably larger than my original definition, which contained just a couple of public fields (lines 5..9 and 21) and two ctors (25, 26). So who added the interface PersistenceCapable and the static methods, and why?

 

The culprit is the so-called enhancer of OpenAccess. It´s a tool called during the build process in VS2005. When you "enable a project" for OpenAccess (see the OpenAccess|Enable Project menu item in VS2005), the VOA wizard adds the following section to the project file:

    <ProjectExtensions>
        <VisualStudio>
          <UserProperties OpenAccess_EnhancementOutputLevel="1"
                          OpenAccess_UpdateDatabase="True"
                          OpenAccess_Enhancing="True"
                          OpenAccess_ConnectionId="DatabaseConnection1"
                          OpenAccess_ConfigFile="App.config" />
        </VisualStudio>
    </ProjectExtensions>

These properties are interpreted by OpenAccess after any post-build events the project might contain and if the OpenAccess_Enhancing property is set to true, then the VOA VS2005 integration calls its VEnhance.exe application located in the sdk/dotnet20 folder of the VOA installation directory.

For my sample project the command line for the enhancer would be

venhance.exe -assembly:simplecrud.dll

This would take the assembly created by the C# compiler and add code to make object persistence as transparent as demonstrated. For that, the enhancer needs to add code at two different locations: in any class marked as OpenAccess.Persistent and wherever fields of instances of those classes are accessed.

Persistent classes are enhanced by adding a number of static fields and methods as well as implementing the OpenAccess interface PersistenceCapable. The overall purpose of this is to make change tracking and lazy loading possible without falling back on reflection which would be slow.

Detecting changes to a persistent object´s data is as necessary for an O/R Mapper as it is for you when using ADO.NET. Knowing whether an object is new (has not been persisted yet) or just changed, and on top of that which fields have been changed since it was loaded, is essential for generating the right SQL command. An O/R Mapper, like a DataAdapter, needs to either issue an INSERT for a new object or an UPDATE for a modified object (or a DELETE for any deleted objects). Plus, an UPDATE statement should not always overwrite all column values, but just the ones which have been modified.

In order to determine what to do, a DataAdapter checks the row state of each DataRow. Each DataRow does its own change tracking. But how to do this for regular objects? They usually don´t carry any meta information about their data. Basically there are two approaches:

  • either the O/R Mapper manages a copy of each persistent object´s data, or....
  • each object keeps two copies of its data, one for its current state and one for its state when it was loaded.

And there are two approaches to comparing the original state of an object with its current state:

  • The O/R Mapper can access an object´s state in a somewhat brute force way using reflection, or...
  • the O/R Mapper interacts with an object in a predefined way, e.g. through the methods of a common base class or an interface.

Using reflection seems to be the most convenient way to do change tracking - from an application developer´s point of view. O/R Mappers working like this usually don´t require you to do anything special in order to make the objects or a class persistent. Ideally you don´t even need to mark them with an attribute. Just take any object and throw it at the O/R Mapper and it tries to persist it.

This sounds great, but comes with a major disadvantage: It´s slow. It´s slow, because reflection is slow compared to direct method calls. And it´s slow because changes to objects can only be recognized by comparing their complete state to an internal state copy kept by the O/R Mapper. The O/R Mapper cannot ask such a persistent object which parts of its state have changed.

O/R Mappers that value performance over flexibility thus do not rely on reflection, but require extra code on persistent classes. The basic form of this code needs to be known to them for early binding and strongly typed object access; thus a persistent class either needs to be derived from a base class provided by the O/R Mapper or implement an interface known to or provided by the O/R Mapper. In any case, additional code beyond whatever business functionality a persistent class is supposed to implement needs to be written.
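To make the difference concrete, here´s a much simplified sketch of the interface approach - not any real O/R Mapper´s interface, just the idea of an object doing its own field-level change tracking:

    // Simplified illustration of interface-based change tracking; real O/R
    // Mapper interfaces (like VOA´s PersistenceCapable above) are far richer.
    using System.Collections.Generic;

    public interface ITrackedObject
    {
        bool IsDirty { get; }
        string[] DirtyFields { get; }
    }

    public class Person : ITrackedObject
    {
        private string firstname;
        private List<string> dirtyFields = new List<string>();

        public string Firstname
        {
            get { return firstname; }
            set
            {
                firstname = value;
                // tell the mapper exactly which field changed - no reflection,
                // no comparison against a complete state copy needed
                if (!dirtyFields.Contains("firstname")) dirtyFields.Add("firstname");
            }
        }

        public bool IsDirty { get { return dirtyFields.Count > 0; } }
        public string[] DirtyFields { get { return dirtyFields.ToArray(); } }
    }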

Now, this code could be fully generated, e.g. from a database schema. For example, LLBLGen Pro offers this route to object persistence. The Wilson O/R Mapper, on the other hand, added an interface to be implemented on top of its change tracking via reflection. You can then implement it yourself or again use some kind of code generation.

NDO and OpenAccess, though, also do not use reflection to access object state. They don´t require you to add special persistence code, either. They use code generation - but not source code generation. They generate binary code and insert it transparently into the assembly compiled from the original source code, thus enhancing it. You (usually) never (need to) see the code necessary to make O/R Mapping perform well.

Bottom line: If you want high performance O/R Mapping, additional code is necessary to couple persistent objects to the O/R Mapper. This code can either be written by hand or be generated. It can be generated as source code or as binary code. OpenAccess does the latter. Is that good or bad? Well, it depends. It´s completely transparent to you and it´s safe. But it is as it is; you cannot (and are not supposed to) interfere with this code. So if you really want to get your hands dirty tweaking persistence code, OpenAccess is not for you. There are no templates to adapt to any special needs of yours. Of course you can tune OpenAccess´ workings with regard to object load behaviour, but not with regard to the code it generates.

The question now is, where does this transparent persistence enabling code need to be added? Firstly to the class (see listing above), since it´s supposed to avoid the need for reflection to check for object modifications. But secondly at all locations where persistent objects´ fields are accessed (see the last listing of part 1). That´s necessary to track changes and see if an object needs to be loaded.

The necessity of this should be obvious. But what are the implications? I´d say the most important implication of code enhancement is that you want to isolate it. You should see to it that as few assemblies as possible require it. That means you want to put as many persistent classes into as few VS2005 projects as possible. Persistent objects mostly belong to the domain model of an application. That alone should be sufficient reason to encapsulate them in a component/assembly of their own. In addition, though, you need to hide all (persistent) field access behind methods, so the enhancer does not need to wrap field access in other assemblies as well.

Here´s how I split up my previous sample solution:

[image: solution with separate SimpleCRUD.PersistentObjects and SimpleCRUD.Application projects]

The persistent classes as well as the database are put into a VS2005 project of their own. Only this one needs to be enhanced, since all field access is limited to code in the persistent classes defined therein. I just made the public fields private and made them accessible through property methods. That´s it. The database I put together with the persistent classes, since its schema is defined by them.
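For the Person class from the Reflector listing above that´s a mechanical change; a sketch of the result (only one property shown):

    // Person with private fields and public properties; a sketch - only the
    // assembly containing the persistent classes needs to be enhanced now.
    [OpenAccess.Persistent]
    internal class Person
    {
        private string firstname;
        private string lastname;
        private DateTime dob;
        [OpenAccess.Depend]
        private Address homeAddress;

        public string Firstname
        {
            get { return this.firstname; }
            set { this.firstname = value; }
        }

        // ... analogous properties for lastname, dob and homeAddress
    }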

However, the referencing project - SimpleCRUD.Application in the above image - also needs to reference the OpenAccess library, since it still contains code that uses it. It´s there that I still open the IObjectScope to interact with the database. In a later article I´ll tackle the question of how to isolate this functionality in a dedicated data access component.

Please note: Of course the App.Config created by the OpenAccess wizard during enabling of the persistent classes project also needs to be included in the project where the IObjectScope is set up. The enhancer uses the App.Config of the project being enhanced to find the database whose schema to check against the persistent class definitions. That´s at compile time. But at runtime OpenAccess also needs to know where the database is located. So there needs to be another App.Config.

This of course means the database managed by the enhancer needs to be made available at runtime. For global databases this is not a problem: both App.Config files can reference the same database. But for local databases as above you need to think about how to move it from the compile-time location to the execution directory. I did this by setting the output directory of the SimpleCRUD.PersistentObjects project to the same folder as the referencing project´s output directory.

Wrap-up: OpenAccess is able to transparently load/store persistent objects. This is possible without the need for you to write/generate any code, by automatically adding code to the persistent classes and to any access to their fields in a post-build step using an enhancer. To limit the intrusion of the enhancer into your code, a specific VOA best practice thus is to bundle up your persistent classes in as few assemblies as possible. Also, this should be motivation enough for you to follow the general best practice of hiding your fields and funnelling all access to them through methods/properties.

Please note: If you happen to access your assemblies in a VS2005 post-build event they are still not enhanced! Call the enhancer manually if, for example, you want to copy them to some repository. In that case switch off the enhancer flag in the MSBuild file using the OpenAccess wizard.