Sunday, March 25, 2007

How the automatic persistence magic is woven - Part 2

I did not declare any static methods on my persistent classes (see previous posting), so someone else must have added them to the compiled assembly. Look here, this is my persistent class as seen through Lutz Roeder´s Reflector:

    1 [Persistent]
    2 internal class Person : PersistenceCapable
    3 {
    4     // Fields
    5     public DateTime dob;
    6     public string firstname;
    7     [Depend]
    8     public Address homeAddress;
    9     public string lastname;
   10     private static readonly sbyte[] OpenAccessEnhancedFieldFlags;
   11     private static readonly string[] OpenAccessEnhancedFieldNames;
   12     private static readonly Type[] OpenAccessEnhancedFieldTypes;
   13     [NonSerialized]
   14     protected sbyte OpenAccessEnhancedFlags;
   15     private static readonly int OpenAccessEnhancedInheritedFieldCount;
   16     private static readonly Type OpenAccessEnhancedPersistenceCapableSuperclass;
   17     private static int OpenAccessEnhancedSlotCount;
   18     [NonSerialized]
   19     protected StateManager OpenAccessEnhancedStateManager;
   20     [Transient]
   21     public OnProgressDelegate progressing;
   22
   23     // Methods
   24     static Person();
   25     private Person();
   26     public Person(string firstname, string lastname, DateTime dob, Address homeAddress);
   27     public override void OpenAccessEnhancedCopyField(int);
   28     public sealed override void OpenAccessEnhancedCopyFields(object, int[]);
   29     protected override void OpenAccessEnhancedCopyKeyFieldsFromObjectId(object);
   30     public override void OpenAccessEnhancedCopyKeyFieldsFromObjectId(PersistenceCapable.ObjectIdFieldConsumer, object);
   31     public override void OpenAccessEnhancedCopyKeyFieldsToObjectId(object);
   32     public override void OpenAccessEnhancedCopyKeyFieldsToObjectId(PersistenceCapable.ObjectIdFieldSupplier, object);
   33     public static DateTime OpenAccessEnhancedGetdob(Person);
   34     public static string OpenAccessEnhancedGetfirstname(Person);
   35     public static Address OpenAccessEnhancedGethomeAddress(Person);
   36     public static string OpenAccessEnhancedGetlastname(Person);
   37     public static int OpenAccessEnhancedGetManagedFieldCount();
   38     public sealed override object OpenAccessEnhancedGetObjectId();
   39     public sealed override PersistenceManager OpenAccessEnhancedGetPersistenceManager();
   40     public sealed override object OpenAccessEnhancedGetTransactionalObjectId();
   41     public sealed override bool OpenAccessEnhancedIsDeleted();
   42     public sealed override bool OpenAccessEnhancedIsDirty();
   43     public sealed override bool OpenAccessEnhancedIsNew();
   44     public sealed override bool OpenAccessEnhancedIsPersistent();
   45     public sealed override bool OpenAccessEnhancedIsTransactional();
   46     public sealed override void OpenAccessEnhancedMakeDirty(string);
   47     protected override object OpenAccessEnhancedMemberwiseClone();
   48     public override PersistenceCapable OpenAccessEnhancedNewInstance(StateManager);
   49     public override PersistenceCapable OpenAccessEnhancedNewInstance(StateManager, object);
   50     public override object OpenAccessEnhancedNewObjectIdInstance();
   51     public override object OpenAccessEnhancedNewObjectIdInstance(string);
   52     public sealed override void OpenAccessEnhancedPreSerialize();
   53     public override void OpenAccessEnhancedProvideField(int);
   54     public sealed override void OpenAccessEnhancedProvideFields(int[]);
   55     public override void OpenAccessEnhancedReplaceField(int);
   56     public sealed override void OpenAccessEnhancedReplaceFields(int[]);
   57     public override void OpenAccessEnhancedReplaceFlags();
   58     public override void OpenAccessEnhancedReplaceStateManager(StateManager);
   59     public static void OpenAccessEnhancedSetdob(Person, DateTime);
   60     public static void OpenAccessEnhancedSetfirstname(Person, string);
   61     public static void OpenAccessEnhancedSethomeAddress(Person, Address);
   62     public static void OpenAccessEnhancedSetlastname(Person, string);
   63 }

It´s considerably larger than my original definition, which contained just a couple of public fields (lines 5..9 and 21) and two ctors (25, 26). So who added the interface PersistenceCapable and the static methods, and why?

 

The culprit is the so-called enhancer of OpenAccess. It´s a tool called during the build process in VS2005. When you "enable a project" for OpenAccess (see the OpenAccess|Enable Project menu item in VS2005), the VOA wizard adds the following section to the project file:

    <ProjectExtensions>
        <VisualStudio>
          <UserProperties OpenAccess_EnhancementOutputLevel="1"
                          OpenAccess_UpdateDatabase="True"
                          OpenAccess_Enhancing="True"
                          OpenAccess_ConnectionId="DatabaseConnection1"
                          OpenAccess_ConfigFile="App.config" />
        </VisualStudio>
    </ProjectExtensions>

OpenAccess interprets these properties after any post-build events the project might contain. If the OpenAccess_Enhancing property is set to true, the VOA VS2005 integration calls its VEnhance.exe application located in the sdk/dotnet20 folder of the VOA installation directory.

For my sample project the command line for the enhancer would be

venhance.exe -assembly:simplecrud.dll

This would take the assembly created by the C# compiler and add code to make object persistence as transparent as demonstrated. For that, the enhancer needs to add code at two different locations: in any class marked as OpenAccess.Persistent and wherever fields of instances of those classes are accessed.

Persistent classes are enhanced by adding a number of static fields and methods as well as implementing the OpenAccess interface PersistenceCapable. The overall purpose of this is to make change tracking and lazy loading possible without falling back on reflection which would be slow.

Detecting changes to a persistent object´s data is as necessary for an O/R Mapper as it is for you when using ADO.NET. Knowing whether an object is new (has not been persisted yet) or just changed and, on top of that, which fields have been changed since it was loaded is essential for generating the right SQL command. An O/R Mapper, like a DataAdapter, needs to issue either an INSERT for a new object or an UPDATE for a modified object (or a DELETE for a deleted object). In addition, an UPDATE statement should not always overwrite all column values, but just the ones which have been modified.
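To make this decision concrete, here is a minimal sketch of how a mapper could pick the SQL command from an object's state and its list of modified columns. The ObjectState enum, the SqlSketch class and the table and column names are made up for illustration; this is not how OpenAccess builds its commands.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical states a mapper might track for each object.
public enum ObjectState { New, Modified, Deleted, Unchanged }

public static class SqlSketch
{
    // Returns the SQL text needed for one object, or null if nothing has to be written.
    public static string BuildCommand(string table, string keyColumn, ObjectState state,
                                      string[] allColumns, string[] dirtyColumns)
    {
        switch (state)
        {
            case ObjectState.New:
                // A new object needs every column.
                List<string> parameters = new List<string>();
                foreach (string c in allColumns) parameters.Add("@" + c);
                return "INSERT INTO " + table + " (" + string.Join(", ", allColumns)
                     + ") VALUES (" + string.Join(", ", parameters.ToArray()) + ")";

            case ObjectState.Modified:
                // A modified object only writes the columns that actually changed.
                List<string> assignments = new List<string>();
                foreach (string c in dirtyColumns) assignments.Add(c + " = @" + c);
                return "UPDATE " + table + " SET " + string.Join(", ", assignments.ToArray())
                     + " WHERE " + keyColumn + " = @" + keyColumn;

            case ObjectState.Deleted:
                return "DELETE FROM " + table + " WHERE " + keyColumn + " = @" + keyColumn;

            default:
                return null; // unchanged: no command required
        }
    }
}
```

Without knowing the state and the dirty columns, the mapper could do none of this and would have to fall back on rewriting every column of every object.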

In order to determine what to do, a DataAdapter checks the row state of each DataRow. Each DataRow does its own change tracking. But how to do this for regular objects? They usually don´t carry any meta information on their data. Basically there are two approaches:

  • either the O/R Mapper manages a copy of each persistent object´s data, or....
  • each object keeps two copies of its data, one for its current state and one for its state when it was loaded.
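Both approaches keep an original value next to the current one, which is what a DataRow does on its own. The following small sketch uses the regular System.Data API to show that row-level tracking at work; the table and column names are just examples.

```csharp
using System;
using System.Data;

public static class RowStateDemo
{
    // Walks one row through its life cycle and returns the states observed.
    public static DataRowState[] Run()
    {
        DataTable table = new DataTable("Person");
        table.Columns.Add("lastname", typeof(string));

        DataRow row = table.NewRow();
        row["lastname"] = "Miller";
        table.Rows.Add(row);
        DataRowState afterAdd = row.RowState;       // Added: would need an INSERT

        table.AcceptChanges();                       // stands in for a successful save
        DataRowState afterAccept = row.RowState;    // Unchanged

        row["lastname"] = "Smith";
        DataRowState afterEdit = row.RowState;      // Modified: would need an UPDATE

        // The row still remembers the value it had when it was last accepted.
        object original = row["lastname", DataRowVersion.Original];
        Console.WriteLine("original value: " + original);

        row.Delete();
        DataRowState afterDelete = row.RowState;    // Deleted: would need a DELETE

        return new DataRowState[] { afterAdd, afterAccept, afterEdit, afterDelete };
    }
}
```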

And there are two approaches to comparing the original state of an object with its current state:

  • The O/R Mapper can access an object´s state in a somewhat brute force way using reflection, or...
  • the O/R Mapper interacts with an object in a predefined way, e.g. through the methods of a common base class or an interface.

Using reflection seems to be the most convenient way to do change tracking - from an application developer´s point of view. O/R Mappers working like this usually don´t require you to do anything special in order to make the objects of a class persistent. Ideally you don´t even need to mark them with an attribute. Just take any object and throw it at the O/R Mapper and it tries to persist it.

This sounds great, but has a major disadvantage: It´s slow. It´s slow because reflection is slow compared to direct method calls. And it´s slow because changes to objects can only be recognized by comparing their complete state to an internal state copy kept by the O/R Mapper. The O/R Mapper cannot ask such a persistent object which parts of its state have changed.
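A minimal sketch of such a reflection-based tracker shows where the cost comes from. The Customer and ReflectionTracker classes are hypothetical and not taken from any particular O/R Mapper.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// A plain class without any persistence code.
public class Customer
{
    public string Name;
    public int Age;
}

public class ReflectionTracker
{
    // One snapshot per attached object; Customer does not override Equals,
    // so the dictionary keys compare by reference.
    private readonly Dictionary<object, Dictionary<string, object>> snapshots =
        new Dictionary<object, Dictionary<string, object>>();

    // Reads every instance field through reflection.
    private static Dictionary<string, object> Capture(object obj)
    {
        Dictionary<string, object> state = new Dictionary<string, object>();
        BindingFlags flags = BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in obj.GetType().GetFields(flags))
            state[field.Name] = field.GetValue(obj);
        return state;
    }

    // Remembers the state of the object at "load" time.
    public void Attach(object obj)
    {
        snapshots[obj] = Capture(obj);
    }

    // The complete state has to be read again and compared field by field.
    public List<string> GetChangedFields(object obj)
    {
        Dictionary<string, object> before = snapshots[obj];
        List<string> changed = new List<string>();
        foreach (KeyValuePair<string, object> pair in Capture(obj))
            if (!object.Equals(before[pair.Key], pair.Value))
                changed.Add(pair.Key);
        return changed;
    }
}
```

Note that GetChangedFields has to touch every field of every attached object, even if nothing changed at all.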

O/R Mappers that value performance over flexibility thus do not rely on reflection, but require extra code on persistent classes. The basic form of this code needs to be known to them for early binding and strongly typed object access, thus a persistent class either needs to be derived from a base class provided by the O/R Mapper or implement an interface known to or provided by the O/R Mapper. In any case, additional code beyond whatever business functionality a persistent class is supposed to implement needs to be written.
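What such extra code might look like when written by hand is sketched below. The IChangeTracked interface and the Account class are hypothetical; a real O/R Mapper defines its own contract.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract a mapper could require from persistent classes.
public interface IChangeTracked
{
    bool IsNew { get; }
    ICollection<string> DirtyFields { get; }
    void MarkClean();
}

public class Account : IChangeTracked
{
    private readonly List<string> dirty = new List<string>();
    private bool isNew = true;
    private string owner;

    // Business property; the setter records the change as it happens.
    public string Owner
    {
        get { return owner; }
        set
        {
            if (owner == value) return;          // no change, nothing to track
            owner = value;
            if (!dirty.Contains("owner")) dirty.Add("owner");
        }
    }

    // Persistence plumbing the mapper calls through the interface,
    // early bound and without any reflection.
    public bool IsNew { get { return isNew; } }
    public ICollection<string> DirtyFields { get { return dirty; } }

    // Called by the mapper after the object has been saved.
    public void MarkClean()
    {
        dirty.Clear();
        isNew = false;
    }
}
```

The mapper can now simply ask the object what changed, but every persistent class has to carry this plumbing next to its business code.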

Now, this code could be fully generated, e.g. from a database schema. For example LLBLGen Pro offers this route to object persistence. The Wilson O/R Mapper on the other hand added an interface to be implemented on top of its change tracking via reflection. You can then implement it yourself or can again use some kind of code generation.

NDO and OpenAccess, though, also do not use reflection to access object state. They don´t require you to add special persistence code, either. They use code generation - but not source code generation. They generate binary code and insert it transparently into the assembly compiled from the original source code thus enhancing it. You (usually) never (need to) see the code necessary to make O/R Mapping perform well.

Bottom line: If you want high performance O/R Mapping, additional code is necessary to couple persistent objects to the O/R Mapper. This code can either be written by hand or be generated. It can be generated as source code or as binary code. OpenAccess does the latter. Is that good or bad? Well, it depends. It´s completely transparent to you and it´s safe. But it´s as it is; you cannot (and are not supposed to) interfere with this code. So if you really want to get your hands dirty tweaking persistence code, OpenAccess is not for you. There are no templates to adapt to any special needs of yours. Of course you can tune OpenAccess´ object load behaviour, but not the code it generates.

The question now is, where does this transparent persistence enabling code need to be added? Firstly to the class (see listing above), since it´s supposed to avoid the need for reflection to check for object modifications. But secondly at all locations where persistent objects´ fields are accessed (see the last listing of part 1). That´s necessary to track changes and see if an object needs to be loaded.
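Conceptually, wrapping a field access could work like the following sketch. Only the idea of static per-field accessors taking the instance as a parameter is borrowed from the Reflector listing; the TrackerSketch, Pet and CallSite classes and everything inside the accessors are my own assumptions and not the code the enhancer really emits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a mapper's state manager.
public class TrackerSketch
{
    public readonly List<string> Log = new List<string>();

    // A read is the chance to load the object lazily.
    public void BeforeRead(string field) { Log.Add("read " + field); }

    // A write marks the field as dirty.
    public void AfterWrite(string field) { Log.Add("write " + field); }
}

public class Pet
{
    public string name;              // the field as the developer wrote it
    public TrackerSketch tracker;    // would be supplied by the mapper

    // Accessors of the kind an enhancer could add to the class.
    public static string GetName(Pet pet)
    {
        if (pet.tracker != null) pet.tracker.BeforeRead("name");
        return pet.name;
    }

    public static void SetName(Pet pet, string value)
    {
        pet.name = value;
        if (pet.tracker != null) pet.tracker.AfterWrite("name");
    }
}

public static class CallSite
{
    // The developer wrote: pet.name = pet.name.ToUpper();
    // After a rewrite of the kind described, the call site behaves like this:
    public static void Rename(Pet pet)
    {
        Pet.SetName(pet, Pet.GetName(pet).ToUpper());
    }
}
```

Since every read and write of the field passes through the accessors, the mapper learns about changes the moment they happen and never has to compare state afterwards.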

The necessity of this should be obvious. But what are the implications? I´d say the most important implication of code enhancement is that you want to isolate it. You should see to it that as few assemblies as possible require it. That means you want to put your persistent classes into as few VS2005 projects as possible. Persistent objects mostly belong to the domain model of an application. That alone should be sufficient reason to encapsulate them in a component/assembly of their own. In addition, however, you need to hide all (persistent) field access behind methods, so that the enhancer need not wrap field access in other assemblies as well.

Here´s how I split up my previous sample solution:

The persistent classes as well as the database are put into a VS2005 project of their own. Only this one needs to be enhanced, since all field access is limited to code in the persistent classes defined therein. I just made the public fields private and made them accessible through property methods. That´s it. The database I put together with the persistent classes, since its schema is defined by them.
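In code the change looks roughly like the following sketch. The PersistentAttribute declared here is only a local stand-in so the example compiles on its own (in the real project the attribute comes from the OpenAccess library), and PersonSketch is a cut-down, hypothetical version of my Person class.

```csharp
using System;

// Local stand-in for the attribute provided by the OpenAccess library.
[AttributeUsage(AttributeTargets.Class)]
public class PersistentAttribute : Attribute { }

[Persistent]
public class PersonSketch
{
    // Formerly public, now private: only code in this assembly touches the fields.
    private string firstname;
    private string lastname;
    private DateTime dob;

    public PersonSketch(string firstname, string lastname, DateTime dob)
    {
        this.firstname = firstname;
        this.lastname = lastname;
        this.dob = dob;
    }

    // Other assemblies go through properties, so they contain no field access
    // that would have to be wrapped.
    public string Firstname
    {
        get { return firstname; }
        set { firstname = value; }
    }

    public string Lastname
    {
        get { return lastname; }
        set { lastname = value; }
    }

    public DateTime Dob
    {
        get { return dob; }
        set { dob = value; }
    }
}
```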

However, also the referencing project - SimpleCRUD.Application in the above image - needs to reference the OpenAccess library, since it still contains code that uses it. It´s there that I still open the IObjectScope to interact with the database. In a later article I´ll tackle the question of how to isolate this functionality in a dedicated data access component.

Please note: Of course the App.Config created by the OpenAccess wizard when enabling the persistent classes project also needs to be included in the project where the IObjectScope is set up. The enhancer uses the App.Config of the project it enhances to find the database whose schema it checks against the persistent class definitions. That´s at compile time. But at runtime OpenAccess also needs to know where the database is located. So there needs to be another App.Config.

This of course means that the database managed by the enhancer needs to be made available at runtime. For global databases this is not a problem. Both App.Config files can reference the same database. But for local databases as above you need to think about how to move the database from the compile-time location to the execution directory. I did this by setting the output directory of the SimpleCRUD.PersistentObjects project to the same folder as the referencing project´s output directory.

Wrap-up: OpenAccess is able to transparently load/store persistent objects. You don´t need to write or generate any code for this, because an enhancer automatically adds code to the persistent classes and to any access to their fields in a post-build step. To limit the intrusion of the enhancer into your code, a specific VOA best practice thus is to bundle up your persistent classes in as few assemblies as possible. This should also be motivation enough for you to follow the general best practice of hiding your fields and funnelling all access to them through methods/properties.

Please note: If you access your assemblies in a VS2005 post-build event, they are not yet enhanced at that point! If you want, for example, to copy them to some repository, call the enhancer manually. In that case switch off the enhancer flag in the MSBuild file using the OpenAccess wizard.
