A site devoted to discussing techniques that promote quality and ethical practices in software development.

Sunday, March 11, 2007

Bridging the two cultures - unmanaged COM and .Net RCW

Microsoft and much of the literature on COM Interop programming portray an almost seamless integration between the unmanaged COM and .Net worlds. In most situations they are right. However, bridging these two cultures needs more attention than working within just one of them.

I am going to describe two very common scenarios that will cause the unwary a lot of problems.

Using a COM object model in .Net can result in memory-leak-like behaviour

Many unmanaged COM solutions expose their functionality through a staircase-like (hierarchical) object model. You may have a top-level object called Application, which contains Document objects, each of which in turn contains a Text object, and so on.

It is very common to find programs accessing these sub-objects with a dotted expression like this:

application.Document["blah"].Text.Line[2]

When you use such an object model from, say, VB6 or C++/ATL clients, these environments obey the COM object life cycle: as soon as the reference count of a COM object drops to 0, the object is destroyed. Memory and other resources are reclaimed deterministically.
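To make that contrast concrete, here is a toy model in standard C++. It uses no real COM, and every name in it (ToyComObject, Holder, UnmanagedClient) is hypothetical. Each intermediate object in the dotted expression sits in a small RAII holder, loosely analogous to ATL's CComPtr, and is released when its scope ends. Because of that, an object dies the moment its count reaches zero.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM object (not the real IUnknown): reference
// counted and destroyed the moment its count drops to zero.
class ToyComObject {
public:
    static int live;  // objects currently alive in the "server"

    ToyComObject() { ++live; }

    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // deterministic destruction
        return remaining;
    }

    // Returns a new sub-object; the caller owns its single reference.
    ToyComObject* Child() { return new ToyComObject(); }

private:
    ~ToyComObject() { --live; }
    unsigned long refs_ = 1;  // the creator holds the first reference
};
int ToyComObject::live = 0;

// Hypothetical RAII holder, loosely analogous to CComPtr: adopts an
// already-counted reference and releases it when it goes out of scope.
class Holder {
public:
    explicit Holder(ToyComObject* p) : p_(p) {}
    ~Holder() { if (p_) p_->Release(); }
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;
    ToyComObject* operator->() const { return p_; }

private:
    ToyComObject* p_;
};

// What an unmanaged client effectively does for
// application.Document["blah"].Text.Line[2]: every intermediate is released
// as soon as its scope ends.
void UnmanagedClient(ToyComObject* application) {
    Holder document(application->Child());
    Holder text(document->Child());
    Holder line(text->Child());
    // ... use line ...
}  // line, text and document are released and destroyed here
```

After UnmanagedClient returns, only the Application object is still alive. No collector is involved at any point.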

Now if you program against a COM server with an object model like this from .Net, using the Runtime Callable Wrappers (RCWs) that the framework provides, you will unwittingly delay the release of these sub-objects. If your .Net program is long-running, e.g. a 24x7 service, you will observe behaviour akin to a memory leak in the COM server.

If you examine the IL generated for the equivalent dotted expression in ILDasm, you will see something like this (pseudo-code):

application.Document["blah"] -> temp1
temp1.Text -> temp2
temp2.Line[2] -> temp3

temp1, temp2 and temp3 denote internally generated intermediate .Net objects of the relevant types.

Since they are just RCWs wrapping the corresponding COM objects, this is exactly where the two cultures meet. On the RCW side, object lifetimes are controlled entirely by the garbage collector, while on the COM side each object's lifetime is controlled by its reference count.

Once temp1, temp2 and temp3 are no longer in use, they become candidates for garbage collection, but that collection may not happen for a while. In the meantime, because they have not been collected, they still hold references on the corresponding COM objects. This prevents those objects from dying and releasing the resources they hold, so the COM server's memory usage grows compared with its behaviour under an unmanaged client.

If you increase memory demand on the .Net side to the point where it triggers a garbage collection, you will see an abrupt drop in the COM server's memory usage. This drop corresponds to a whole batch of COM objects being released at the same time.
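The following is a drastically simplified toy model of that behaviour in standard C++. It is not the real CLR. ToyManagedHeap and its Collect() method are hypothetical, and Collect() stands in for whenever the real runtime eventually collects and finalizes the wrappers. In the model, each wrapped intermediate keeps its reference until Collect() runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical reference-counted stand-in for a COM object (not real COM).
class ToyComObject {
public:
    static int live;  // objects currently alive in the "server"
    ToyComObject() { ++live; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    ToyComObject* Child() { return new ToyComObject(); }  // caller owns one reference

private:
    ~ToyComObject() { --live; }
    unsigned long refs_ = 1;
};
int ToyComObject::live = 0;

// Hypothetical, drastically simplified stand-in for the .NET runtime: a
// wrapper's reference is not released when the wrapper becomes unreachable,
// only when Collect() runs.
class ToyManagedHeap {
public:
    ToyComObject* Wrap(ToyComObject* p) { wrapped_.push_back(p); return p; }
    void Collect() {
        for (ToyComObject* p : wrapped_) p->Release();
        wrapped_.clear();
    }

private:
    std::vector<ToyComObject*> wrapped_;
};

// The dotted expression as the IL sees it: three intermediates, none of
// which is released when the method returns.
void DottedAccess(ToyManagedHeap& heap, ToyComObject* application) {
    ToyComObject* temp1 = heap.Wrap(application->Child());  // Document
    ToyComObject* temp2 = heap.Wrap(temp1->Child());        // Text
    ToyComObject* temp3 = heap.Wrap(temp2->Child());        // Line
    (void)temp3;  // ... use the line ...
}
```

If you call DottedAccess in a loop, the live count climbs by three per call. It then falls back all at once when Collect() runs, which mirrors the sudden drop in server memory.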

So it is not a leak in the strictest sense, but it behaves like one. If you are confronted with this situation, be confident and trust the garbage collector: it will do the job.

If the resources being held unnecessarily long are not just unmanaged memory but something more precious, such as database connections or sockets, you do have a problem. Resource starvation in the unmanaged world can propagate into the managed world. The garbage collector is driven by pressure on the managed heap, so it has no reason to run just because the COM server is running short of connections.

To avoid this situation, restructure your client code so that it does not use dotted expressions. Instead, assign each intermediate object to an explicit local variable and release it as soon as you are done with it. This prevents the compiler from generating hidden intermediate objects and lets you control the life cycles of those temporary objects yourself. Often this also results in faster execution and lower memory demand.
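In C#, this typically means holding each intermediate in a local variable and calling Marshal.ReleaseComObject on it in a finally block. The same idea is sketched below as a standard C++ toy model, with hypothetical names and no real COM or .NET. Here, ReleaseNow() plays the role of Marshal.ReleaseComObject. The COM-side objects die as soon as the client is done with them, and nothing is left waiting for the collector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference-counted stand-in for a COM object (not real COM).
class ToyComObject {
public:
    static int live;
    ToyComObject() { ++live; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }
    ToyComObject* Child() { return new ToyComObject(); }  // caller owns one reference

private:
    ~ToyComObject() { --live; }
    unsigned long refs_ = 1;
};
int ToyComObject::live = 0;

// Hypothetical runtime stand-in: wrappers normally keep their reference until
// Collect(); ReleaseNow() models Marshal.ReleaseComObject.
class ToyManagedHeap {
public:
    ToyComObject* Wrap(ToyComObject* p) { wrapped_.push_back(p); return p; }
    void ReleaseNow(ToyComObject* p) {
        auto it = std::find(wrapped_.begin(), wrapped_.end(), p);
        if (it != wrapped_.end()) {
            wrapped_.erase(it);
            p->Release();
        }
    }
    void Collect() {
        for (ToyComObject* p : wrapped_) p->Release();
        wrapped_.clear();
    }
    std::size_t Pending() const { return wrapped_.size(); }

private:
    std::vector<ToyComObject*> wrapped_;
};

// Restructured client: one explicit local per level, each released as soon
// as the client is done with it (in C#, inside a finally block).
void ExplicitAccess(ToyManagedHeap& heap, ToyComObject* application) {
    ToyComObject* document = heap.Wrap(application->Child());
    ToyComObject* text = heap.Wrap(document->Child());
    ToyComObject* line = heap.Wrap(text->Child());
    // ... use line ...
    heap.ReleaseNow(line);
    heap.ReleaseNow(text);
    heap.ReleaseNow(document);
}
```

The trade-off is verbosity. In exchange, the COM server sees an unmanaged-client life cycle no matter when the collector runs.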

I encountered this kind of behaviour in a project I was recently involved with, and it was hard to convince the other party that there was nothing wrong.

Passing an unmanaged COM object to a managed COM object - side effects

I encountered a situation like this, and it took me a week to identify the cause. Before I explain the architecture, here is the symptom: this operation prevents an unmanaged COM local server from terminating.

To understand the problem, you need to be aware of the architecture that causes it.

I have an ATL/C++ local server that hosts a COM component; let's call it AtlComServer, with an interface IAtlComServer. This component embeds a .Net COM component that exposes an interface called IMyDotNetComServer.

When the client calls IAtlComServer.CreateInstance(), the server creates an instance of the .Net COM component and calls its Setup() method, passing it an object that implements ICallback.

The C++ code in IAtlComServer.CreateInstance() looks something like this (with all error handling removed for brevity):

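// CreateInternalObject() creates an internal (non-creatable) object implementing ICallback
CComPtr<ICallback> spCallback = CreateInternalObject();
CComVariant v ( spCallback );           // the VARIANT holds its own reference to the callback
hr = spDotNetCom->Setup( ......, v );   // call crosses into .Net through the component's CCW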

spDotNetCom is a variable of type CComPtr<IMyDotNetComServer> previously created.

For the moment, suppose MyDotNetComServer.Setup() does nothing, not even holding on to the ICallback. How can such a harmless piece of code prevent the ATL server from shutting down?

Even if the client takes extra care to release the AtlComServer with Marshal.ReleaseComObject(), that only destroys the AtlComServer object promptly; the server process still stubbornly refuses to terminate. Why?

To understand this, one needs to understand the behaviour of a COM local server. Whenever the server creates a COM object, that object increments the server's lock count, and it decrements the count when it is destroyed. This prevents the destruction of one object from triggering server termination while other COM objects are still alive in the process.
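Here is a toy model of that rule in standard C++. The names ToyLocalServer and ServerObject are hypothetical. ATL does something broadly similar when its CComObject wrapper locks and unlocks the module, but this is not ATL code. Each object locks the server when it is created and unlocks it when destroyed, and the server terminates only when the lock count returns to zero.

```cpp
#include <cassert>

// Hypothetical model of a local server's lock count.
class ToyLocalServer {
public:
    void Lock() { ++locks_; }
    void Unlock() {
        assert(locks_ > 0);
        // A real server applies its termination policy here (possibly after a delay).
        if (--locks_ == 0) terminated_ = true;
    }
    int LockCount() const { return locks_; }
    bool Terminated() const { return terminated_; }

private:
    int locks_ = 0;
    bool terminated_ = false;
};

// Hypothetical COM-like object hosted by the server: it holds one server
// lock for as long as it is alive.
class ServerObject {
public:
    explicit ServerObject(ToyLocalServer& server) : server_(server) { server_.Lock(); }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }

private:
    ~ServerObject() { server_.Unlock(); }
    ToyLocalServer& server_;
    unsigned long refs_ = 1;  // the creator holds the first reference
};
```

In this model, the server terminates only after the last object is destroyed. Releasing one of several objects leaves it running.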

So when CreateInternalObject() creates an internal (non-creatable) object, that object places a lock on the AtlComServer process. When the object is passed to IMyDotNetComServer.Setup() through the .Net component's CCW (COM Callable Wrapper), .Net creates an RCW to represent the source COM object, incrementing the unmanaged object's reference count.

Even when IMyDotNetComServer.Setup() does nothing with this COM object, such as saving it in a member field, the call still causes an RCW to be created, and the RCW's life cycle is then managed by the garbage collector. So when the function returns, the RCW becomes a candidate for garbage collection, which may not happen for a long while.

In the meantime, the unmanaged COM object is kept alive. Since its reference count is not zero, the server lock count is not zero either, so the local server does not initiate its shutdown process.

You can follow this in more detail by enabling ATL's interface tracing: define the macro _ATL_DEBUG_INTERFACES before including the ATL headers.

To correct this problem, release the COM object held by the RCW promptly, as follows:

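using System.Runtime.InteropServices;

public class MyDotNetComServer : IMyDotNetComServer {
    public void Setup( ....., object callback ) {
        // callback arrives as an RCW holding a reference on the unmanaged object;
        // Setup keeps no reference, so release it now instead of waiting for the GC.
        if ( callback != null && Marshal.IsComObject( callback ) )
            Marshal.ReleaseComObject( callback );
    }
}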

With this change, you will see the server terminate in accordance with its termination policy.
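The whole sequence can be replayed in a standard C++ toy model. All names are hypothetical, and this is neither ATL nor the CLR. ToyRuntime::MakeRcw stands for the reference the interop layer takes when the callback arrives through the CCW. Collect() stands for a later garbage collection, and ReleaseNow() stands for Marshal.ReleaseComObject. The model contrasts the original Setup() with the fixed one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the local server's lock count.
class ToyLocalServer {
public:
    void Lock() { ++locks_; }
    void Unlock() {
        assert(locks_ > 0);
        if (--locks_ == 0) terminated_ = true;  // termination policy would run here
    }
    int LockCount() const { return locks_; }
    bool Terminated() const { return terminated_; }

private:
    int locks_ = 0;
    bool terminated_ = false;
};

// Hypothetical COM-like object that holds a server lock while alive.
class ServerObject {
public:
    explicit ServerObject(ToyLocalServer& server) : server_(server) { server_.Lock(); }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        assert(refs_ > 0);
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;
        return remaining;
    }

private:
    ~ServerObject() { server_.Unlock(); }
    ToyLocalServer& server_;
    unsigned long refs_ = 1;
};

// Hypothetical runtime stand-in: an RCW takes its own reference, dropped only
// by Collect() (a later GC) or ReleaseNow() (Marshal.ReleaseComObject).
class ToyRuntime {
public:
    ServerObject* MakeRcw(ServerObject* p) { p->AddRef(); rcws_.push_back(p); return p; }
    void ReleaseNow(ServerObject* p) {
        auto it = std::find(rcws_.begin(), rcws_.end(), p);
        if (it != rcws_.end()) { rcws_.erase(it); p->Release(); }
    }
    void Collect() {
        for (ServerObject* p : rcws_) p->Release();
        rcws_.clear();
    }

private:
    std::vector<ServerObject*> rcws_;
};

// Original Setup(): "does nothing", yet the interop layer still creates an RCW.
void SetupOriginal(ToyRuntime& runtime, ServerObject* callback) {
    runtime.MakeRcw(callback);
}

// Fixed Setup(): drop the RCW's reference before returning.
void SetupFixed(ToyRuntime& runtime, ServerObject* callback) {
    ServerObject* rcw = runtime.MakeRcw(callback);
    runtime.ReleaseNow(rcw);
}

// Models IAtlComServer.CreateInstance(): create the internal callback, pass it
// to Setup(), then drop the local reference (the CComPtr going out of scope).
void CreateInstance(ToyLocalServer& server, ToyRuntime& runtime,
                    void (*setup)(ToyRuntime&, ServerObject*)) {
    ServerObject* callback = new ServerObject(server);  // CreateInternalObject()
    setup(runtime, callback);
    callback->Release();
}
```

With SetupOriginal, the server is still locked after the client releases AtlComServer and terminates only after Collect(). With SetupFixed, it terminates as soon as the client lets go.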

These two scenarios highlight the danger of programming COM across two different cultures without being fully aware of the underlying issues.

In the first scenario, RCWs keep COM objects alive longer than necessary. In the second, an RCW keeps a COM object alive and thereby prevents the server from terminating. That scenario starts with a call through a CCW, which leads to the creation of an RCW that ends up holding the COM server open.
