VB .NET COMMUNICATIONS #2

Was the poet, Burns, a secret programmer? This month Dermot Hogan gets nostalgic about the VMS operating system as he contemplates a ‘wee tim’rous beastie’…

Requirements:
VB .NET (.NET 1.1)


Download The Source Code:
vb5src.zip


See also part one of this series and the conclusion, part three

“The best laid schemes o’ mice an’ men gang aft agley”. Or if you’re not familiar with the Scots dialect of Robert Burns: the best laid schemes of mice and men have a distinct tendency to go a bit pear-shaped. And so it often goes with programming. What seems at first sight to be a simple operation can turn out to be, shall we say, a tad more complicated.

This month Dermot provides a handy guide to some of the more obscure corners of .NET, Windows, VB and VMS with occasional excursions into Lowland Scots, English slang, the poetry of Burns and the rules of snooker. We've provided a few links to assist the terminally confused...

Last month, I confidently set out to ‘roll my own’ .NET communications control. Initially, it all went pretty well. In .NET, it’s easy to start a thread and perform whatever I/O operations are required for the communications port (COM1, etc.) on that thread, leaving the main program thread free to handle the keyboard and display. It’s also easy to wrap everything up in a user control and use properties to communicate with it. In last month’s column, I created a ‘Comm’ control which allowed you to open a communications port, set and clear the DTR signal line, and be notified in the main program thread when the input signal line corresponding to DSR changed.

So far, so good. It looked as though it would be all downhill from there. My aim was to end up with a workable .NET Comm control, equivalent to (or better than) Microsoft’s somewhat limited MSComm control from earlier versions of Visual Basic. However, there was a ‘gotcha’ lurking!

The problem comes from the nature of the WaitCommEvent API. This, in the simple form used last month, does just what it says – it waits for a control signal to change on the COM port. But if the thread that handles the COM events issues the WaitCommEvent call, and then the main thread attempts to set DTR, the program hangs. It doesn’t crash, nor gobble up all the CPU in a loop or indeed do anything uncivilized. It simply does nothing!

Behind The Eight Ball?

After some head scratching, the light dawned: the WaitCommEvent API is a ‘blocking’ I/O request. While it is in progress, any other I/O request on the port must wait ‘behind’ it until it has finished. So the request to set DTR is queued behind the WaitCommEvent request and never gets to run until WaitCommEvent completes, even though it was issued on a different thread. Worse still, the program goes into a ‘wait’ state from which you can’t wake it. All you can do is kill off the program or satisfy WaitCommEvent by asserting DSR on the port.

In other words (not from Rabbie Burns this time), we’re snookered.

Now it turns out that this behaviour is quite by design. The reason is that Windows XP is built on Windows NT, whose design owes a great deal to an older operating system called VMS from Digital Equipment Corporation. VMS itself drew on a much earlier operating system called RSX. And the primary I/O interface in RSX was a system call named QIO (that is, Queue I/O), which exhibited exactly this blocking behaviour. The name says it all: I/O requests were queued behind one another and executed serially, one at a time. There’s another interesting piece of information: RSX, VMS and Windows NT were all designed by a certain Dave Cutler, who was lured away from a declining Digital Equipment by Bill Gates, no doubt in return for many squillions of Microsoft stock options.

So maybe all that’s required is a good working knowledge of VMS internals! And sure enough, after a bit of deep recall (and some nostalgia for my misspent youth, hacking away at VMS), the mechanisms required to solve this particular problem came to the surface. It was quicker than searching through the Windows documentation, anyway.

The first thing to do is to prime the I/O subsystem to expect asynchronous I/O operations, that is, operations which have not completed when the I/O API returns. In Windows, this is called ‘overlapped’ I/O, and you need to set a flag when you open the COM port (or a file):

' FILE_FLAG_OVERLAPPED allows asynchronous (overlapped) I/O on the handle
h = CreateFile(portname, GENERIC_READ Or GENERIC_WRITE, 0, 0, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, 0)
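The declarations behind that call aren’t shown here, so the following is a minimal sketch of what they might look like in current VB .NET. It assumes handles are declared as IntPtr rather than Integer (so the code also works in a 64-bit process), and OpenOverlappedPort is a hypothetical helper name; the constant values are the ones from the Windows headers.

```vb
Imports System
Imports System.ComponentModel
Imports System.Runtime.InteropServices

' Sketch of the declarations behind the overlapped CreateFile call.
' Handles are IntPtr here (an assumption) so the code also works in 64-bit processes.
Public Module CommOpen
    Public Const GENERIC_READ As Integer = &H80000000
    Public Const GENERIC_WRITE As Integer = &H40000000
    Public Const OPEN_EXISTING As Integer = 3
    Public Const FILE_FLAG_OVERLAPPED As Integer = &H40000000

    ' INVALID_HANDLE_VALUE is the handle value -1
    Public ReadOnly INVALID_HANDLE_VALUE As New IntPtr(-1)

    Declare Auto Function CreateFile Lib "kernel32" (ByVal lpFileName As String, _
        ByVal dwDesiredAccess As Integer, ByVal dwShareMode As Integer, _
        ByVal lpSecurityAttributes As IntPtr, ByVal dwCreationDisposition As Integer, _
        ByVal dwFlagsAndAttributes As Integer, ByVal hTemplateFile As IntPtr) As IntPtr

    ' Hypothetical helper: opens a port such as "COM1" for overlapped I/O.
    ' Share mode 0 gives the exclusive access that serial ports require.
    Public Function OpenOverlappedPort(ByVal portName As String) As IntPtr
        Dim h As IntPtr = CreateFile(portName, GENERIC_READ Or GENERIC_WRITE, 0, _
            IntPtr.Zero, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, IntPtr.Zero)
        If h.Equals(INVALID_HANDLE_VALUE) Then
            ' Declare statements set SetLastError, so the Windows error code is available
            Throw New Win32Exception(Marshal.GetLastWin32Error())
        End If
        Return h
    End Function
End Module
```

For example, OpenOverlappedPort("COM1") returns a handle ready for overlapped calls; on failure, the Win32Exception carries the Windows error code.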

This tells the operating system that certain APIs may return to the calling thread before the I/O operation they requested has completed. The I/O is still active deep inside the operating system; all that has finished is the initial request to start it. So another I/O request can now be queued to the device by another API call, and we’re no longer blocked.

Next, we need an ‘OVERLAPPED’ structure, which is used to communicate with the still active I/O operation. Indeed, the existence of an OVERLAPPED structure in the arguments to an API usually indicates that the I/O is to be asynchronous. Here’s the structure:

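Structure OVERLAPPED
   ' 32-bit layout (.NET 1.1): handles and pointers fit in an Integer
   Private Internal As Integer      ' reserved for the system (status)
   Private InternalHigh As Integer  ' reserved for the system (byte count)
   Private Offset As Integer        ' file position; unused for COM ports
   Private OffsetHigh As Integer    ' file position (high part); unused
   Public hEvent As Integer         ' event signalled when the I/O completes
End Structure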
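The Integer fields above match 32-bit Windows, which is all that .NET 1.1 targets. As a sketch for later 64-bit versions of .NET (an assumption beyond the article’s requirements), the pointer-sized members can be declared as IntPtr, as in the Windows headers; Marshal.SizeOf then gives 20 bytes in a 32-bit process and 32 in a 64-bit one. The OverlappedLayout module is a hypothetical helper for checking that.

```vb
Imports System
Imports System.Runtime.InteropServices

' Sketch: OVERLAPPED with pointer-sized members declared as IntPtr, matching
' the Windows headers (ULONG_PTR Internal/InternalHigh, HANDLE hEvent).
<StructLayout(LayoutKind.Sequential)> _
Public Structure OVERLAPPED
    Public Internal As IntPtr       ' reserved for the system (status)
    Public InternalHigh As IntPtr   ' reserved for the system (byte count)
    Public Offset As Integer        ' file position; unused for COM ports
    Public OffsetHigh As Integer    ' file position (high part); unused
    Public hEvent As IntPtr         ' event signalled when the I/O completes
End Structure

Public Module OverlappedLayout
    ' Three pointer-sized members plus two 32-bit members
    Public Function ExpectedSize() As Integer
        Return 3 * IntPtr.Size + 2 * 4
    End Function

    ' The size the interop marshaller will actually use
    Public Function ActualSize() As Integer
        Return Marshal.SizeOf(GetType(OVERLAPPED))
    End Function
End Module
```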

The only thing of interest to us here is the hEvent member. This must be set to the handle of an ‘event’. Events are just one type of signalling mechanism used by the operating system (see Signals and Mutexes, below). One way to think of an event is as a flag that pops up when something occurs. Other threads are watching for the flag and wait until they see it. When they do, off they go again, running whatever code comes next in the thread.

Events are used to synchronize various activities between threads and inside the operating system itself. But they are also used to communicate between the operating system core (the ‘kernel’) and the threads that run in your program. It might seem strange at first sight that the kernel needs such a mechanism just to tell a thread that something has happened. But the kernel runs according to a very strict set of rules, and the ability to poke around in a user-level thread is not among them.

Events can either reset themselves automatically (an auto-reset event drops its flag again as soon as it has released one waiting thread) or be manually reset, staying signalled until you explicitly clear them. I prefer the latter, since it’s then clear what is going on. You also choose whether the event starts out signalled (flag raised) or cleared (flag down) when it is created:

o.hEvent = CreateEvent(Nothing, 1, 0, portnumber.ToString())

This API creates an event with default security attributes (first argument, Nothing), manual reset (second), initially cleared (third) and with the port number as the event’s name (last argument).
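The difference between manual and automatic reset is easy to see with .NET’s own wrappers, ManualResetEvent and AutoResetEvent, used here in place of CreateEvent so that the sketch runs without a COM port. After one Set, a manual-reset event stays signalled for every zero-timeout WaitOne, while an auto-reset event satisfies the first wait and then drops its flag. SuccessfulWaits is a hypothetical helper name.

```vb
Imports System
Imports System.Threading

' Sketch contrasting manual-reset and auto-reset events.
' WaitOne(0) tests the event without blocking: True means "signalled".
Public Module EventReset
    ' Signals the event once, then counts how many of three
    ' zero-timeout waits find it signalled.
    Public Function SuccessfulWaits(ByVal ev As EventWaitHandle) As Integer
        ev.Set()                          ' raise the flag
        Dim count As Integer = 0
        For i As Integer = 1 To 3
            If ev.WaitOne(0) Then count += 1
        Next
        Return count
    End Function
End Module
```

SuccessfulWaits(New ManualResetEvent(False)) returns 3, because the flag stays up until Reset is called; SuccessfulWaits(New AutoResetEvent(False)) returns 1.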


One of the problems with asynchronous I/O is the number of APIs you need to declare to do anything.
And they’re all several lines long, as well.

Hanging Around

Now we’re ready for the modified event-handling thread. The WaitCommEvent API actually takes three parameters. The first two (the port’s handle and the event mask used to report which signal caused the communication event) are the same as we used last month. But the third, lpOverlapped, must now be set to an OVERLAPPED structure. When it is, the API doesn’t wait: unless the event has already occurred, it returns 0 immediately and GetLastError reports the error code ERROR_IO_PENDING. This isn’t really an error, since it’s exactly what we expect from an overlapped I/O operation, but we’ve still got to check:

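r = WaitCommEvent(h, evtMask, o)
' A return value of 0 means the wait is still pending (or has failed)
If r = 0 Then
   ' Declare statements set SetLastError, so read the code via Marshal
   ' (System.Runtime.InteropServices) rather than calling GetLastError directly
   r = Marshal.GetLastWin32Error()
   If r <> ERROR_IO_PENDING Then
      Throw New System.Exception("WaitCommEvent failed")
   End If
End If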

The real wait must be done next, using the WaitForSingleObject API:

r = WaitForSingleObject(o.hEvent, INFINITE)

This waits (yes, this time it really does do as advertised and wait) until the event specified by the event handle is raised. The event is the one in the OVERLAPPED structure which was given to the earlier WaitCommEvent call, and it is set by the operating system when the I/O completes. The second parameter is a timeout, in milliseconds, after which the call returns even if the event hasn’t been raised. We don’t need one here, so it’s set to INFINITE.
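To see the wait-and-timeout behaviour without any hardware, here is a managed analogue: a sketch in which a worker thread stands in for the kernel and raises a ManualResetEvent. WaitOne plays the part of WaitForSingleObject, and Timeout.Infinite (-1) corresponds to INFINITE. WaitForWorker is a hypothetical name, the delays are arbitrary illustrative values, and the multi-line lambda needs Visual Basic 2010 or later.

```vb
Imports System
Imports System.Threading

' Sketch: a worker thread stands in for the kernel and signals completion.
Public Module WaitDemo
    ' Returns True if the worker raised the event within timeoutMs
    ' (pass Timeout.Infinite to wait forever, like INFINITE).
    Public Function WaitForWorker(ByVal delayMs As Integer, ByVal timeoutMs As Integer) As Boolean
        Dim done As New ManualResetEvent(False)   ' manual reset, initially cleared
        Dim worker As New Thread(Sub()
                                     Thread.Sleep(delayMs)   ' the "I/O" takes a while
                                     done.Set()              ' completion: raise the flag
                                 End Sub)
        worker.IsBackground = True    ' don't keep the process alive if we give up
        worker.Start()
        Return done.WaitOne(timeoutMs)
    End Function
End Module
```

WaitForWorker(50, Timeout.Infinite) returns True once the worker signals, while WaitForWorker(2000, 10) gives up after 10 ms and returns False.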

Now you might expect that the event mask originally passed to WaitCommEvent would now be set. After all, the I/O has completed and the result should be there.

Well, actually, no. There are two problems. First, you have to go and get the I/O results using yet another API, GetOverlappedResult. The reason is that the original WaitCommEvent call has already returned, with a status of ERROR_IO_PENDING, so it has no way of handing back the result. Instead, you make another API call which allows the operating system to report back to the thread that issued the first request. This may sound peculiar, but that’s the way Windows XP works. By and large, the operating system doesn’t come back and say ‘I’m done’; you have to ask it whether it’s done, using the GetOverlappedResult API.

Secondly, the I/O might not have completed, in spite of the event being signalled. This can actually happen, though whether it’s a feature or a bug is anyone’s guess. In any case, it’s good coding practice to check: anything that can get hold of the event can signal it, not just the WaitCommEvent completion code in the operating system kernel. Since our event is named, that includes other programs that open it by name.

So first, the GetOverlappedResult API:

r = GetOverlappedResult(h, o, Nothing, 0)

The first parameter is the communication port’s handle, the second is the Overlapped structure used previously, the third is used to return the number of bytes transferred (not relevant here) and the last indicates if we want to wait – which we definitely do not.

Now we have to check if the signal really did indicate that the I/O was completed:

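' A return value of 0 means the I/O failed or hasn't really completed
If r = 0 Then
   r = Marshal.GetLastWin32Error()
   If r = ERROR_IO_INCOMPLETE Then
      evtMask = -1   ' spurious signal: ignore it
   Else
      Throw New System.Exception("GetOverlappedResult failed")
   End If
End If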

Here, the code sets evtMask to -1 to indicate that the signal was spurious and should be ignored.

And finally, we need to reset the event:

ResetEvent(o.hEvent)
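Putting the steps together, the following is a sketch of one pass through the event-handling thread. It assumes a handle opened with FILE_FLAG_OVERLAPPED and an event mask already selected with SetCommMask (as last month), and it makes three changes that are assumptions rather than the article’s code: it uses an unnamed event; the OVERLAPPED structure and the event mask live in unmanaged memory (Marshal.AllocHGlobal), so the garbage collector can never move them while the kernel holds their addresses; and a spurious signal makes it wait again rather than returning -1, so that memory is never freed while the I/O is still pending. WaitForModemChange is a hypothetical name, and the code needs Windows to run.

```vb
Imports System
Imports System.ComponentModel
Imports System.Runtime.InteropServices

' Sketch of one overlapped WaitCommEvent cycle (Windows only).
' Assumes h was opened with FILE_FLAG_OVERLAPPED and SetCommMask has
' already selected EV_DSR and/or EV_CTS.
Public Module CommWait
    Public Const EV_CTS As Integer = &H8
    Public Const EV_DSR As Integer = &H10
    Public Const ERROR_IO_INCOMPLETE As Integer = 996
    Public Const ERROR_IO_PENDING As Integer = 997
    Public Const INFINITE As Integer = -1          ' 0xFFFFFFFF

    <StructLayout(LayoutKind.Sequential)> _
    Public Structure OVERLAPPED
        Public Internal As IntPtr
        Public InternalHigh As IntPtr
        Public Offset As Integer
        Public OffsetHigh As Integer
        Public hEvent As IntPtr
    End Structure

    Declare Function WaitCommEvent Lib "kernel32" (ByVal hFile As IntPtr, _
        ByVal lpEvtMask As IntPtr, ByVal lpOverlapped As IntPtr) As Integer
    Declare Function GetOverlappedResult Lib "kernel32" (ByVal hFile As IntPtr, _
        ByVal lpOverlapped As IntPtr, ByRef lpBytes As Integer, ByVal bWait As Integer) As Integer
    Declare Function WaitForSingleObject Lib "kernel32" (ByVal hHandle As IntPtr, _
        ByVal dwMilliseconds As Integer) As Integer
    Declare Auto Function CreateEvent Lib "kernel32" (ByVal lpAttributes As IntPtr, _
        ByVal bManualReset As Integer, ByVal bInitialState As Integer, ByVal lpName As String) As IntPtr
    Declare Function ResetEvent Lib "kernel32" (ByVal hEvent As IntPtr) As Integer
    Declare Function CloseHandle Lib "kernel32" (ByVal hObject As IntPtr) As Integer

    ' Blocks until one of the masked modem events occurs; returns the event mask.
    Public Function WaitForModemChange(ByVal h As IntPtr) As Integer
        ' Unmanaged memory is never moved by the garbage collector
        Dim pOv As IntPtr = Marshal.AllocHGlobal(Marshal.SizeOf(GetType(OVERLAPPED)))
        Dim pMask As IntPtr = Marshal.AllocHGlobal(4)
        Dim ov As New OVERLAPPED()
        ov.hEvent = CreateEvent(IntPtr.Zero, 1, 0, Nothing)   ' manual reset, cleared, unnamed
        Try
            If ov.hEvent.Equals(IntPtr.Zero) Then Throw New Win32Exception()
            Marshal.StructureToPtr(ov, pOv, False)
            Marshal.WriteInt32(pMask, 0)
            If WaitCommEvent(h, pMask, pOv) = 0 Then
                Dim err As Integer = Marshal.GetLastWin32Error()
                If err <> ERROR_IO_PENDING Then Throw New Win32Exception(err)
                Dim bytes As Integer
                Do
                    WaitForSingleObject(ov.hEvent, INFINITE)   ' the real wait
                    ResetEvent(ov.hEvent)                      ' clear before checking
                    If GetOverlappedResult(h, pOv, bytes, 0) <> 0 Then Exit Do
                    err = Marshal.GetLastWin32Error()
                    If err <> ERROR_IO_INCOMPLETE Then Throw New Win32Exception(err)
                    ' Spurious signal: the I/O is still pending, so wait again
                Loop
            End If
            Return Marshal.ReadInt32(pMask)                    ' EV_DSR / EV_CTS bits
        Finally
            If Not ov.hEvent.Equals(IntPtr.Zero) Then CloseHandle(ov.hEvent)
            Marshal.FreeHGlobal(pMask)
            Marshal.FreeHGlobal(pOv)
        End Try
    End Function
End Module
```

The event is reset before GetOverlappedResult is checked, so a completion that arrives in between still raises the flag for the next wait. A thread would call WaitForModemChange(h) in a loop and test the returned mask against EV_DSR and EV_CTS before notifying the main thread.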

When you run the program, you need to initialize the communications ports by clicking the ‘Initialize’ button. Connect the two communications ports together using a cross-over (null-modem) cable (either make one or buy one), and you should then be able to set and clear DTR/RTS on both ports. When you do, simple diagnostic pop-up messages will tell you that the state of the corresponding DSR/CTS lines has changed. Finally, you can use the ‘Check’ button to determine the DSR/CTS states of the ports.
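The ‘Check’ button relies on GetCommModemStatus, which fills in a DWORD of flag bits. Here is a small sketch of decoding it, using the MS_*_ON values from the Windows headers; the ModemStatus module and its functions are hypothetical helpers, not part of the article’s control, and the If() operator needs Visual Basic 2008 or later.

```vb
Imports System

' Sketch: decoding the DWORD returned through GetCommModemStatus.
Public Module ModemStatus
    Public Const MS_CTS_ON As Integer = &H10
    Public Const MS_DSR_ON As Integer = &H20
    Public Const MS_RING_ON As Integer = &H40
    Public Const MS_RLSD_ON As Integer = &H80   ' carrier detect

    Public Function IsCtsOn(ByVal status As Integer) As Boolean
        Return (status And MS_CTS_ON) <> 0
    End Function

    Public Function IsDsrOn(ByVal status As Integer) As Boolean
        Return (status And MS_DSR_ON) <> 0
    End Function

    ' A one-line summary of the two lines the test program cares about
    Public Function Describe(ByVal status As Integer) As String
        Return "DSR " & If(IsDsrOn(status), "on", "off") & _
               ", CTS " & If(IsCtsOn(status), "on", "off")
    End Function
End Module
```

Describe(&H30) returns "DSR on, CTS on", since &H30 is MS_DSR_ON Or MS_CTS_ON.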


After some hard grind, the program now works correctly: the WaitCommEvent API no longer ‘blocks’ the other APIs.


GOING FURTHER...

A .NET bug?

How to believe six impossible things before breakfast...

While I was developing the code for this article, I came across a really peculiar bug. The symptoms were as follows. The ports were initialised as usual and then a test call was made to check on the state of the DSR or CTS line. Suddenly, an error occurred: ‘object reference not set to an instance of an object’. Not only was the error message incomprehensible, the call stack was reset to about two levels deep – the sort of thing that regularly happens in C++ programs.

I tracked the bug down (eventually) by displaying the Disassembly window and single stepping through the machine instructions displayed there. You don’t have to be an expert in assembly language to follow what’s going on, especially if there’s something dramatic about to happen. At the critical point just before the error occurred, everything looked normal. Just afterwards, the call stack indicated that the program was somewhere silly.


Even in .NET with all its type checking and validation, it’s possible to get errors due to data corruption.
They can be difficult to track down too.

That turned out to be the key. The error happened at the call to GetCommModemStatus, where the .NET Framework seemed to be doing the impossible. The final clue was the error message ‘Fatal execution engine error’, which turned up about half of the time. According to the .NET documentation, these ‘should never occur’!

So it looked as if the .NET Framework was corrupt. Maybe a Microsoft bug? Now the first thing to do if you suspect Microsoft of writing buggy code (it does occur) is to remember the Biblical proverb about beams, motes and eyes. It is much, much more likely that you’ve goofed than that Microsoft has, whatever you think of them.

And so it turned out in this case. I had entered the API definition for GetCommModemStatus incorrectly, inadvertently aliasing it to GetCommState. That is an entirely different beast, and it did indeed corrupt the .NET engine when given a pointer to an integer rather than the much larger Device Control Block (DCB) structure it expected.

The moral of the story: be very careful about how you define API calls.
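To see why the slip was so destructive, compare the two functions’ parameters. The sketch below declares both correctly (with IntPtr handles, an assumption for 64-bit safety), lays out the DCB structure as in winbase.h, and shows, commented out, the kind of mistaken alias described above; the exact original declaration isn’t given, so that line is illustrative only. GetCommState writes a 28-byte DCB, so pointing it at a 4-byte Integer lets it overwrite 24 bytes of whatever happens to follow.

```vb
Imports System
Imports System.Runtime.InteropServices

' Sketch: correct declarations for the two APIs, and the size mismatch
' that a wrong alias would cause.
Public Module CommDeclares
    ' Correct: the VB name and the exported entry point agree
    Declare Function GetCommModemStatus Lib "kernel32" (ByVal hFile As IntPtr, _
        ByRef lpModemStat As Integer) As Integer
    Declare Function GetCommState Lib "kernel32" (ByVal hFile As IntPtr, _
        ByRef lpDCB As DCB) As Integer

    ' Illustrative slip (hypothetical, left commented out):
    ' Declare Function GetCommModemStatus Lib "kernel32" Alias "GetCommState" _
    '     (ByVal hFile As IntPtr, ByRef lpModemStat As Integer) As Integer

    <StructLayout(LayoutKind.Sequential)> _
    Public Structure DCB
        Public DCBlength As Integer
        Public BaudRate As Integer
        Public Flags As Integer         ' fBinary, fParity, ... packed as bit fields
        Public wReserved As Short
        Public XonLim As Short
        Public XoffLim As Short
        Public ByteSize As Byte
        Public Parity As Byte
        Public StopBits As Byte
        Public XonChar As Byte
        Public XoffChar As Byte
        Public ErrorChar As Byte
        Public EofChar As Byte
        Public EvtChar As Byte
        Public wReserved1 As Short
    End Structure

    ' Bytes GetCommState would write beyond a 4-byte Integer
    Public Function Overrun() As Integer
        Return Marshal.SizeOf(GetType(DCB)) - Marshal.SizeOf(GetType(Integer))
    End Function
End Module
```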


Here’s a ‘fatal execution engine error’ – a sad end for a program.
According to Microsoft, this should never occur. Great. Unfortunately it just did.


Signals and Mutexes

Any operating system worth the name has to offer some sort of synchronisation between competing processes or threads. In Windows, two of the most basic types are the event and the mutex.

An event has just two states: it’s either ‘signalled’ or the opposite, ‘non-signalled’. Any thread can set an event to signalled, as can the operating system when, for example, an I/O completes. You can make a thread wait for an event to be signalled and, further, any number of threads can wait (or ‘block’) on a single event. With a manual-reset event, all the waiting threads are released (or unblocked) when the event is signalled; an auto-reset event releases just one.
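A short sketch of that broadcast behaviour using .NET’s ManualResetEvent: several threads block on one event, and a single Set releases them all. ReleaseAll is a hypothetical name, the thread count is an arbitrary illustrative parameter, and the multi-line lambda needs Visual Basic 2010 or later.

```vb
Imports System
Imports System.Threading

' Sketch: one manual-reset event releasing several waiting threads.
Public Module BroadcastDemo
    Public Function ReleaseAll(ByVal waiters As Integer) As Integer
        Dim startSignal As New ManualResetEvent(False)   ' initially non-signalled
        Dim released As Integer = 0
        Dim threads(waiters - 1) As Thread
        For i As Integer = 0 To waiters - 1
            threads(i) = New Thread(Sub()
                                        startSignal.WaitOne()           ' block until signalled
                                        Interlocked.Increment(released) ' count this thread
                                    End Sub)
            threads(i).Start()
        Next
        startSignal.Set()   ' stays signalled, so every waiter (early or late) gets through
        For Each t As Thread In threads
            t.Join()
        Next
        Return released
    End Function
End Module
```

ReleaseAll(5) returns 5. Threads that reach WaitOne after the Set also pass straight through, because a manual-reset event stays signalled until someone resets it.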

The other basic synchronization primitive, the ‘mutex’, works rather differently. ‘Mutex’ stands for ‘mutual exclusion’: one and only one thread can own a mutex at any given time. This is useful where access to a shared resource (say, memory) must be controlled. In fact, you can use an event to achieve much the same effect. But a mutex has a further property which differentiates it from an event: it has an owner, and only the owning thread can release it. Threads waiting for ownership queue up, and when the mutex is released one of them acquires it. Don’t count on a strict first-in-first-out order, though: Microsoft’s documentation warns that external events, such as kernel-mode APCs, can change the order in which waiting threads are served.
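As a sketch of mutual exclusion in VB .NET, System.Threading.Mutex wraps a Windows mutex. Here several threads increment a shared counter, and only the thread that currently owns the mutex may touch it. MutexDemo and RunThreads are hypothetical names, and the thread and iteration counts are illustrative assumptions.

```vb
Imports System
Imports System.Threading

' Sketch: a mutex guarding a shared counter.
Public Module MutexDemo
    Private ReadOnly gate As New Mutex()
    Private counter As Integer

    Private Sub AddMany(ByVal times As Integer)
        For i As Integer = 1 To times
            gate.WaitOne()              ' block until this thread owns the mutex
            Try
                counter += 1            ' read-modify-write: unsafe without the mutex
            Finally
                gate.ReleaseMutex()     ' only the owning thread may release it
            End Try
        Next
    End Sub

    Public Function RunThreads(ByVal threadCount As Integer, ByVal times As Integer) As Integer
        counter = 0
        Dim workers(threadCount - 1) As Thread
        For t As Integer = 0 To threadCount - 1
            workers(t) = New Thread(Sub() AddMany(times))
            workers(t).Start()
        Next
        For Each w As Thread In workers
            w.Join()
        Next
        Return counter
    End Function
End Module
```

RunThreads(4, 1000) returns 4000. Without the mutex, the increments could interleave and lose updates. For a counter this simple, Interlocked.Increment would be cheaper; the mutex is used here only to show ownership.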


An Event



A Mutex
Events and mutexes are used to synchronize access to a given resource. However, mutexes form an orderly queue.


Pinned down

Not only does VB .NET let you handle asynchronous events in a sensible manner, it also lets you handle the memory buffers that hold the received data with reasonable efficiency. This was always a bit tricky in earlier versions of Visual Basic, requiring calls to undocumented functions like StrPtr and ObjPtr. VB .NET cleans all this up by clearly defining the interface between ‘managed’ memory (memory that’s controlled by the Common Language Runtime) and the ‘unmanaged’ memory required by the fundamental Windows API.

The key to handling memory that’s used by the API is to ‘pin’ it so that it can’t be moved by the .NET garbage collector. This is essential for API calls that return values to memory locations. Fortunately, .NET will automatically pin (or copy) most variables for you for the duration of an API call. So in the API call:

r = GetCommModemStatus(h, modem_status)

the variable h is passed by value and so doesn’t need pinning, while modem_status is passed by reference and so must be pinned. This is done automatically, so you don’t have to do anything.

But where you have an API that uses overlapped I/O, you may have to pin any OVERLAPPED structures (and any buffers or variables the I/O writes to) yourself. This is because the operating system may still be using them after the API has returned but before the actual I/O has completed, while the automatic pinning lasts only for the duration of the call. Allowing the garbage collector to move the object under these circumstances can be fatal, so it’s better to be safe than sorry.
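For buffers that an overlapped call will fill in later, the usual managed tool is GCHandle with GCHandleType.Pinned. The sketch below pins a byte array and writes through the pinned address with Marshal.WriteByte, standing in for the kernel. FillViaPointer is a hypothetical name; in real code the handle would be freed only after GetOverlappedResult reports completion.

```vb
Imports System
Imports System.Runtime.InteropServices

' Sketch: pinning a receive buffer so its address stays valid.
Public Module PinDemo
    Public Function FillViaPointer(ByVal size As Integer, ByVal value As Byte) As Byte()
        Dim buffer(size - 1) As Byte
        Dim handle As GCHandle = GCHandle.Alloc(buffer, GCHandleType.Pinned)
        Try
            ' While pinned, the collector cannot move buffer, so this address
            ' could be handed to an overlapped ReadFile call.
            Dim address As IntPtr = handle.AddrOfPinnedObject()
            For i As Integer = 0 To size - 1
                Marshal.WriteByte(address, i, value)   ' what the kernel would do
            Next
        Finally
            handle.Free()   ' unpin only once the I/O has completed
        End Try
        Return buffer
    End Function
End Module
```

One caveat: pinning a Structure such as OVERLAPPED with GCHandle.Alloc pins a boxed copy, so it is the address from AddrOfPinnedObject, not the original variable, that must be passed to the API. Allocating the structure in unmanaged memory with Marshal.AllocHGlobal is a common alternative.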


Before garbage collection (on the left) and after garbage collection (on the right).

Memory must not be moved by the garbage collector while an API is still using it.
Allowing garbage-collection can lead to disaster.

Next Month...

Now that was seriously hard work. And all because WaitCommEvent doesn’t do quite what you might expect, though it does work as documented. But next month, we should be able to move on to reading the odd character from the port (and some can be very odd, as we’ll see) and writing data as well.


September 2005




Copyright © 2006 Dark Neon Ltd. :: not to be reproduced without permission