A .NET Developers Guide to C++ (part III)

We’re near­ing the end!

Part I focused on the very fun­da­men­tals of C and C++, mak­ing sure that you under­stand the build sys­tem and the very basics of the syntax.

Part II expanded on this to teach you all the C++ you’ll need to do basic work in the lan­guage, includ­ing a few use­ful parts of the stan­dard library, such as vec­tors and strings.

You now know all the basics we need, and the actual Win32 API should now be very sim­ple to deal with. Not ele­gant or con­sis­tent, but com­pre­hen­si­ble as long as you keep a close eye on the doc­u­men­ta­tion and take noth­ing for granted.

First of all, the doc­u­men­ta­tion can be found here. As you prob­a­bly already know, Microsoft’s own search capa­bil­i­ties are nonex­is­tent, and to find the func­tion you need, you’ll typ­i­cally want to use Google. But some­times, the com­plete ref­er­ence is use­ful, so here it is.

To teach you how to use the Win32 API, I will run you through a pair of functions with some very basic functionality: retrieving the last error message.

Should be sim­ple, right? You’d think so, if you’re new to Win32.

The oper­a­tion con­sists of two steps. First we have to retrieve the last error code, and then we have to ask Win­dows for the asso­ci­ated mes­sage as a string.

And the first step is indeed easy. We just have to call the GetLastError func­tion. Let’s start with the com­plete code:

#include <windows.h>

int main() {
  DWORD error = GetLastError();
}

Feel free to run this in the debugger and see which code the function returns. (Most likely you’re going to get a 0, since no error has actually occurred at this point.)

Now let’s look at the actual doc­u­men­ta­tion. They describe the func­tion as hav­ing this signature:

DWORD WINAPI GetLastError(void);

which looks like nothing we’ve seen so far. Let’s take the easy parts first. The function’s name is GetLastError. It takes a parameter of type void, or so it seems. This is actually a throwback to C, and means that the function takes no parameters. Both GetLastError() and GetLastError(void) are legal in C++. In C, the two used to have subtly different meanings. (void) properly declared a function which took no parameters, while () declared a function whose parameters were left unspecified, so the compiler would accept a call with any number of arguments. But that was C. In C++, the two are identical, and we usually use () to indicate a function that takes no parameters.

Next, we have the return type at the far left. DWORD is short for Double Word. (A word is the “natural” data size on the CPU, which, back in the old days, was a 16-bit integer. Hence, a double word is 32 bits wide.) Microsoft has defined DWORD as a type alias (a typedef) for portability. Under the hood it is simply an unsigned long, which is 32 bits wide on Windows, but that may change some day. If it does, they will redefine DWORD to stand for some other type. So if you use DWORD when they tell you to, your code will still compile when it happens. It is easiest to just nod and accept this. It doesn’t make a big difference for us, but if and when we need to, we know that we can cast a DWORD to an unsigned int.

That leaves the last name, WINAPI, which exists for pretty much the same purpose. It is a macro that stands for the calling convention (currently __stdcall). The calling convention for a function specifies how parameters should be passed to it, and where it should place its return value. If we don’t know the calling convention of a function, we cannot call it. Normally, we’re happy to use the default calling convention, but the Windows API has to be specific, so they add the WINAPI macro. And again, they use a macro so that if they one day decide to change the underlying calling convention, they can simply redefine this macro, and everyone’s code should still compile with no problems.

Following this, they describe the parameters and return value. This is always worth reading in detail, because often, some parameters may or must be NULL. Likewise, the return value may have several meanings, and there is no single consistent convention. Some functions return zero on success, others return non-zero, or a positive value on success. Some don’t return a success code at all. And some functions return NULL on error, and actual data on success. Always, always read this section carefully.

In this case, we’re lucky. It sim­ply returns the cur­rently active error code, although it does ram­ble on about all the incon­sis­ten­cies caused by other functions.

The remarks sec­tion tells us other infor­ma­tion that doesn’t fit under one spe­cific para­me­ter or under the return value. Again, this should never be skipped. This is where all the incon­sis­ten­cies and spe­cial cases are often listed.

Some func­tions then have a link to an exam­ple usage.

Finally, the doc­u­men­ta­tion shows us where and when the func­tion is defined. In this case, we need at least Win­dows 2000, and the func­tion is defined in WinBase.h (but we should just include windows.h).

And it is defined in the Kernel32.Lib library. This library is included by default, so we don’t have to worry about this.

So far, it hasn’t been too bad, has it? It should be clear already that it’s not a pretty API, but as long as we stick to the doc­u­men­ta­tion it’s pretty straightforward.

So let’s move on to the FormatMessage function. Follow that link, and take a look… I’ll be here waiting…

Done? Good. Now this looks scary. And no, this time I can’t give you a sim­ple expla­na­tion. This func­tion truly is scary. Of course, this is one of the rea­sons why I picked it for this exam­ple. This is about as bad as the Win32 API gets.

The page lists the fol­low­ing func­tion prototype:

DWORD WINAPI FormatMessage(
  __in      DWORD dwFlags,
  __in_opt  LPCVOID lpSource,
  __in      DWORD dwMessageId,
  __in      DWORD dwLanguageId,
  __out     LPTSTR lpBuffer,
  __in      DWORD nSize,
  __in_opt  va_list *Arguments
);

__in, __in_opt and __out are Microsoft-specific extensions, and are mainly used for documentation and for static code verification. They tell us which parameters are used for input, and which ones are for output, as well as which ones are optional.

LPCVOID is another Microsoft-defined type name. Microsoft spent a decade or two promoting Hungarian Notation before they had to admit what an astonishingly bad idea it actually was. But of course Win32 is stuck with it. The LP prefix stands for “Long Pointer”, and you can pretty much ignore the “Long” part. That dates back to 16-bit computers, where you actually had different types of pointers (far and near pointers). All we need to know is that it is a pointer. The C is for constant. In other words, this is a pointer to constant void, or const void*. It is the thing pointed to that is constant, not the pointer itself. (Of course, void isn’t a very meaningful thing to point to. A void pointer is essentially used as a pointer to an unknown type.)

LPTSTR is another adventure in Hungarian Notation. You already know LP. STR is probably obvious too. It’s a string. (Of course, since this is a C API, we’re talking about a C string, or a char pointer, which also explains the presence of the LP part.) That leaves the T. What can that mean? I’m not sure; it most likely stands for “Text”, as in the TEXT() macro. It was introduced when Microsoft realized that they’d have to support Unicode. As I mentioned previously, Windows uses wchar_t for Unicode text, and so their API had to accept wchar_t pointers when working with Unicode strings. But they still had to be backwards compatible, and be able to handle plain char pointers as well.

So they invented a new set of macros. The T essentially stands for “whichever character type is currently active”. If you enter your project’s properties, you’ll see the option to enable or disable Unicode on the General tab. It should be enabled by default.

As long as Uni­code is enabled, any macro includ­ing this T will be mapped to the equiv­a­lent macro using a W (for Wide). If Uni­code is dis­abled, the macro will instead point to a sim­i­larly named macro with­out this character.

In other words:

  • LPTSTR -> LPWSTR or LPSTR
  • LPCTSTR -> LPCWSTR or LPCSTR
  • TCHAR -> WCHAR or CHAR

And all of these again point to the types you would probably now expect. LPWSTR is a pointer to a wide string (wchar_t*). And LPCSTR is a pointer to a constant string, or const char*. And WCHAR is a wchar_t.

As if this wasn’t com­pli­cated enough, the func­tion itself is also a macro. Two ver­sions of the func­tion actu­ally exist:

  • FormatMessageA is the old ANSI version, using plain char strings.
  • FormatMessageW is the “new” Uni­code ver­sion, using wchar_t strings.

FormatMessage is not itself a function, but simply a macro, which is resolved by the preprocessor to one of these two. (C doesn’t allow overloaded functions, so they had to settle for this ugly hack to allow multiple definitions of the same function.)

This also means that we can actu­ally call these two names directly. If we call FormatMessageW, we’ll get the Uni­code ver­sion regard­less of whether Uni­code is enabled in project set­tings. This makes it safe for us to use wchar_t strings directly, rather than mess around with TCHAR strings which might be one or the other.

Going back to the function declaration, the last parameter, va_list, looks a bit out of place. It’s not capitalized, and it doesn’t have these ugly prefixes. It is the type C uses to pass along a variable number of arguments, commonly known as varargs. As I mentioned in part I, printf uses varargs as well, and this throws away all hope of type safety, or even knowing how many parameters are passed to the function. Hopefully we won’t need to mess with this. (It’s marked as __in_opt, so it should be optional. Let’s hope we won’t have to use it then.)

Any­way, there’s noth­ing for it. Let’s dive in. First parameter:

Ok, so this is a DWORD flag, and seems to store a combination of two values. The second table is shorter, so let’s take that first. There are three options here. A zero just means to preserve whatever line breaks exist in the message by default. The constant FORMAT_MESSAGE_MAX_WIDTH_MASK seems to preserve hardcoded line breaks, but removes “regular” ones. I have no clue what the difference is.

The last option (men­tioned just under the table) is to store any other num­ber into the value. This then spec­i­fies the max­i­mum line width. We’re happy to just use the default line breaks though, so we’ll set­tle for the zero value. That leaves the first table.

Look­ing through the options there, it seems that we need FORMAT_MESSAGE_FROM_SYSTEM. FORMAT_MESSAGE_ALLOCATE_BUFFER seems poten­tially inter­est­ing as well, but this table doesn’t really explain what hap­pens if this flag is not enabled. If the sys­tem doesn’t allo­cate a buffer for us, who does? Look­ing down fur­ther, at the input para­me­ter nSize we see that:

If the FORMAT_MESSAGE_ALLOCATE_BUFFER flag is not set, this para­me­ter spec­i­fies the size of the out­put buffer, in TCHARs.

In other words, if we don’t use this flag, we have to pro­vide a buffer. But we don’t know the length of the mes­sage we’re try­ing to retrieve, so this seems a bad idea. (Of course we could just pro­vide a buffer of 64KB, which the doc­u­men­ta­tion men­tions is the max­i­mum size, but this seems silly).

Finally, if we skip down to the “Security Remarks”, it says to add FORMAT_MESSAGE_IGNORE_INSERTS if we’re going to pass “arbitrary system error codes”, which we are. Most APIs try to ensure that the simplest action is the correct one. Win32 seems to be designed for the opposite case, ensuring that correct usage is only possible if you have already read the entire documentation page, very carefully, and at least three times. But that won’t stop us.

So the dwFlags parameter should then be the combination of these flags: FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_IGNORE_INSERTS | 0, although of course the 0 can be omitted.

Next, we have lpSource. Luckily, this is marked optional, and it is stated that this is ignored unless one of the two listed dwFlags values is set, which they’re not in our case. So we ignore it and simply pass NULL.

Then we have the mes­sage ID. This must be the value we got from GetLastError. Then we have the lan­guage ID. Rather than going search­ing for pos­si­ble val­ues to pass here, we can see that if we just pass a zero, it’ll try to pick a sen­si­ble default. So let’s do that.

Now comes the pointer to the out­put buffer. Read what it says here carefully:

If dwFlags includes FORMAT_MESSAGE_ALLOCATE_BUFFER, the func­tion allo­cates a buffer using the LocalAl­loc func­tion, and places the pointer to the buffer at the address spec­i­fied in lpBuffer.

So the para­me­ter lpBuffer is a pointer to the pointer to the buffer. That is, we must pass it a pointer to the pointer it should set to point to the allo­cated buffer.

It also mentions that the buffer is allocated with LocalAlloc, and must be freed by us with LocalFree. Better remember this, or we’ll leak memory. Note that Windows defines several different memory allocation functions. This time they chose to use LocalAlloc. C++’s new and delete are implemented in terms of some of these, but who knows which? So never hand this buffer to delete or free; it has to go back through LocalFree.

Now comes nSize. With our flags, it allows us to specify the minimum number of characters to allocate. Why would we care about that? Let’s just pass zero and hope for the best. It’s just a minimum after all.

Finally, we have Arguments. We already spec­i­fied that the sys­tem should ignore inserts, so it seems like it shouldn’t actu­ally care about these argu­ments. They’re also spec­i­fied as optional, so let’s pass a big fat NULL here.

And that should be it! Now we just have to han­dle the return value:

  • zero on fail­ure, or
  • the number of TCHARs stored in the output buffer, not counting the terminating null character

And… we’re through. Now let’s try putting the pieces together and see what happens:

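#include <windows.h>
#include <iostream>

int main() {
  DWORD error = GetLastError();

  // Receives the address of the buffer that FormatMessageW allocates.
  wchar_t* buffer = NULL;

  DWORD length = FormatMessageW(
    FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_IGNORE_INSERTS,
    NULL,               // lpSource: ignored with these flags
    error,              // dwMessageId
    0,                  // dwLanguageId: let Windows pick a default
    (wchar_t*)&buffer,  // really a wchar_t**, see below
    0,                  // nSize: minimum number of characters to allocate
    NULL);              // Arguments: unused because inserts are ignored

  // Zero means failure, and then buffer was never set.
  if (length == 0) {
    std::wcerr << L"FormatMessageW failed: " << GetLastError() << std::endl;
    return 1;
  }

  std::wcout << buffer << std::endl;

  // The buffer came from LocalAlloc, so it must go back through LocalFree.
  LocalFree(buffer);
}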

Note the ugly cast we need to do on the buffer. This is necessary because the argument may be either a pointer to a pre-allocated buffer, or (as in our case), a pointer to a pointer we’d like to be set to point to the system-allocated buffer. But the function expects a pointer to a string buffer, not a pointer to a pointer to a string buffer, so if we want to pass it the latter, we have to cast it to the former type.

Note that I’m calling the W version of the function specifically, and using wchar_t instead of TCHAR. The reason is simple. I want the Unicode version, regardless of the Unicode setting in the project. Part of the reason is that it’s a lot easier to print out the string when we know what type it is. In particular, the standard library requires us to use cout for regular character strings, and wcout for wide strings. If we’re given a string of TCHARs, do we call cout or wcout to print it? Easier to just be specific and make sure we have wide characters.

Well, that’s it. Try running it. It should print out that “the operation completed successfully”. Gee, thanks. That really makes it all feel worthwhile, doesn’t it? Make sure you understand what our code means (in particular, why the cast is necessary, and how wcout is able to print out the string and know where it ends, when all it has is a pointer to a character).

Any­way, you’ve now seen some of the worst the Win32 API has to offer. And you’re still alive. Many of the func­tions you might want to call are far sim­pler than this.
