A .NET Developer’s Guide to C++ (part II)

Welcome to the second installment of my guide to “what you need to know if you’re a .NET programmer who wants to be able to write C++ code and call native APIs”. It took me much longer to get this posted than I’d hoped: work on my thesis has kept me busier than I’d originally expected. Sorry for the delay!

In part I, I went through a minimal “Hello World” program in some detail, and attempted to explain the arcane workings of the C/C++ compilation model. Some may argue that this had no relevance to my target audience, but I think it is a necessary evil. Almost all C++ programmers get tripped up at some point by the difference between compiler and linker errors, and by what exactly the #include directive does. Hopefully, having read part I, you’ll be able to avoid this.

With that out of the way, we can get started on the interesting part. Part II will focus on actual C++ code. We won’t consider managed interop or even the Win32 API yet; this part still takes place in native C++-land only. In short, the purpose of this part is to enable you to write simple C++ programs and, more importantly, to understand the C++ sample code you probably run into from time to time.

I will not cover all the idioms and tech­niques that “real” C++ pro­gram­mers use. We’ll set­tle for the bare min­i­mum required to get by in a .NET-to-Win32 interop sce­nario where you really just want to write enough C++ code to call some native API func­tion. This means that we won’t get the most robust, reusable, ele­gant or con­cise C++ code. But we will be able to get the job done.

I’d love to write a more detailed series of posts about “modern C++”[1] some other time, but it is beyond the scope of this series of posts.

Using C++

Before we get into the Win32 API, let’s run through some slightly big­ger C++ exam­ples than the Hello World from part I. At the very least you’re going to need to know how to define and use classes, and a few use­ful com­po­nents in the stan­dard library.

You already know that it is possible to define class member functions outside classes, but you haven’t yet seen a nontrivial class definition. Let us try creating one. For the purposes of demonstration, I’ll implement the simplest class I can think of: a counter. It’ll simply contain an integer, and callers will be able to increment the value and get the current value.

class counter {
public:
  counter() : i(0) {}
  int current() {return i; }
  void update() { ++i; }
private:
  int i;
};

There, we now have a basic class. We can call it from the function below. (I use assert to indicate expected values of variables, much like you would in a unit test. Note that the asserts are pseudocode: among other things, I will access private class members with them, which obviously won’t work in reality.)

// assume that we either placed the class definition here, or have a #include for the header in which the class is defined.
int main(){
  counter c;
  int i = c.current();
  assert(i == 0);
  c.update();
  assert(c.i == 1);
  assert(c.current() == 1);  
}

A ridiculously simple program, of course. But there are several things worth noting. In no particular order:

  • Our counter object c is created without using new, and without explicitly calling a constructor. All C++ types are fundamentally similar to .NET’s value types, so c is not a reference to a counter, but instead a default-constructed instance of one, placed on the stack. If nothing else is specified, the default constructor is called when the variable is declared. (To call another constructor, had counter declared a suitable one, we could have written something like counter c(1, "hello", 2.0f);.)
  • the class def­i­n­i­tion is ter­mi­nated by a semi­colon. This is impor­tant to remem­ber, as for­get­ting it can lead to very mis­lead­ing com­piler errors. I won’t get into why this semi­colon is nec­es­sary here though. It is a long story, and it is caused by the need for C compatibility.
  • access spec­i­fiers are not applied per-member, but rather used to divide the class into sec­tions. In a class, the default spec­i­fier is private. I tend to put my pub­lic mem­bers at the top of the class, to make it eas­ier for read­ers to find the pub­lic inter­face. Fur­ther down, we have a private spec­i­fier, for hid­ing our int mem­ber. The valid access spec­i­fiers are public, private and protected, which each behave just like in C#. internal does not exist how­ever, since there is no notion of assem­blies, and the only way to share types between files is with #include’s as men­tioned in part I.
  • there is no clear, common naming convention in C++. The standard library uses lower-case names with words separated by underscores, as in class_name. Many programmers, however, use a convention similar to .NET’s, naming types ClassName and variables className. There are no fixed rules, so as long as you are consistent it’s fine with me.
  • .NET has both classes and structs, and the two have very dif­fer­ent mean­ings. In C++, both classes and structs exist as well, but their mean­ing is almost the same. The only dif­fer­ence between a class and a struct in C++ is that a struct defaults to pub­lic acces­si­bil­ity for mem­bers, where a class defaults to pri­vate. In other words, if I had defined the above as a struct, I could have omit­ted the public: line. (For this rea­son, I often find myself using structs. As I said, I tend to put the pub­lic inter­face at the top of the class, and add a private: sec­tion fur­ther down. How­ever, it is not a big deal. A com­mon rule of thumb is much the same as is used in C#: Structs are sim­ple con­tain­ers of data, where classes have behav­ior. My own style tends to be a com­pro­mise between the two. Classes with com­plex behav­ior are made classes for this rea­son, but in sim­pler bor­der­line cases, I tend to pre­fer struct, even if it has some behav­ior. It makes no dif­fer­ence to the com­piler, and it saves me a line of code, because it defaults to public, which is what I want at the top of my class anyway.)
  • the observant reader will probably have noticed a difference here compared to the example shown in part I. Back then we declared the member method without defining its body inside the class. Now we define the body inside the class. Both approaches are legal, and each has its pros and cons. In particular, defining the body outside the class leads to shorter class definitions, which may aid readability. On the other hand, defining functions inside the class leads to better locality: you only have to look in one place to learn all about the class. Further, the compiler is generally better able to optimize code if member methods are defined “inline”, because the body is visible wherever the function is called. For these reasons, people often put short functions of 2 to 3 lines or so inside the class, and define larger ones separately. There is one caveat, however. Functions defined inline (either by placing the full definition inside the class, or by marking the definition with the inline keyword) may have a definition in each translation unit. In other words, they may be placed in headers (where they’re seen by multiple translation units). Non-inline functions must only be defined once, and so generally have to be defined in a .cpp file, similar to what we did in part I.
  • the con­struc­tor looks a bit dif­fer­ent than you may be used to. The i mem­ber is ini­tial­ized via the ini­tial­izer list, spec­i­fied after the colon. This is sim­i­lar to how you would call a base class con­struc­tor in C#, although instead of base, the mem­ber name is used. Also note that we have to explic­itly ini­tial­ize i because as a prim­i­tive type, it would oth­er­wise not be ini­tial­ized at all. The ini­tial­izer list syn­tax is only legal in con­struc­tors, and should be used as much as pos­si­ble. I’ll explain why in a moment.
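To make a few of these points concrete, here is a small sketch of the same counter written as a struct, with one member function defined outside the class body. The function name count_twice is mine and purely illustrative; nothing here is meant as the one right way to lay out a class.

```cpp
#include <cassert>

// The counter from above, written as a struct: members are public until
// an access specifier says otherwise.
struct counter {
  counter() : i(0) {}
  int current() const { return i; }
  void update();               // declared here, defined below
private:
  int i;
};

// 'inline' allows this definition to live in a header that several
// translation units include.
inline void counter::update() { ++i; }

// Creates a counter on the stack, increments it twice, returns its value.
int count_twice() {
  counter c;                   // default constructor runs here; no 'new'
  c.update();
  c.update();
  assert(c.current() == 2);
  return c.current();
}
```

Note that c is never freed explicitly: it simply ceases to exist when count_twice returns.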

“Spe­cial” mem­ber functions

We could have defined the con­struc­tor in a more famil­iar way:

counter() {
  i = 0;
}

and in this simple case, it would have made no difference. In more complex classes, however, there is an important distinction: anything that happens in the constructor’s body happens after the members have been initialized. If a member is not mentioned in the initializer list, it is default-initialized. For primitive types that means nothing happens, and they simply contain indeterminate garbage values. For classes that define a default constructor, that constructor is called before the constructor’s body runs, and only then does the body assign the value we actually want the member to contain.

So yes, for our sim­ple int case, we might as well have writ­ten the con­struc­tor with­out using the ini­tial­izer list. But con­sider what would have hap­pened if the mem­ber had been some com­plex user-defined class. Instead of sim­ply con­struct­ing the object with the right value to begin with, we would have default-constructed it, and then exe­cuted an assign­ment. This would obvi­ously have been less effi­cient than sim­ply con­struct­ing the object cor­rectly in the first place.

But just as importantly, some types cannot be assigned to once they are initialized. Likewise, some types may not have a default constructor, in which case failure to use the initializer list to explicitly call another constructor will result in a compiler error! So in general, the initializer list should be preferred for reasons of both performance and correctness. A side effect of the initializer list is that the actual body of constructors can often be left empty.
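As a rough illustration of the two “must use the initializer list” cases, consider the sketch below. The types label and named_counter are hypothetical names I made up for this example: label has no default constructor, and named_counter has a const member.

```cpp
#include <cassert>
#include <string>

// Hypothetical type with no default constructor.
class label {
public:
  explicit label(const std::string& text) : text_(text) {}
  const std::string& text() const { return text_; }
private:
  std::string text_;
};

class named_counter {
public:
  // 'name_' has no default constructor and 'start_' is const, so neither
  // could be given its value by assignment in the constructor body.
  named_counter(const std::string& name, int start)
    : name_(name), start_(start), i_(start) {}
  int current() const { return i_; }
  int start() const { return start_; }
  const std::string& name() const { return name_.text(); }
  void update() { ++i_; }
private:
  label name_;
  const int start_;
  int i_;
};

// Builds a counter starting at 5 and increments it once.
int named_counter_demo() {
  named_counter c("clicks", 5);
  c.update();
  assert(c.start() == 5);
  return c.current();
}
```

If you move either initialization into the constructor body instead, the class should no longer compile.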

In .NET, there is a distinction between value and reference types, and the behavior of the assignment operator is completely different for each. For variables of a reference type, x = y simply makes x refer to the same object as y. For variables of a value type, a complete copy is created instead.

In C++, all vari­ables obey value seman­tics. x = y will always copy the value y into x. This is why I said when dis­cussing the constructor’s ini­tial­izer list that an extra assign­ment may be expensive.

Since the plain value seman­tics as used by C# would be both inflex­i­ble and inef­fi­cient, C++ pro­vides a num­ber of tools for con­trol­ling the behav­ior of your class. In par­tic­u­lar, you can define a copy con­struc­tor and an assign­ment oper­a­tor to over­ride exactly how assign­ment should be per­formed. The fol­low­ing demon­strates what they may look like.

class counter {
public:
  counter(const counter& other) : i(other.i) {} // copy constructor
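  counter& operator= (const counter& other) { // assignment operator
    if (this == &other) {return *this; } // self-assignment: nothing to do
    i = other.i;
    return *this; // every path must return a reference to the assigned-to object
  }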
  ....
};

Per­haps the first thing we should men­tion is the mean­ing of the & char­ac­ter. It is used to denote a ref­er­ence, essen­tially an alias for a vari­able. It is related to point­ers (see this post for a more detailed expla­na­tion of point­ers), but is sim­pler and more lim­ited. In par­tic­u­lar, it can not be reseated. Once it is ini­tial­ized, it is an alias for the vari­able it points to for­ever. Also, unlike point­ers, there is no spe­cial syn­tax for using a reference:

int i; // create an integer.
int& r = i; // create a reference as an alias of i. Note that we simply assign i, unlike with pointers where we would have had to take the address of i first with the `&` operator.
r = 42; // assign 42 to whatever the reference points to. Again, no special syntax. There is nothing here to tell us that r is a reference.
int j = 13; // create another integer
r = j; // assign it to our reference. The effect of this is *not* to make r point to j (as would have happened had it been a pointer), but simply to assign the value of j to i. In other words, i will now equal 13, and r will still point to i.

Because a ref­er­ence can not be reseated, it is also a nice exam­ple of a case where the constructor’s ini­tial­izer list must be used. Imag­ine a class which has a ref­er­ence mem­ber. Ref­er­ences must point to some­thing, so they have no default con­struc­tor. And once they are ini­tial­ized to point to an object, they always point to that object. In other words, it must be con­structed before the constructor’s body is exe­cuted, which means in the ini­tial­izer list. Fail­ure to do so sim­ply won’t compile.
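A minimal sketch of such a class might look like this. The incrementer class is a hypothetical helper of my own; it holds a reference to a counter it does not own.

```cpp
#include <cassert>

class counter {
public:
  counter() : i(0) {}
  int current() const { return i; }
  void update() { ++i; }
private:
  int i;
};

// Hypothetical helper that increments a counter owned by someone else.
class incrementer {
public:
  // 'target' is a reference, so it has to be bound in the initializer list.
  explicit incrementer(counter& c) : target(c) {}
  void bump() { target.update(); }   // modifies the original counter
private:
  counter& target;
};

// Increments a local counter twice through an incrementer.
int bump_twice() {
  counter c;
  incrementer inc(c);
  inc.bump();
  inc.bump();
  assert(c.current() == 2);          // c itself changed; no copy was made
  return c.current();
}
```

Because of the reference member, the compiler cannot generate an assignment operator for incrementer, so one incrementer cannot be assigned to another.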

Now to explain the copy constructor, which is fairly simple: it is a constructor which takes one argument, a const reference to the type itself. Copy constructors are commonly used to initialize class members with copies of the arguments passed to the “outer” class’ constructor. In the copy constructor above, we also copy-construct i, for example. (The value of other.i is copied into our own i.)

The assignment operator is a bit trickier. The first line inside it tests for assignment to itself (as would happen in x = x). This may not have been a problem in this simple class, but in more complicated ones, self-assignment can cause problems, as you will be reading data from the same object you’re writing to. Note also that instead of simply comparing this to other in the test, we use &other. We wish to check that this and other refer to the same object instance, not just that they contain the same value. To achieve this, we need to compare pointers. this is already a pointer[2], but other is a reference, so we have to take the address of it first. Because a reference is essentially an alias for the referenced value, the address-of operator returns the address of the referenced value, not of the reference itself.

Next, note that the assignment operator does not have an initializer list, but instead performs the copying in the function body. The reason for this is obvious: it is not a constructor, so all its members are already initialized. An initializer list would not make sense, and is not allowed by the language. This also means that here, i’s assignment operator is invoked, rather than its copy constructor, as was used in the previous example. (Technically, built-in types have neither an assignment operator nor a copy constructor. However, the same syntax is allowed; it simply uses the obvious built-in operations.)

A final note about assign­ment oper­a­tors and copy con­struc­tors is that if = is used to declare a vari­able, the copy con­struc­tor, and not the assign­ment oper­a­tor, is called. As I said before, these func­tions are spe­cial and known to the com­piler, and so, can be invoked in spe­cial cases. That is, if you are given vari­ables c and d of type counter, then c = d calls the assign­ment oper­a­tor on c, because c is already ini­tial­ized. But if it had instead been counter c = d, then c would have been ini­tial­ized as a copy of d, and so its copy con­struc­tor would have been used. The com­piler ensures this, even if you use assign­ment syn­tax in the ini­tial­iza­tion of a variable.
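One way to convince yourself of this is to instrument a class so that it counts which function was called. The tracer type below is a hypothetical example written just for this purpose.

```cpp
#include <cassert>

// Hypothetical instrumented type that records which special member ran.
struct tracer {
  static int copies;       // number of copy constructions so far
  static int assignments;  // number of copy assignments so far
  tracer() {}
  tracer(const tracer&) { ++copies; }
  tracer& operator=(const tracer&) { ++assignments; return *this; }
};
int tracer::copies = 0;
int tracer::assignments = 0;

// Performs two initializations and one assignment.
void exercise() {
  tracer a;
  tracer b = a;   // initialization: copy constructor, despite the '='
  tracer c(a);    // initialization: copy constructor
  b = c;          // b already exists: assignment operator
  assert(tracer::copies == 2);
  assert(tracer::assignments == 1);
}
```

Calling exercise() once leaves the counters at two copy constructions and one assignment.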

Finally we get to another dreaded C++ con­struct: the destruc­tor. This is auto­mat­i­cally called when the object is destroyed, and can be defined thusly:

class counter {
public:
  ~counter(){
    std::cout << i << std::endl;
  }
  ....
};

The syntax is similar to finalizers in C#, but the effect is somewhat different. The destructor is invoked deterministically, at the moment the object is destroyed (when a local variable goes out of scope, or when delete is called on a heap-allocated object), and it is guaranteed to be called. In our case, we simply use it to print out the counter value.

Let’s try using these new func­tions and operators:

int main(){
  counter c; // use the default constructor to create a counter
  c.update(); // increment its value
  assert(c.i == 1);
  counter d(c); // use the copy constructor to create a new copy of our existing counter.
  assert(d.i == 1);
  d.update();
  assert(c.i == 1); // our copy constructor made sure to create a *new* counter variable, so c is not affected by changes to d, and vice versa
  assert(d.i == 2); 
  c = d; // since c has already been initialized, the assignment operator is used to copy d into c.
  assert(c.i == 2);
} // at this point, both c and d go out of scope, and so their destructors are called. Destructors are always called in the opposite order of construction, so d's destructor will be invoked first.
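The destruction order can be observed directly. In this sketch, noisy is a hypothetical class that appends its name to a shared log string from its destructor.

```cpp
#include <cassert>
#include <string>

// Hypothetical type that appends its name to a log when destroyed.
class noisy {
public:
  noisy(const std::string& name, std::string& log) : name_(name), log_(log) {}
  ~noisy() { log_ += name_; }
private:
  std::string name_;
  std::string& log_;
};

// Returns the order in which two local objects were destroyed.
std::string destruction_order() {
  std::string log;
  {
    noisy first("a", log);
    noisy second("b", log);
  } // 'second' was constructed last, so it is destroyed first
  assert(log == "ba");
  return log;
}
```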

All three functions are auto-generated by the compiler if not declared explicitly. (The one exception is the assignment operator, which it may not be possible to auto-generate. If a class contains a member with no assignment operator, or a reference (which cannot be reseated), the compiler will fail to generate an assignment operator, and all attempts to perform assignment will fail unless one is explicitly defined by the user.)

The trio of copy con­struc­tor, assign­ment oper­a­tor and destruc­tor are some­times called “the big three”, or we may speak of “the rule of three”. This is a rule of thumb that if you find your­self imple­ment­ing one of these three spe­cial func­tions, you almost cer­tainly should also imple­ment the other two. The rea­son­ing is pretty sim­ple: The assign­ment oper­a­tor and copy con­struc­tor are related — both are used to copy an object. If spe­cial care has to be taken when copy­ing, then it should prob­a­bly be defined for both these functions.

Fur­ther, if copy­ing requires non­triv­ial han­dling, then it is a good bet that the class man­ages some kind of resource or con­tains data which requires spe­cial care in the destruc­tor as well. Per­haps a pointer point­ing to dynam­i­cally allo­cated mem­ory, which must be deleted, or per­haps it should decre­ment a global counter used to count the num­ber of live instances of the class. Or per­haps it is a file han­dle which must be closed. The fact that we had to imple­ment spe­cial han­dling when copy­ing is a strong hint that there will prob­a­bly also be spe­cial han­dling required when clean­ing up in the destructor.

And the con­verse is also true. If the destruc­tor has to do some­thing spe­cial, it must be because the class owns some kind of resource that must be released. And if it owns a resource, then we should ensure that the resource gets copied when the class itself does. So we should prob­a­bly define copy con­struc­tor and assign­ment oper­a­tor as well.
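Here is a sketch of what the rule of three looks like for a class that owns dynamically allocated memory. The int_buffer class is a made-up example, not something from a library, and in real code a std::vector would usually do this job for you.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical class owning a heap-allocated array of ints.
class int_buffer {
public:
  explicit int_buffer(std::size_t n) : size_(n), data_(new int[n]()) {}
  // Copy constructor: allocate our own array and copy the elements.
  int_buffer(const int_buffer& other)
    : size_(other.size_), data_(new int[other.size_]) {
    std::copy(other.data_, other.data_ + other.size_, data_);
  }
  // Assignment operator: build the copy first, then release the old array.
  int_buffer& operator=(const int_buffer& other) {
    if (this != &other) {
      int* fresh = new int[other.size_];
      std::copy(other.data_, other.data_ + other.size_, fresh);
      delete[] data_;
      data_ = fresh;
      size_ = other.size_;
    }
    return *this;
  }
  // Destructor: release the array we own.
  ~int_buffer() { delete[] data_; }
  int& operator[](std::size_t i) { return data_[i]; }
  std::size_t size() const { return size_; }
private:
  std::size_t size_;
  int* data_;
};

// Copies a buffer and checks that the copy is independent of the original.
int copy_demo() {
  int_buffer a(3);
  a[0] = 7;
  int_buffer b(a);   // copy constructor
  b[0] = 9;
  assert(a[0] == 7); // a is unaffected by changes to b
  return b[0];
}
```

Had we left out the copy constructor, the compiler-generated one would have copied only the pointer, and both objects would have tried to delete the same array.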

POD types

A final note about classes may be worth men­tion­ing. C had no classes, only sim­ple structs con­tain­ing val­ues, but no mem­ber func­tions, and with­out allow­ing inher­i­tance or access spec­i­fiers. Since C++ was designed to be (mostly) backwards-compatible, such types have a spe­cial sta­tus in C++. In the above, I men­tioned “prim­i­tive types” a few times. While an int is tech­ni­cally a prim­i­tive type (all built-in types are con­sid­ered prim­i­tive types), the behav­ior I described is actu­ally com­mon to all POD (Plain Old Data) types. A POD type is essen­tially a type that would have been legal in C — in other words, it is either a built-in (prim­i­tive) type, or a class or struct where

  • all mem­bers are public
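  • no virtual member functions exist (strictly speaking, ordinary non-virtual member functions do not disqualify a type, even though C has none)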
  • no con­struc­tor, copy con­struc­tor, assign­ment oper­a­tor or destruc­tor is defined
  • no base classes exist
  • All mem­bers are POD types as well

Such POD types are given spe­cial treat­ment in many ways. For exam­ple, they may be treated as “raw mem­ory”. The standard-library C func­tion memcpy, which sim­ply copies a num­ber of bytes from one loca­tion to another, may be used to copy POD types, but not non-POD classes. The rea­son for this is that non-POD types may have extra behav­ior that would break if this was done. As an obvi­ous exam­ple, if we cre­ated a copy in this way, we would bypass the assign­ment operator/copy con­struc­tor, but we would end up with two objects, both of which would have their destruc­tors called when deleted — so we would end up with a mis­match where the destruc­tor is called more often than the con­struc­tors, a clear error if the class imple­ments reference-counting, for example.
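As a small sketch of the “raw memory” treatment, the following copies a POD struct with memcpy. The point struct and the copy_raw function are names I invented for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical POD struct: public data members and nothing else.
struct point {
  int x;
  int y;
};

// Copies 'src' byte by byte; acceptable only because point is a POD type.
point copy_raw(const point& src) {
  point dst;
  std::memcpy(&dst, &src, sizeof(point));
  assert(dst.x == src.x && dst.y == src.y);
  return dst;
}
```

For a class with a user-defined copy constructor or destructor, use the copy constructor or assignment operator instead.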

Another peculiarity of POD types is that they are not initialized unless initialization is explicitly requested. This is why we had to initialize i in our constructor above. As a POD type, i would otherwise contain whatever garbage value was found in memory. The same is true for POD structs. They too contain garbage if not explicitly initialized:

int i; // no initialization occurs
int j = int(); // explicitly request initialization -- for POD types, this is done by setting all members to zero. (Note that "int j();" would not do this: it declares a function named j that returns an int.)

In other words, had our counter class stored a non-POD member, the initializer list would not have been necessary. Its member would automatically be default-constructed if nothing else was specified. But POD types do not have that extra behavior, so if nothing else is specified, they simply don’t get initialized.
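The explicit form of initialization can be sketched as follows, for both a built-in type and a POD struct. The names point, make_zeroed and make_zero are illustrative only. I deliberately do not read from an uninitialized variable, since doing so is undefined behavior.

```cpp
#include <cassert>

// Hypothetical POD struct.
struct point {
  int x;
  int y;
};

// Returns a point whose members were explicitly zeroed.
point make_zeroed() {
  point p = point();   // explicit initialization: x and y are both zero
  assert(p.x == 0 && p.y == 0);
  return p;
}

// The same idea for a built-in type.
int make_zero() {
  int n = int();       // zero, rather than an indeterminate value
  return n;
}
```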

Enough about classes

There are a few other nitty-gritty details about the lan­guage we should dis­cuss. You may have already won­dered about one or two of them. So with­out fur­ther ado,

  • variable declaration is usually done without using new. The new operator allocates memory on the heap, and returns a pointer to the newly created object. Since there is no garbage collector, we have to manually call delete on this pointer to free the memory. This is the source of C++’s reputation as a playground for memory leaks. Of course the astute reader will have noticed that so far, I haven’t used new and delete even once. The truth is that these can often be avoided or hidden, which removes most opportunities for memory leaks. Any variable declared without using new is declared “locally”: if it is declared in a function, it becomes a local variable, and is destroyed when we leave the scope in which it is declared. If it is declared inside a loop, it is destroyed when we leave the loop body. If it is a class member, it is destroyed when the owning object is destroyed. In other words, local variables declared without new have “automatic storage duration”, and in fact, in C++98 and C++03, int i = 42 could also be written as auto int i = 42. There, the auto keyword indicates exactly this, that the lifetime of the variable is automatic. Since this is the default, the keyword was practically never written out. (C++11 later gave auto a different meaning, type deduction, so auto int i = 42 no longer compiles under the newer standards.) And just to clear up any doubts, variables with automatic storage duration are destroyed when we leave the scope they were declared in, no matter how we leave it. It doesn’t matter if we return from the function, or if an exception is thrown. In both cases, the local variable’s destructor is called.
  • Just to avoid con­fu­sion, we’d bet­ter look at a quick exam­ple of using new: Con­sider this line of code: counter* p = new counter(). Here, we allo­cate an object of our counter class on the heap, with dynamic stor­age dura­tion, but we also declare a local vari­able — the pointer p. The pointer is a local vari­able with auto­matic stor­age dura­tion. In other words, the pointer itself will be freed just fine when we leave the func­tion — but the dynam­i­cally allo­cated counter to which it points will not. This is how mem­ory leaks occur. Once p gets destroyed, we no longer have a pointer to the dynam­i­cally allo­cated mem­ory, so we can never free it.
  • Avoiding cyclic dependencies can take a bit of work, since C++ code is read by the compiler from top to bottom. It won’t let a function or class refer to another which hasn’t been declared yet. Sometimes, this can be solved through refactoring, by splitting the code we need to refer to out into a separate class which can be declared first. But another trick is to use forward declarations. You have already seen this used for the class member method in part I. We can declare a function without specifying its body. This tells the compiler that the function exists, which means we can call it safely. So if we put such a declaration at the top of a file, we can provide the actual definition, including the body, at the end of the file, after whatever classes or functions we need to refer to. For classes, we can do a similar trick, and simply declare class counter;. As with the function case, this tells the compiler that counter is a class, and that it does exist. The definition just isn’t shown yet. This won’t let you access class members yet (since the compiler still doesn’t know which members it has), and you can’t declare variables of that type yet (because the compiler doesn’t know which, if any, constructor to call, and it doesn’t know the size of the class). But you can create references and pointers to the class.
  • C# traditionally uses function overloading to allow for functions where some parameters have sensible default values (optional parameters only arrived with C# 4.0). If we have a function taking parameters a and b, we can create an overload which takes only a, and provides a default value for b. The same can be done in C++, but you also have the option of specifying default arguments directly. The function void foo(int i = 0) {std::cout << i << std::endl; } can be called as just foo(), and will then print out 0. If you are more comfortable with overloading, you may not need to use default arguments, but you may still encounter third-party code which uses them, so you should be familiar with the syntax.
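The following sketch pulls three of these details together: a forward declaration of counter, a default argument, and a matched new/delete pair. The functions bump and heap_demo are hypothetical names chosen for the example.

```cpp
#include <cassert>

class counter;                         // forward declaration: counter is a class

// A reference to counter is all we need, so the forward declaration suffices.
void bump(counter& c, int times = 1);  // 'times' defaults to 1

class counter {
public:
  counter() : i(0) {}
  int current() const { return i; }
  void update() { ++i; }
private:
  int i;
};

// Defined after the full class definition, where the members are known.
// The default argument is not repeated here.
void bump(counter& c, int times) {
  for (int n = 0; n < times; ++n) {
    c.update();
  }
}

// Allocates a counter on the heap, uses it, and frees it again.
int heap_demo() {
  counter* p = new counter();  // dynamic storage duration
  bump(*p);                    // uses the default: one update
  bump(*p, 3);                 // three more updates
  int result = p->current();
  delete p;                    // without this line the counter would leak
  assert(result == 4);
  return result;
}
```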

The stan­dard library

We’re nearing the end. The last thing you should know about C++ before I let you run loose is a few standard library classes. The C++ standard library is very small compared to .NET’s or Java’s class libraries, but it is also widely considered C++’s main saving grace. Most people consider the language an overcomplicated mess in many ways, but the standard library stands out, both as an example of C++ done right, and as a redeeming feature which transforms C++ into a powerful and elegant language[3]. Or more precisely, part of the standard library possesses these qualities.

In the fol­low­ing I’ll briefly sketch out the main parts of the stan­dard library, and explain a few use­ful classes. For more gen­eral infor­ma­tion, Microsoft has some excel­lent doc­u­men­ta­tion for all parts of the stan­dard library here.

The stan­dard library has been assem­bled piece­meal over the years, and as such, rep­re­sents sev­eral dif­fer­ent styles and par­a­digms. The old­est parts of it are sim­ple func­tions car­ried over from C’s stan­dard library. I have already men­tioned two of these, printf and memcpy, but of course many oth­ers exist.

After these came the first C++-specific addi­tions, in the form of the iostreams library. You have also encoun­tered a few mem­bers of this, in cout, cin and endl, as well as the operator<< used for stream­ing. This library is, hon­estly, not very nice. It does the job for sim­ple Hello World-like appli­ca­tions, but it is inflex­i­ble, inef­fi­cient, over­com­pli­cated and hard to extend. In fact, many C++ pro­gram­mers stick to printf over cout despite all the dis­ad­van­tages I listed in part I. Of course, iostreams also con­tains file streams as well as some other basic stream func­tion­al­ity. A related addi­tion is the string class, and the locale facilities.

These all have one thing in com­mon: they are very old-fashioned and are, today, con­sid­ered far from ideal. The string class got some last-minute surgery when it was added to make it a bit more mod­ern, and a few addi­tions were made to the stream classes as well, but over­all, these are relics from the era of “C with classes”.

Finally, the star of the show is the Standard Template Library, or the STL for short. This remarkable library completely changed how the language was used, and is definitely worth exploring. I won’t ramble on about it here, but I will mention that one of its characteristics is that it almost completely abandons traditional object-oriented programming (which iostreams used heavily), in favor of the less well-known and almost C++-specific paradigm of generic programming.

The STL con­sists of three dis­tinct “pillars”:

  • Container classes are the equivalents of .NET’s System.Collections.Generic classes. They store sequences of data, and little else.
  • Iter­a­tor classes are super­fi­cially sim­i­lar to .NET’s IEnu­mer­a­tor. They allow tra­ver­sal over a con­tainer, but where .NET only allows tra­ver­sal from the begin­ning to the end, C++ iter­a­tors also allow reversed iter­a­tion (from end to begin­ning), as well as tra­ver­sal over sub­sets of the con­tainer (from the 6th to the 12th ele­ment, for exam­ple). Pairs of iter­a­tors are often used to mark sequences for fur­ther pro­cess­ing. Indi­vid­ual iter­a­tors are often used as “mark­ers” into a sequence.
  • Algo­rithm func­tions work on iter­a­tors, or a pair of iter­a­tors, and per­form almost all sequence pro­cess­ing. Sort­ing, search­ing, copy­ing, foreach, accu­mu­lat­ing val­ues or any other algo­rithm involv­ing sequences of data is imple­mented as an algo­rithm work­ing on iterators.

The clever part about this setup is that algorithms and containers know nothing of each other. An algorithm works on iterators, wherever they come from. It works whether the iterators are pointers into an array, into a linked list, or perhaps even into a stream or a database. As long as the iterator implements the appropriate functionality, it can be used by the algorithms. This allows for a degree of reusability that would have been impossible in .NET. The same find function, for example, works on all of the standard container classes, in addition to working on any iterators you define yourself. As long as they fulfill a few basic requirements, you get find, sort and many other common operations for free.

And again unlike .NET, there is no interface you have to implement to create a new iterator type, or, for that matter, a new container class. The STL relies on a form of duck typing (if it looks like a duck, and walks like a duck, and quacks like a duck, it must be a duck). This means that an iterator is not “a class which implements IIterator<T>” or anything like that, but simply “a type T for which the following expressions are defined, given an object x of type T: ++x, *x, T() and a few others”. In other words, if a type defines a default constructor and a few operators, then it is an iterator, and it’ll work seamlessly with the rest of the STL. In fact, raw pointers are valid iterators as well.

In .NET, every col­lec­tion class has to define its own search func­tion, and there is no ele­gant way to decou­ple it com­pletely. (We could define the func­tion in a sta­tic helper class, but it would still be work­ing on some­thing spe­cific like an IList, rather than just any sequence). In C++, the func­tion std::find works on any pair of iterators.
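To illustrate, here is a sketch that runs the same std::find over a plain array, a vector, a linked list and a subrange of the vector. The contains and count_hits helpers are my own illustrative wrappers, not part of the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// True if 'value' occurs in [first, last). The iterator type is a template
// parameter, so any iterator type that std::find accepts will do.
template <typename Iter>
bool contains(Iter first, Iter last, int value) {
  return std::find(first, last, value) != last;
}

// Searches four different sequences for the value 4 and counts the matches.
int count_hits() {
  int raw[] = {3, 1, 4};                  // plain array: pointers are iterators
  std::vector<int> v(raw, raw + 3);       // vector holding the same values
  std::list<int> l(raw, raw + 3);         // linked list holding the same values

  int hits = 0;
  if (contains(raw, raw + 3, 4)) ++hits;
  if (contains(v.begin(), v.end(), 4)) ++hits;
  if (contains(l.begin(), l.end(), 4)) ++hits;
  if (contains(v.begin(), v.begin() + 2, 4)) ++hits;  // subrange [3, 1]: no match
  assert(hits == 3);
  return hits;
}
```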

While iterators and algorithms are key to “modern C++”, I will focus on the containers here, as they can be used with little explanation, and are almost indispensable (just like you wouldn’t want to program in C# without the List<T> class).

The equiv­a­lent of .NET’s List<T> class is the vector:

#include <cassert>
#include <vector>

int main() {
  std::vector<int> v;
  v.push_back(1);
  v.push_back(2);
  v.push_back(3);
  v.push_back(42);
  v.pop_back();
  // v now contains the values [1, 2, 3]
  v.resize(5); // resize to contain 5 elements
  // v now contains [1, 2, 3, 0, 0]
  assert(v[1] == 2);
  v[3] = 42;
  // v now contains [1, 2, 3, 42, 0]
  int& r = v[0]; // create a reference to the first element
  int* p = &v[0]; // create a pointer to the first element
}

Pretty straightforward. And again, note that we’ve managed to create an arbitrary number of objects in our application without even once having to call new, which also means that this application cannot leak memory (short of bugs in the compiler or standard library).

There are a cou­ple of caveats to be aware of though:

  • There is typ­i­cally no bounds-checking on the [ ] oper­a­tor. This doesn’t mean it is legal to do v[999] above, it just means that there is no guar­an­tee of what will hap­pen if you do it. It is unde­fined behav­ior.
  • Pointers and references to individual elements within a vector may be invalidated when we add elements to the vector. Like C#’s List<T>, it is a dynamic array, and resizes as necessary. Each such resizing operation consists of allocating a new array, copying the contents into that, and then freeing the old array. A pointer to data in the old array is therefore no longer valid. The same applies to iterators: any iterator pointing into a vector is invalidated if the vector reallocates its storage.
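Both caveats have a standard workaround, sketched below with made-up function names. The at() member is a bounds-checked alternative to [ ] that throws std::out_of_range, and reserve() lets you allocate capacity up front, so that pointers stay valid as long as the vector does not grow beyond what was reserved.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if accessing the given index through at() is rejected.
bool at_rejects(std::size_t index) {
  std::vector<int> v(3);  // three zero-initialized elements
  try {
    v.at(index) = 1;      // unlike v[index], at() checks the index
  } catch (const std::out_of_range&) {
    return true;
  }
  return false;
}

// Returns true if a pointer taken before growth still points at the first element.
bool pointer_survives_reserved_growth() {
  std::vector<int> v;
  v.reserve(100);         // capacity is now at least 100
  v.push_back(1);
  int* first = &v[0];
  for (int i = 0; i < 99; ++i) {
    v.push_back(i);       // size reaches 100, never exceeds the reserved capacity
  }
  return first == &v[0] && *first == 1;
}
```

If you cannot predict the final size, the simplest habit is to store an index instead of a pointer and look the element up again after the vector has changed.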

Because a vector guarantees that its data is stored contiguously, essentially as an array, we can use this class instead of an array when interfacing with old C code (which only has pointers and arrays, but no vectors). In the above, the variable p could be passed to a C function as a pointer to the beginning of an array of ints. We still have to be careful, of course: the function must not be allowed to write past the end of the array.
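Here is a small sketch of what that looks like. The function fill_squares is a stand-in I made up for “some C API that writes into a caller-supplied buffer”; it is not a real library function. The important parts are that the vector is sized before the call, and that its size is passed along so the callee knows where to stop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical C-style API: writes count values into the caller's buffer.
extern "C" void fill_squares(int* out, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    out[i] = static_cast<int>(i * i);
  }
}

// Sizes the vector first, then hands its storage to the C function.
std::vector<int> make_squares(std::size_t count) {
  std::vector<int> v(count);          // count zero-initialized elements
  if (!v.empty()) {                   // &v[0] is only valid on a non-empty vector
    fill_squares(&v[0], v.size());
  }
  return v;
}
```

The vector still owns the memory, so nothing has to be freed by hand after the call.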

Other container classes are the map (equivalent to .NET’s Dictionary<Key, Value>; it is std::map<Key, Value> in the map header) and the set (no equivalent in .NET 2.0, although HashSet<T> in 3.5 is similar; it works much like a map without the Value parameter, and is std::set<T> in the set header). Their use is pretty much as you would expect.
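For completeness, a quick sketch of both. The function names and the sample data are invented for the example. The one thing that tends to surprise C# programmers is that operator[] on a map inserts a default value when the key is missing, so lookups that should not modify the map go through find instead.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// A set stores each value at most once.
std::size_t distinct_count() {
  std::set<int> s;
  s.insert(3);
  s.insert(1);
  s.insert(3);  // duplicate, ignored
  return s.size();
}

// Looks a name up without inserting it; returns -1 if the key is absent.
int lookup_age(const std::string& name) {
  std::map<std::string, int> ages;
  ages["alice"] = 30;  // operator[] creates the entry if needed
  ages["bob"] = 25;
  std::map<std::string, int>::const_iterator it = ages.find(name);
  return it == ages.end() ? -1 : it->second;
}
```

Note that find on a map returns an iterator, just like std::find does, and “not found” is again signalled by the end iterator.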

In gen­eral, I would dis­cour­age you from using arrays. Pre­fer vec­tors instead, and if an API expects a pointer to an array, pass it a pointer to the first ele­ment of the vec­tor instead, as shown in the pre­vi­ous exam­ple. Vec­tors are safer and sim­pler to work with.

Strings

A final pair of classes worth mentioning are std::string and std::wstring. C++ has no built-in string type, so to work with strings, you have to include the string header and use these classes. A string is simply a string of chars, single-byte characters. A wstring is a string of wchar_ts, or wide characters. On Windows, these are 16 bits wide and use the UTF-16 encoding, allowing them to be used for Unicode strings.

These classes behave much as you would expect, so I won’t dis­cuss them fur­ther. Instead I’ll skip to a related point of con­fu­sion: C has no string type at all. Instead, char point­ers (or wchar_t point­ers) are used as prim­i­tive strings.

A C-string is simply a sequence of characters, terminated by a null character ('\0'). If the null character is left out, all C string functions will just assume that the string continues until a null character happens to be found. This is obviously extremely fragile and a common source of bugs, but it’s an unavoidable fact of life when interfacing with C code.

This also rears its head when work­ing with string lit­er­als. "hello world" does not have type std::string in C++. It has type const char[12], that is, an array of 12 const char­ac­ters. (Note that the string is only 11 char­ac­ters long. The com­piler auto­mat­i­cally gen­er­ates the ter­mi­nat­ing null, and sets aside space for this as well).
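The difference between the 12 and the 11 is easy to check for yourself. In this little sketch (the function names are mine), sizeof reports the size of the array including the terminator, while strlen counts characters up to, but not including, the first null.

```cpp
#include <cstddef>
#include <cstring>

// Size of the literal's array type, terminating null included.
std::size_t literal_array_size() {
  return sizeof("hello world");
}

// Number of characters before the terminating null.
std::size_t literal_length() {
  return std::strlen("hello world");
}

// A hand-built C-string: the '\0' has to be stored explicitly.
std::size_t manual_length() {
  char buffer[6] = {'h', 'e', 'l', 'l', 'o', '\0'};
  return std::strlen(buffer);
}
```

If the '\0' in manual_length were left out (and the array shrunk to 5), strlen would keep reading past the end of the buffer, which is exactly the fragility described above.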

Arrays in C and C++ are very primitive and fragile things, and implicitly decay into pointers when needed. Whenever you have an array, you can assign it to a pointer, and the pointer will automatically point to the beginning of the array. Because arrays are so limited (a function cannot return an array or take an array as argument either), arrays are often passed around as pointers, and in fact, pointers can be treated much like arrays as well. Given a pointer p, p[2] is legal, and is equivalent to *(p+2). But because it is just a pointer, the size of the array isn’t known. It is up to the programmer to keep track of that.
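A short sketch of the decay in action, using function names I made up for the occasion. The element count is computed while we still have the real array, and then travels alongside the pointer as a separate argument, which is the pattern you will see in most C APIs.

```cpp
#include <cstddef>

// Receives only a pointer, so the element count must be passed separately.
int sum(const int* values, std::size_t count) {
  int total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    total += values[i];  // indexing a pointer, same as *(values + i)
  }
  return total;
}

int sum_of_local_array() {
  int numbers[4] = {1, 2, 3, 4};
  // sizeof still knows the full array here, before any decay happens.
  std::size_t count = sizeof(numbers) / sizeof(numbers[0]);
  const int* p = numbers;  // the array decays to a pointer to its first element
  return sum(p, count);
}

bool index_matches_pointer_arithmetic() {
  int numbers[4] = {10, 20, 30, 40};
  int* p = numbers;
  return p[2] == *(p + 2) && p[2] == 30;
}
```

Inside sum, sizeof(values) would only give the size of a pointer, which is why the count cannot be recovered there.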

Get­ting back to strings, the way arrays can decay into point­ers means that this is legal: const char* str = "hello world". The pointer str now points to the sta­t­i­cally allo­cated array of char­ac­ters “hello world”, and for all prac­ti­cal pur­poses, str is now a C-string.

To create a wide string literal, the string is prefixed with an ‘L’, as in const wchar_t* wstr = L"hello world".
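The wide versions mirror the narrow ones closely. In this sketch (function names invented here), std::wcslen from the cwchar header plays the role of strlen, and std::wstring plays the role of std::string.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Length of a wide C-string, counted in wchar_t units.
std::size_t wide_length() {
  const wchar_t* wstr = L"hello world";
  return std::wcslen(wstr);
}

// std::wstring supports the same operations as std::string.
std::wstring make_wide_greeting() {
  std::wstring ws = L"hello ";
  ws += L"world";
  return ws;
}
```

Mixing the two kinds is a compile error: a narrow literal like "world" cannot be appended to a std::wstring without an explicit conversion.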

Because C-style strings are used in most APIs, you often need to convert between these and the C++ string class. This can be done as in the following:

const char* str = "hello world";
std::string str2 = str; // an implicit conversion exists from char pointer to string, so 'std::string str2 = "hello world";' would also have worked
const char* str3 = str2.c_str(); // c_str() returns a C-style string, valid only while str2 is alive and unmodified

Because string lit­er­als are C-style strings, there are a few pit­falls to be aware of when using them:

const char* str = "hello worl"; // a literal is an array of const char
const char* str2 = str + str; // #1
str += 'd'; // #2

In line #1, we get a compile error. Because str is just a pointer, adding it to another pointer is not defined, and so the compiler chokes. A related example is in #2, where we try to add a character to the string. This compiles, perhaps surprisingly, but it won’t do what you expect. Instead, the char gets converted to an int, and added to the value of the pointer. Since 'd' has the value 100 in ASCII, the result is a pointer 100 characters past the beginning of the string, far outside the array.

For these oper­a­tions to work, we must have a proper C++ string:

std::string str = "hello ";
str += "worl";
std::string str2 = str + 'd';

will work as expected, and result in the string “hello world”.

You now know all you need to know about C++ to use it with­out shoot­ing your­self in the foot too much. You also know enough to read a lot of the code snip­pets you’re likely to find online. And you’ve got a start­ing point for search­ing out more infor­ma­tion should you wish to.

In the next install­ment, we will finally get to inter­fac­ing with the Win32 API. You may want to play around a bit with the com­piler to make sure you under­stand point­ers and C-style strings in par­tic­u­lar, as we’re going to need those quite a bit. As I men­tioned in part I, the Win­dows API is a C API, and an ugly, incon­sis­tent one at that. It’s not a bad idea to make sure you’re some­what com­fort­able with the basics of the lan­guage before try­ing to grap­ple with it.


  1. “Modern C++” is not just a random name. It is a style of C++ programming named after Alexandrescu’s book, Modern C++ Design. There are fundamentally two ways to program in C++. One style is often, and somewhat derisively, called “C with classes”, implying that it is used in much the same way one would program in C, but with the addition of classes, member methods and public/private access specifiers. The other, superior, approach is “Modern C++”. C with classes is often what beginners encounter, and perhaps surprisingly, what Java and C# are based upon, meaning that programmers coming from these languages tend to settle on an obsolete and sub-optimal style. I often make a point of teaching newcomers “proper” modern C++, but this is not the place. The goal of this series of posts is not to teach good C++ practices, but simply to enable .NET programmers to talk to native APIs.

  2. Unfortunately, there is no particularly good reason for this. The this pointer should have been a reference; that would have made much more sense. However, when this was added to the language, references did not yet exist, so it had to be a pointer. And later, when references were added, changing this to a reference would have broken backwards compatibility.

  3. Bjarne Stroustrup, the designer of C++, once said that “Within C++, there is a much smaller and cleaner language struggling to get out”.


One Response to A .NET Developers Guide to C++ (part II)

  1. Boris says:

    This is a great series, your writ­ing style is very direct and read­ing this arti­cle feels like hav­ing a friend who knows me very well, giv­ing me a crash course in C++.

    My pro­fes­sion has made me use C# a whole lot, and I’ve for­got­ten most of what I had learned in C++, so this arti­cle makes for a great refresher for me.

    Thanks.
