A .NET Developer’s Guide to C++

One of my cowork­ers is essen­tially a self-taught pro­gram­mer, but he is inter­ested in, and wants to learn, absolutely every­thing. A year or two back, he asked me to give him a crash course in C++, because he felt it was a prob­lem that when­ever he needed to do some­thing that required func­tion­al­ity not exposed by the .NET frame­work, he essen­tially hit a wall.

So we took an after­noon out to run through some basic C++ code, and while we had fun doing it, and I’m pretty sure he found it inter­est­ing, it didn’t really achieve the goal of mak­ing him com­fort­able with writ­ing small C++ pro­grams to com­mu­ni­cate with native APIs such as the Win­dows one.

After­wards, I real­ized that the rea­son for our fail­ure was that we hadn’t really made it clear what we were try­ing to achieve. He might have been inter­ested in C++ in gen­eral, but what he actu­ally needed was some­thing a bit sim­pler: Being able to call native (pri­mar­ily Win32) APIs.

Of course, the dif­fer­ence between these two is not obvi­ous. In the .NET world, the two would basi­cally have been the same thing. In learn­ing C# (or another .NET lan­guage), you also learn to inter­face with .NET APIs, and if you need to inter­face with these APIs, you have to learn a .NET language.

In the case of C++ and native APIs, the sit­u­a­tion is a bit dif­fer­ent. Learn­ing the lan­guage does not guar­an­tee pro­fi­ciency with using native APIs, and native APIs can be used with­out know­ing C++.

So this series of posts is going to be my sec­ond attempt at teach­ing a .NET devel­oper to at least be able to set up a basic native appli­ca­tion, and more impor­tantly, to call a func­tion in the Win32 API.

The fol­low­ing is not a com­pletely gen­eral intro­duc­tion to C++. If you actu­ally intend to learn and use the C++ lan­guage, there are many bet­ter texts to fol­low. I might even write my own attempt one day.

In this series of posts, I will

  • assume famil­iar­ity with pro­gram­ming in .NET or another man­aged plat­form (such as Java). You’ll prob­a­bly be able to get by as well if you’re com­ing from another high-level lan­guage such as Python or Ruby, as long as you can under­stand the basic syn­tax of the C fam­ily of languages.
  • leave out a lot of things a “ded­i­cated” C++ pro­gram­mer should know. The goal is not to turn the reader into a pro­fes­sional C++ devel­oper, but sim­ply to break down the wall and enable you to make occa­sional for­ays into native-land to call an API func­tion or two before head­ing back to your favorite language.

Before we begin

Before we get into the actual code, there are a few pecu­liar­i­ties of native lan­guages to be aware of.

Almost all native APIs are actually written in C, not C++. Both languages have some responsibility for this. Part of the reason is that C is the lingua franca of programming languages. When your Python code has to talk to your Java code, they typically go through a C interface. Virtually every language has C wrappers available to allow it to communicate with C code. So by writing your API in C, you ensure that every language can use it without too much trouble. And of course C is a very simple language, so almost any language can cope with a C API. There are no classes, no higher-order functions or exceptions or other peculiarities of more modern programming paradigms. So part of the reason is that C is simply a good intermediate language.

The other part of the reason is found in C++: C++ has no fixed ABI¹. In general, C++ functions compiled by one compiler cannot reliably be called from code compiled by another. And when C++ compilers can’t cooperate, entirely different languages don’t stand any chance of being able to talk to C++ code. COM objects provide a partial solution to this, but require a lot of plumbing to implement correctly. For widely used APIs, it is often simpler to restrict your interface to C.

So the code we need to inter­face with is actu­ally C, not C++. Our own code is going to be a lim­ited sub­set of C++. If you intend to write actual appli­ca­tions in C++, you really owe it to your­self to learn the lan­guage prop­erly, but for our pur­poses, stick­ing with a smaller sub­set is simpler.

So what does it mean in prac­tice that the API is writ­ten in C? Pri­mar­ily two things:

  • No excep­tions — errors have to be reported through error codes.
  • No classes — C allows structs, con­tain­ing data, but no mem­ber func­tions, and no access spec­i­fiers. All mem­bers are public.

Now, on to how we’re going to tackle our task:

The first three install­ments in this series of posts will deal exclu­sively with native code. This first one will demon­strate a sim­ple Hello world pro­gram, and dis­cuss some fun­da­men­tals of orga­niz­ing and com­pil­ing C++ code. This isn’t exactly excit­ing stuff, but it is use­ful to under­stand, as it com­monly trips up begin­ners (and even some rea­son­ably expe­ri­enced programmers).

The second part will teach all the missing pieces of C++ (the ones that we’re going to need, anyway), so that you’re comfortable with reading and writing simple C++ programs.

In the third part we’ll get into the Win32 API, call­ing a few func­tions (of vary­ing com­plex­ity) and not least, learn­ing to read the arcane spec­i­fi­ca­tion on MSDN.

Hello World

Load up Visual Stu­dio, and cre­ate a new project. The project type should be Win32 Console Application. This brings you to the C++ Project Wiz­ard. If it looks like some­thing that belonged in Win­dows 95, that’s because that is when it was last updated. It is writ­ten in Javascript and HTML, of all things.

This wizard gives you access to a couple of application settings. For now, set Application Type to Console Application, and select Empty Project under Additional Options. In particular, we do not want a precompiled header. It is a hack that can speed up compilation in large C++ projects, but it is nothing more than a source of confusion in simple, small projects. Neither ATL nor MFC headers should be added under Add common header files for.

Click finish, and we’re given an empty project, just like we asked for. It contains three “folders”, named Header Files, Resource Files and Source Files. I put “folders” in quotes because they aren’t. Visual Studio calls them filters, and they basically just group files by file type, rather than actually enforcing any particular location on the file system. They’re also not particularly important to us, so you can delete them if you like. If you add a .cpp file to the project, it is automatically listed under the Source Files filter, while .h files get listed under Header Files.

Now, let’s see some actual code. To begin with, let’s try a Hello World:

Cre­ate a new .cpp file in the project.

Now type the fol­low­ing into it: (We’ll get into what it means in a moment)

#include <iostream>

int main() {
   std::cout << "Hello world" << std::endl;
}

Now compile and run it. No big surprises here: it does exactly what we’d expect a “hello world” program to do. As for what the code means, let’s start with the main function itself. It’s not a member of any class. In C++, nonmember functions are allowed (and commonly used), and main in particular must be a nonmember function. The observant reader may have noticed another curious thing about it: we declare int as its return type, but don’t actually have a return statement. This is allowed as a special case for main. Other functions still have to return normally, but if control reaches the end of the main function, it implicitly returns 0².

Inside the main function, you might wonder about the <<s. These operators exist in C# as well, and their built-in meaning is the same in both languages: bit-shifting. But C++ lets you overload them for essentially any combination of types, and the standard streams define overloaded versions that mean something entirely different.

So the << oper­a­tor “streams” data into std::cout. std::endl is a stream manip­u­la­tor which, when it is fed into a stream, pro­duces a line break, and flushes the stream. In this exam­ple, we could just have writ­ten std::cout << "hello world\n" to get the new­line with­out flush­ing the stream, and in some ways, that would actu­ally have been prefer­able. But I wanted to intro­duce endl.

A final note is the std:: prefix. Where C# uses a simple dot both for namespace qualification and for member access, C++ defines a few different operators:

  • For spec­i­fy­ing mem­bers of a name­space, or spec­i­fy­ing sta­tic mem­bers of a class, :: is used.
  • For non­sta­tic class mem­bers, . is used. Given an object o, we can access a mem­ber m with the syn­tax o.m, exactly like in C#.
  • For non­sta­tic class mem­bers accessed through a pointer to the class, -> is used. If we have a pointer p to an object, access­ing its mem­ber m looks like this instead: p->m.

So in our Hello World program, we reference the object cout in the std namespace. We could simply add a using namespace std; at the top of the program, much like we would in C#, but in C++, it is not customary to do so. You’ll note that the namespace actually has a very short name, unlike .NET’s long names and nested namespaces. Rather than System.Collections.Generic.List, for example, C++ defines std::vector. Almost the entire C++ standard library exists in the std namespace. One of the main reasons for this structure is to make it easy and convenient to access namespace members without having to write using namespace X.

cout stands for char­ac­ter out­put, and is the stream used for stan­dard out­put, much like the output-related mem­bers of .NET’s Console class. There is also a cin stream object respon­si­ble for input.

cout and cin are actu­ally noth­ing more than global vari­ables of the type std::ostream and std::istream respec­tively. Another out­put mech­a­nism you’re likely to see is the C func­tion printf, which is syn­tac­ti­cally closer to what you’re used to from .NET.

Given an inte­ger i we want to print out along with a mes­sage, cout and printf would be used like this:

std::cout << "You have " << i << " pancakes.\n";
printf("You have %d pancakes.\n", i);

Each of these has its advantages and disadvantages, as you can probably see. The nice thing about cout is that it is type-safe, and allows us to compose our output string without having to worry about the type of i or the number of parameters. We just stream whatever we like into cout one parameter at a time, and it all just works. It also works with user-defined types. They just have to define an appropriate operator <<.

The nice thing about printf on the other hand, is that the actual for­mat of the string is much more read­able, and para­me­ters are spec­i­fied sep­a­rately at the end. As you know from .NET’s string.Format func­tion, it is very con­ve­nient to be able to write the entire for­mat string in one go, and only spec­ify para­me­ters after­wards. It is a bit awk­ward that cout requires you to break up the string with <<‘s all over the place. But there are some seri­ous lim­i­ta­tions to printf as well:

  • It can not be extended. It works for the basic built-in types, and noth­ing else.
  • It requires the programmer to specify the type of the parameter as part of the format string. (%d specifies that the parameter at this position is expected to be an int, printed as a signed decimal number. But there is no type checking to verify that this is actually the case. I can pass a float to printf and print it out with %d; that is undefined behavior, and in practice I get garbage.)
  • The number of parameters passed to the function is unknown to the function itself. C (and C++) only have very rudimentary support for functions with variable arguments. Once you make use of this feature, you lose all type information and all information about the number of parameters passed to the function.

I tend to pre­fer cout for these rea­sons; it is safer, and it can be extended. But you’re likely to encounter printf in code sam­ples and should at the very least be famil­iar with it.

Finally, let’s deal with the very first line. There are four things to note about it. In order of appear­ance, they are:

  • The # at the very start of the line indi­cates that this is a pre­proces­sor direc­tive. In other words, this is eval­u­ated in a sep­a­rate pass before the com­piler starts work­ing. Mod­ern com­pil­ers don’t main­tain a strict sep­a­ra­tion between pre­pro­cess­ing and com­pi­la­tion, but as the lan­guage is spec­i­fied, the pre­proces­sor basi­cally runs over the source code per­form­ing a num­ber of sim­ple mod­i­fi­ca­tions before the com­piler is invoked.
  • include is the actual pre­proces­sor direc­tive. It spec­i­fies that we would like to include a file.
  • The file name is surrounded by angle brackets (<>). When these are used, the preprocessor searches for the file in the system include directories. If we had used double quotes (""), the preprocessor would have searched for the file locally (typically starting in the directory of the file doing the #include) first. So, slightly simplified: use <> to include system headers, and "" to include files from the same project or solution.
  • Finally, inside the angle brack­ets, we have the name of the header file we’d like to include. In gen­eral, your own files should use a .h or .hpp suf­fix. Head­ers belong­ing to the C stan­dard library also use .h, but C++ stan­dard library head­ers have no exten­sion. (So you have iostream instead of iostream.h).

So what does it actually mean for a file to be #include’d? It’s not quite the same thing as the using statements you put at the top of a file in C#. Those using statements are functionally similar to the using namespace statement mentioned earlier: they allow us to reference types defined in other namespaces as if they were members of the current namespace. If we do not have the using statement, we have to specify the full namespace prefix when using the type (System.Collections.Generic.List<T> instead of simply List<T>), but the types are still available. I can reference System.Collections.Generic.List<T> in C# without any using statements. Likewise, I can reference std::cout as I did in the previous example without having a using namespace std.

But with­out the #include, the com­piler would not have been aware of cout at all.

An #include is in a sense very sim­ple. All that actu­ally hap­pens is a copy/paste oper­a­tion. The pre­proces­sor locates the file iostream, and copies its con­tents into our file at the loca­tion of the #include. The effect of this is to give us access to any­thing defined in the file. In .NET this is all taken care of by magic. Any­thing in the cur­rent assem­bly is auto­mat­i­cally vis­i­ble, and any­thing that isn’t declared internal in other assem­blies is vis­i­ble as soon as we add a ref­er­ence to it.

In C++, no such mech­a­nism exists. What the com­piler sees is just the cur­rent file. Other files, even in the same project, are not vis­i­ble when the cur­rent file is being com­piled. The com­pi­la­tion model is noto­ri­ously quirky, and prob­a­bly deserves some explanation.

The pre­proces­sor and the C/C++ com­pi­la­tion model

C++ code is compiled in a couple of stages. I already mentioned the preprocessor. In the old days, this was a separate program, which was run on the source code first, performing simple text manipulation (search/replace, and conditionally removing chunks of code). The output of this was then fed to the compiler. Finally, the output of the compiler is fed to a linker, which we’ll get to later. Today, the preprocessor is built into the compiler, but it is still a separate pass made over the code before the actual compilation begins.

Let’s wrap up the pre­proces­sor quickly though. It can do a few other things that we’ll prob­a­bly run into soon enough. In par­tic­u­lar, #define has a few uses. It cre­ates a macro — when­ever the name of this macro is encoun­tered, it is replaced with the macro definition.

So in the following:

#define waffles pancakes
std::cout << "I like " << waffles();

we create a macro named waffles, and from that point onwards, any occurrence of waffles is swapped for pancakes. This means that the function that actually gets called in line two is pancakes(), rather than waffles(), highlighting another important aspect of the preprocessor. Because it is run before compilation, it has no notion of actual language syntax. It doesn’t care about the context of the text it is replacing. It doesn’t care that this is a function call, just like it wouldn’t care if the name had been found in a different namespace than the one the macro was defined in. It doesn’t respect scoping rules or anything else. It won’t swap out the middle of words, or the contents of string literals (so ilikewaffles() would go untouched, as would "waffles"), but that’s about it. Anything else gets brutally replaced by the preprocessor.

Another com­mon exam­ple of its sim­plic­ity is the following:

#define four 2+2
int i = four * four;

The result of this? It is 8. The pre­proces­sor just per­forms sim­ple text sub­sti­tu­tion, result­ing in this code: int i = 2+2 * 2+2, which of course gets eval­u­ated as int i = 2 + (2*2) + 2.

We can also use the preprocessor to perform conditional compilation, removing sections of code at compile time:

#define waffles
#ifdef waffles // #if defined(waffles) would also have been legal
// this will get compiled
#else
// this will get removed by the preprocessor
#endif

A vari­a­tion on this is used in almost every header file, but we’ll get to that soon enough.

The compiler processes what are technically known as translation units. A translation unit is a single source file (typically .cpp or .cc for C++, or .c for C code), after preprocessing. So in our Hello World program, we have one translation unit, consisting of the contents of the header file iostream, followed by our main function. The result of compilation is not a program, but rather an object file (Visual Studio uses the extension .obj for these, while GCC uses .o). An object file contains all the compiled code for this file, but with certain placeholder “gaps”. This is necessary as code files will typically depend on functions or variables defined in other translation units. We are able to tell the compiler that a function defined in another translation unit exists, but it won’t be able to see the actual definition of the function, so it has to generate a kind of placeholder, saying “call the function with this name, as soon as we find out where that function is”. That is essentially the role of object files: store the compiled code, along with the necessary information about which symbols this file defines, and which symbols it depends upon and must be found in other files for the program to be complete.

When all the object files are cre­ated, they are passed to the linker, which per­forms the final steps — read­ing all the object files, locat­ing all these place­hold­ers, and fill­ing them in. If some code in object file A calls a func­tion f defined in another file B, the linker must read both files A and B, deter­mine the address of the func­tion f, and insert it into the func­tion call inside A.

If the linker finds multiple conflicting definitions of f (perhaps object file C also defined a function with the same signature), it is of course an error. Likewise, if it is unable to locate the full definition of a symbol referenced from a file, we get an error. Because the linker does not have access to the actual source code, but only the object files, linker errors are notoriously hard to understand, but they can be deciphered. The following simple code causes a linker error. (We’re going to run with this example for a while, so feel free to add it to a new project, or overwrite the previous file. This code should be the only contents of the project.)

class myclass {
public:
    int f(float fl);
};

int main(){
    myclass c;
    c.f(1.0f);
}

The code should be straightforward enough. We declare a class with a member function f. In the main function we create an instance of our class, and call the f function. There is just one problem: the function is declared, but it has not been defined. In other words, the compiler knows it exists (so we don’t get a compiler error when we try to call it, as we would if we called a completely unknown function), but because it does not have the function body, it has to assume that the full definition is… elsewhere. So the compiler lets this pass, hoping that the linker can sort things out.

But the linker is given only this one translation unit, so it is unable to find a definition for the function f, and it spits the following error at us:

error LNK2019: unre­solved exter­nal sym­bol "public: int __thiscall myclass::f(float)" (?f@myclass@@QAEHM@Z) ref­er­enced in func­tion _main

Ouch. Again, the linker doesn’t have access to the source code, so this is about the best it can do. It tells us that the problem is an “unresolved external symbol”, or in other words, it was unable to resolve a symbol that one of our translation units expected to be “external” (defined in another translation unit). As for the symbol itself? All it actually sees is the mangled string near the end: ?f@myclass@@QAEHM@Z. This is the name for the function generated by the compiler and stored in the object file, and I have no clue what the @s or the letters following them mean. They somehow encode information about parameters and return type, but that’s about all I can say. Luckily, the linker is able to decode this name, which it also does for us. It tells us that the function has public visibility, and its return type is int. __thiscall is the calling convention used for member functions. (It is essentially a calling convention that allows for a this parameter, hence the name.) The calling convention isn’t usually important here though. Next, we can see that the unresolved symbol is a member of the class myclass, the function is named f, and it takes a float as its parameter. Finally, it tells us that the symbol was referenced from the _main function. (The leading underscore is a decoration the compiler adds to C-style function names on 32-bit Windows, so _main simply means our main.)

So the error is actu­ally pretty straight­for­ward once you fil­ter out the noise. A lot of C++ pro­gram­mers don’t real­ize this, and go into a panic when­ever they encounter a linker error, which is why I wanted to demon­strate this one. They typ­i­cally con­tain a lot of noise (espe­cially in more com­pli­cated cases), but they can be deci­phered if you elim­i­nate all the @@ non­sense and read the remain­ing text slowly and carefully.

The other reason why I wanted to demonstrate this is that it is key to why header files are used. Based on the above example, we now know that the compiler can be tricked into accepting a call to a function it has no knowledge of, as long as it can see a valid declaration. (A function declaration is essentially just the signature, including the return type, followed by a semicolon, much like an interface method in C#.)

So perhaps we should get creative and see if we can make the linker happy too. First, we create a second .cpp file with the following contents:

class myclass {
public:
    int f(float fl);
};

There’s still no definition of f, but we’re taking it one step at a time. Now, though, we have two files containing the same definition of myclass. Of course, the compiler only sees one file at a time, so it won’t notice this, but what will the linker say? Won’t it complain about multiple definitions of the same symbol? Try compiling the project and find out.

As it turns out, we get exactly the same error as before. But we don’t get any complaints about the multiple definitions of the same class. This is actually allowed. We are allowed to create as many definitions of the same class as we like, as long as there is only one in each translation unit (the compiler will choke on it if you try to define a class you’ve already defined), and all the definitions are exactly identical. (The linker will typically not enforce the last requirement, though. If the definitions are not identical, it usually manifests as weird crashes at runtime.)

This is called the One Def­i­n­i­tion Rule (ODR). Only one def­i­n­i­tion may exist. That def­i­n­i­tion may occur in mul­ti­ple places, but it must be iden­ti­cal, it must be the same def­i­n­i­tion, every time it is encountered.

So it seems like we have a prob­lem, doesn’t it? We’re allowed to dupli­cate the class def­i­n­i­tion, but we’re not allowed to mod­ify it! So how are we sup­posed to add the def­i­n­i­tion of f?

Try chang­ing your sec­ond file (the one with­out the main func­tion) to the following:

class myclass {
public:
    int f(float fl);
};

int myclass::f(float fl){
    return 42;
}

and com­pile it. Voila! It works. We didn’t mod­ify the actual class def­i­n­i­tion, so we obeyed the ODR rule. Instead, we added the func­tion def­i­n­i­tion after­wards, out­side the actual class def­i­n­i­tion. And both the com­piler and linker are happy. The linker now sees two iden­ti­cal def­i­n­i­tions of the class myclass, but that’s allowed under the ODR rule. It also sees a call to the func­tion myclass::f, and a sin­gle def­i­n­i­tion of the same func­tion, so it is able to glue every­thing together into one sin­gle program.

Of course, hav­ing to copy/paste, and main­tain dupli­cate code in every .cpp file is hardly ideal. Sooner or later, we’re going to mod­ify myclass in one file, and for­get to do the same mod­i­fi­ca­tions in all the other files. That will break the ODR rule, and every­thing will crash horribly.

That is where header files come in. We could put the shared code in a sep­a­rate file, and use the #include direc­tive men­tioned ear­lier to auto­mat­i­cally copy/paste the con­tents in! Let’s try that now. Cre­ate a new file (with the .h or .hpp exten­sion), and place the class def­i­n­i­tion in that. Now remove the class def­i­n­i­tion from the two .cpp files we already had, and replace it with a #include ref­er­enc­ing the header.

That is, your project should contain the following three files (I’m going to name the .cpp files main.cpp and myclass.cpp for convenience):

// myclass.h
class myclass {
public:
  int f(float fl);
};

// myclass.cpp 
#include "myclass.h" // note we use quotes, not angle brackets here
int myclass::f(float fl){
  return 42;
}

// main.cpp
#include "myclass.h"
int main(){
  myclass c;
  c.f(1.0f);
}

And it seems to work. Clever. There is one little problem, though. What happens if we include our header multiple times? We probably won’t do this intentionally, but perhaps we’re going to include it, and then include another header, which also includes it. We can easily get into a situation where some headers get included many times. Think of standard headers like iostream. We’re going to end up including it fairly often. Sooner or later, we’ll end up including some of our headers twice, which breaks the ODR rule! We’re not allowed to have multiple definitions in the same translation unit. To test the problem, feel free to duplicate the #include statement and verify that the compiler chokes on it.

To solve this problem, include guards are used. Modify your header as follows:

#ifndef MYCLASS_H
#define MYCLASS_H

class myclass {
public:
  int f(float fl);
};

#endif

There should be noth­ing new in this, but the con­se­quence might be sur­pris­ing. First, we ask the pre­proces­sor to check if the macro MYCLASS_H is defined, and only eval­u­ate the fol­low­ing if it is not defined (the direc­tive is named ifndef, or if not defined).

If we enter the if state­ment, the first thing we do is define the sym­bol MYCLASS_H, and then we eval­u­ate the orig­i­nal con­tents of the header. Finally, we end the if-statement with an #endif. So what hap­pens if the file gets included twice now?

For sim­plic­ity, assume the fol­low­ing .cpp file, con­tain­ing noth­ing except two includes:

#include "myclass.h"
#include "myclass.h"

As the pre­proces­sor parses this, it’ll expand both #include’s, result­ing in this:

#ifndef MYCLASS_H // At this point, the macro MYCLASS_H is not defined, so we enter the following block:
#define MYCLASS_H // define the macro MYCLASS_H

class myclass { // allow this code to stay in the translation unit
public:
  int f(float fl);
};

#endif // end the if statement
#ifndef MYCLASS_H // now MYCLASS_H *is* defined, the condition is not true, and so we *skip* the if statement.
//#define MYCLASS_H // of course the preprocessor doesn't actually comment out the code, it simply removes it from the translation unit. I'm commenting it to illustrate what happens
//
//class myclass { // this time, the preprocessor *removes* all this code, because it is inside a #if statement we're skipping
//public:
//  int f(float fl);
//};
//
#endif

So after the preprocessor has run, only this code actually gets inserted in our translation unit:

class myclass {
public:
  int f(float fl);
};

So it seems we’re able to han­dle mul­ti­ple inclu­sions of the same header now.

So to review, we’re now able to split our code across multiple source files, and do it correctly. We don’t need to duplicate any code: all the shared code can be placed in header files, and include guards protect against accidentally including the same file twice in the same translation unit.

And now you should finally under­stand what it meant when we included iostream in the orig­i­nal Hello World exam­ple. We’re sim­ply past­ing in a lot of sys­tem code, con­tain­ing dec­la­ra­tions that get linked together with the stan­dard library con­tain­ing the full definitions.

This turned out a lot longer than I’d originally intended (I had originally, and naïvely, planned to write the entire series in one post), so let’s call it a day here. Part two will be posted very soon, and cover some actual C++, now that we’ve got the fundamentals out of the way. You need to understand how C++ code is compiled before you can write anything useful in the language.


  1. Application Binary Interface. A common ABI is required for two functions to be able to call each other. The ABI defines the memory layout of structs or classes, as well as calling conventions and basically everything you need to be able to call a function: where the return value should go, where parameters should be placed, and so on. In practice, every platform defines a standard C ABI that all compilers on it follow, which makes C easy to interface with.

  2. The story goes that Bjarne Strous­trup, the lan­guage designer, didn’t want to cre­ate a lan­guage where a sim­ple hello world required mul­ti­ple lines of code in the main func­tion. Hence the spe­cial rule that main doesn’t have to have an explicit return state­ment. 


2 Responses to A .NET Developers Guide to C++

  1. sheepsimulator says:

    I enjoyed this arti­cle, it does a good job at explain­ing a lot of begin­ning ques­tions in C++.

    Your writing style is a bit repetitive, but it’s extremely thorough. It reminds me a bit of the German theologian Martin Chemnitz. Mind you, that’s a compliment coming from me; I have most of his works available in English translation. You hit all the minutiae, which is great.

    I look for­ward to read­ing more, ~sheepsimulator

  2. jalf says:

    Thanks for the feed­back. About the writ­ing, I felt that I had to be thor­ough to avoid mis­un­der­stand­ings, since this post deals with some real fun­da­men­tals that you have to get right if they’re to be use­ful at all. I know that also made the post extremely ver­bose and a bit repet­i­tive. I’ll try to get it cleaned up a bit one of these days though.

    But I’ve been sit­ting on it for a week or two now, and haven’t really had time to go through it prop­erly yet, so I thought I might as well post it as is, rather than keep wait­ing for free time to pol­ish it further. ;)

    Glad you liked it though.
