One of my coworkers is essentially a self-taught programmer, but he is interested in, and wants to learn, absolutely everything. A year or two back, he asked me to give him a crash course in C++, because he felt it was a problem that whenever he needed to do something that required functionality not exposed by the .NET framework, he essentially hit a wall.
So we took an afternoon out to run through some basic C++ code, and while we had fun doing it, and I’m pretty sure he found it interesting, it didn’t really achieve the goal of making him comfortable with writing small C++ programs to communicate with native APIs such as the Windows one.
Afterwards, I realized that the reason for our failure was that we hadn’t really made it clear what we were trying to achieve. He might have been interested in C++ in general, but what he actually needed was something a bit simpler: Being able to call native (primarily Win32) APIs.
Of course, the difference between these two is not obvious. In the .NET world, the two would basically have been the same thing. In learning C# (or another .NET language), you also learn to interface with .NET APIs, and if you need to interface with these APIs, you have to learn a .NET language.
In the case of C++ and native APIs, the situation is a bit different. Learning the language does not guarantee proficiency with using native APIs, and native APIs can be used without knowing C++.
So this series of posts is going to be my second attempt at teaching a .NET developer to at least be able to set up a basic native application, and more importantly, to call a function in the Win32 API.
The following is not a completely general introduction to C++. If you actually intend to learn and use the C++ language, there are many better texts to follow. I might even write my own attempt one day.
In this series of posts, I will
- assume familiarity with programming in .NET or another managed platform (such as Java). You’ll probably be able to get by as well if you’re coming from another high-level language such as Python or Ruby, as long as you can understand the basic syntax of the C family of languages.
- leave out a lot of things a “dedicated” C++ programmer should know. The goal is not to turn the reader into a professional C++ developer, but simply to break down the wall and enable you to make occasional forays into native-land to call an API function or two before heading back to your favorite language.
Before we begin
Before we get into the actual code, there are a few peculiarities of native languages to be aware of.
Almost all native APIs are actually written in C, not C++. Both languages have some responsibility for this. Part of the reason is that C is the lingua franca of programming languages. When your Python code has to talk to your Java code, they typically go through a C interface. Virtually every language has some way of calling C code. So by writing your API in C, you ensure that every language can use it without too much trouble. And of course C is a very simple language, so almost any language can cope with a C API. There are no classes, no higher-order functions, no exceptions or other peculiarities of more modern programming paradigms. So part of the reason is that C is simply a good intermediate language.
The other part of the reason is found in C++: C++ has no fixed ABI [1]. C++ functions compiled by one compiler generally cannot be called from code compiled by another. And when C++ compilers can’t cooperate, entirely different languages don’t stand much chance of being able to talk to C++ code. COM objects provide a partial solution to this, but require a lot of plumbing to implement correctly. For widely used APIs, it is often simpler to restrict your interface to C code.
So the code we need to interface with is actually C, not C++. Our own code is going to be a limited subset of C++. If you intend to write actual applications in C++, you really owe it to yourself to learn the language properly, but for our purposes, sticking with a smaller subset is simpler.
So what does it mean in practice that the API is written in C? Primarily two things:
- No exceptions — errors have to be reported through error codes.
- No classes — C allows structs, containing data, but no member functions, and no access specifiers. All members are public.
Now, on to how we’re going to tackle our task:
The first three installments in this series of posts will deal exclusively with native code. This first one will demonstrate a simple Hello world program, and discuss some fundamentals of organizing and compiling C++ code. This isn’t exactly exciting stuff, but it is useful to understand, as it commonly trips up beginners (and even some reasonably experienced programmers).
The second part will teach all the missing pieces of C++ (the ones that we’re going to need, anyway), so that you’re comfortable with reading and writing simple C++ programs.
In the third part we’ll get into the Win32 API, calling a few functions (of varying complexity) and not least, learning to read the arcane specification on MSDN.
Hello World
Load up Visual Studio, and create a new project. The project type should be Win32 Console Application. This brings you to the C++ Project Wizard. If it looks like something that belonged in Windows 95, that’s because that is when it was last updated. It is written in JavaScript and HTML, of all things.
This wizard gives you access to a couple of application settings. For now, set Application Type to Console Application, and select Empty Project under Additional Options. In particular, we do not want a precompiled header. It is a hack that can speed up compilation time in large C++ projects, but it is nothing more than a source of confusion in simple, small projects. Neither ATL nor MFC headers should be added under Add common header files for.
Click finish, and we’re given an empty project, just like we asked for. It contains three “folders”, named Header Files, Resource Files and Source Files. I put “folders” in quotes because they aren’t. Visual Studio calls them filters, and they basically just group files by file type, rather than actually enforcing any particular location on the file system. They’re also not particularly important to us, so you can delete them if you like. If you add a .cpp file to the project, it is automatically listed under the Source Files filter, while .h files get listed under Header Files.
Now, let’s see some actual code. To begin with, let’s try a Hello World:
Create a new .cpp file in the project.
Now type the following into it: (We’ll get into what it means in a moment)
#include <iostream>

int main() {
    std::cout << "Hello world" << std::endl;
}
Now compile and run it. No big surprises here, it does exactly what we’d expect a “hello world” program to do.
As for what the code means, let’s start with the main function itself. It’s not a member of any class. In C++, nonmember functions are allowed (and commonly used), and main in particular must be a nonmember function. The observant reader may have noticed another curious thing about it: we declare int as its return type, but don’t actually have a return statement. This is allowed as a special case for main. Other functions still have to return normally, but if control reaches the end of the main function, it implicitly returns 0 [2].
Inside the main function, you might wonder about the <<‘s. The operators exist in C# as well, and their built-in meaning is the same. Formally, they are used for bit-shifting in both languages, but C++ allows them to be overloaded, and in particular, streams define overloaded versions.
So the << operator “streams” data into std::cout. std::endl is a stream manipulator which, when it is fed into a stream, produces a line break, and flushes the stream. In this example, we could just have written std::cout << "hello world\n" to get the newline without flushing the stream, and in some ways, that would actually have been preferable. But I wanted to introduce endl.
A final note is the std:: prefix. Where C# uses a simple dot for all scope resolution operators, C++ defines a few different ones:
- For specifying members of a namespace, or specifying static members of a class, :: is used.
- For nonstatic class members, . is used. Given an object o, we can access a member m with the syntax o.m, exactly like in C#.
- For nonstatic class members accessed through a pointer to the class, -> is used. If we have a pointer p to an object, accessing its member m looks like this instead: p->m.
So in our Hello World program, we reference the object cout in the std namespace.
We could simply add a using namespace std; at the top of the program, much like we would in C#, but in C++, it is not customary to do so. You’ll note that the namespace actually has a very short name, unlike .NET’s long names and nested namespaces. Rather than System.Collections.Generic.List, for example, C++ defines std::vector. Almost the entire C++ standard library exists in the std namespace. One of the main reasons for this structure is to make it easy and convenient to access namespace members without having to do using namespace X.
cout stands for character output, and is the stream used for standard output, much like the output-related members of .NET’s Console class. There is also a cin stream object responsible for input.
cout and cin are actually nothing more than global variables of the type std::ostream and std::istream respectively.
Another output mechanism you’re likely to see is the C function printf, which is syntactically closer to what you’re used to from .NET.
Given an integer i we want to print out along with a message, cout and printf would be used like this:
std::cout << "You have " << i << " pancakes.\n";
printf("You have %d pancakes.\n", i);
Each of these has its advantages and disadvantages, as you can probably see. The nice thing about cout is that it is type-safe, and allows us to compose our output string without having to worry about the type of i or the number of parameters. We just stream whatever we like into cout one parameter at a time, and it all just works. It also works with user-defined types. They just have to define an appropriate operator <<.
The nice thing about printf on the other hand, is that the actual format of the string is much more readable, and parameters are specified separately at the end. As you know from .NET’s string.Format function, it is very convenient to be able to write the entire format string in one go, and only specify parameters afterwards. It is a bit awkward that cout requires you to break up the string with <<‘s all over the place. But there are some serious limitations to printf as well:
- It cannot be extended. It works for the basic built-in types, and nothing else.
- It requires the programmer to specify the type of the parameter as part of the format string. (%d specifies that the parameter at this position is expected to be an integer (I assume the d stands for decimal). But the language does no type-checking to verify that this is actually the case. I can pass a float to printf, and print it out with %d, and I get garbage.)
- The number of parameters is not part of the function’s signature. C (and C++) only have very rudimentary support for functions with variable arguments. Once you make use of this feature, you lose all type information and information about the number of parameters passed to the function.
I tend to prefer cout for these reasons; it is safer, and it can be extended. But you’re likely to encounter printf in code samples and should at the very least be familiar with it.
Finally, let’s deal with the very first line. There are four things to note about it. In order of appearance, they are:
- The # at the very start of the line indicates that this is a preprocessor directive. In other words, this is evaluated in a separate pass before the compiler starts working. Modern compilers don’t maintain a strict separation between preprocessing and compilation, but as the language is specified, the preprocessor basically runs over the source code performing a number of simple modifications before the compiler is invoked.
- include is the actual preprocessor directive. It specifies that we would like to include a file.
- The file name is surrounded by angle brackets (<>). When these are used, the preprocessor searches for the file to include in system directories. If we had used double quotes (""), the preprocessor would have searched for the file locally first. So slightly simplified, use <> to include system headers, and "" to include files from the same project or solution.
- Finally, inside the angle brackets, we have the name of the header file we’d like to include. In general, your own files should use a .h or .hpp suffix. Headers belonging to the C standard library also use .h, but C++ standard library headers have no extension. (So you have iostream instead of iostream.h.)
Finally, what does it mean for a file to be #include’d? It’s not quite the same thing as the using statements you put at the top of a file in C#. Those using statements are functionally similar to the using namespace statement mentioned earlier — they allow us to reference types defined in other namespaces as if they were members of the current namespace. If we do not have the using statement, we have to specify the full namespace prefix when using the type (System.Collections.Generic.List<T> instead of simply List<T>), but the types are still available. I can reference System.Collections.Generic.List<T> in C# without any using statements. Likewise, I can reference std::cout as I did in the previous example without having a using namespace std.
But without the #include, the compiler would not have been aware of cout at all.
An #include is in a sense very simple. All that actually happens is a copy/paste operation. The preprocessor locates the file iostream, and copies its contents into our file at the location of the #include. The effect of this is to give us access to anything defined in the file. In .NET this is all taken care of by magic. Anything in the current assembly is automatically visible, and anything that isn’t declared internal in other assemblies is visible as soon as we add a reference to it.
In C++, no such mechanism exists. What the compiler sees is just the current file. Other files, even in the same project, are not visible when the current file is being compiled. The compilation model is notoriously quirky, and probably deserves some explanation.
The preprocessor and the C/C++ compilation model
C++ code is compiled in a couple of stages. I already mentioned the preprocessor. In the old days, this was a separate program, which was run on the source code first, performing simple text manipulation (search/replace, and conditionally removing chunks of code). The output of this was then fed to the compiler. Finally, the output of the compiler is fed to a linker, which we’ll get to later. Today, the preprocessor is built into the compiler, but it is still a separate pass made over the code before the actual compilation begins.
Let’s wrap up the preprocessor quickly though. It can do a few other things that we’ll probably run into soon enough. In particular, #define has a few uses. It creates a macro — whenever the name of this macro is encountered, it is replaced with the macro definition.
So in the following:
#define waffles pancakes
std::cout << "I like " << waffles();
we create a macro named waffles, and from that point onwards, any occurrence of waffles is swapped for pancakes. Which means that the function that actually gets called in line two is pancakes(), rather than waffles(). This highlights another important aspect of the preprocessor. Because it is run before compilation, it has no notion of actual language syntax. It doesn’t care about the context of the text it is replacing. It doesn’t care that this is a function call, just like it wouldn’t care if the name had been found in a different namespace than the one the macro was defined in. It doesn’t respect scoping rules or anything else. It won’t swap out the middle of words, or the contents of string literals (so ilikewaffles() would go untouched, as would "waffles"), but that’s about it. Anything else gets brutally replaced by the preprocessor.
Another common example of its simplicity is the following:
#define four 2+2
int i = four * four;
The result of this? It is 8. The preprocessor just performs simple text substitution, resulting in this code: int i = 2+2 * 2+2, which of course gets evaluated as int i = 2 + (2*2) + 2.
We can also use the preprocessor to perform conditional compilation removing sections of code at compile-time:
#define waffles
#ifdef waffles // #if defined(waffles) would also have been legal
// this will get compiled
#else
// this will get removed by the preprocessor
#endif
A variation on this is used in almost every header file, but we’ll get to that soon enough.
The compiler processes what is technically known as translation units. A translation unit is a single source file (typically .cpp or .cc for C++, or .c for C code), after preprocessing. So in our Hello World program, we have one translation unit, consisting of the contents of the header file iostream, followed by our main function. The result of compilation is not a program, but rather an object file (Visual Studio uses the extension .obj for these, GCC uses .o). An object file contains all the compiled code for this file, but with certain placeholder “gaps”. This is necessary as code files will typically depend on functions or variables defined in other translation units. We are able to tell the compiler that a function defined in another translation unit exists, but it won’t be able to see the actual definition of the function, so it has to generate a kind of placeholder, saying “call the function with this name, as soon as we find out where that function is”. That is essentially the role of object files: they store the compiled code, along with the necessary information about which symbols this file defines, and which symbols it depends upon and must be found in other files for the program to be complete.
When all the object files are created, they are passed to the linker, which performs the final steps — reading all the object files, locating all these placeholders, and filling them in. If some code in object file A calls a function f defined in another file B, the linker must read both files A and B, determine the address of the function f, and insert it into the function call inside A.
If the linker finds multiple conflicting definitions of f (perhaps object file C also defined a function with the same signature), it is of course an error. Likewise, if it is unable to locate the full definition of a symbol referenced from a file, we get an error. Because the linker does not have access to the actual source code, but only the object files, linker errors are notoriously hard to understand, but it can be done. The following simple code causes a linker error: (we’re going to run with this example for a while, so feel free to add it to a new project, or overwrite the previous file. This code should be the only contents of the project)
class myclass {
public:
    int f(float fl);
};

int main(){
    myclass c;
    c.f(1.0f);
}
The code should be straightforward enough. We declare a class with a member function f. In the main function we create an instance of our class, and call the f function. There is just one problem: the function is declared, but it has not been defined. In other words, the compiler knows it exists (so we don’t get a compiler error when we try to call it, as we would if we called a completely unknown function), but because it does not have the function body, it has to assume that the full definition is… elsewhere. So the compiler lets this pass, hoping that the linker can sort things out.
But the linker is given only this one translation unit. It is unable to find a definition for the function f, so it spits the following error at us:
error LNK2019: unresolved external symbol "public: int __thiscall myclass::f(float)" (?f@myclass@@QAEHM@Z) referenced in function _main
Ouch. Again, the linker doesn’t have access to the source code, so this is about the best it can do. It tells us that the problem is an “unresolved external symbol”, or in other words, it was unable to resolve a symbol that one of our translation units expected to be “external” (defined in another translation unit). As for the symbol itself? All it actually sees is the mangled string near the end: ?f@myclass@@QAEHM@Z. This is the name for the function generated by the compiler and stored in the object file, and I have no clue what the @’s or the letters following them mean. They somehow encode information about parameters and return type, but that’s about all I can say. Luckily, the linker is able to decode this name, which it also does for us. It tells us that the function has public visibility, and its return type is int. __thiscall is the calling convention used for member functions. (It is essentially a calling convention that allows for a this parameter, hence the name.) The calling convention isn’t usually important here though. Next, we can see that the unresolved symbol is a member of the class myclass, the function is named f, and it takes a float as its parameter. Finally, it tells us that the symbol was referenced from the _main function (again, we can’t always trust the compiler to preserve the precise names, but it’s probably a safe bet to assume that when it says _main, it means main).
So the error is actually pretty straightforward once you filter out the noise. A lot of C++ programmers don’t realize this, and go into a panic whenever they encounter a linker error, which is why I wanted to demonstrate this one. They typically contain a lot of noise (especially in more complicated cases), but they can be deciphered if you eliminate all the @@ nonsense and read the remaining text slowly and carefully.
The other reason why I wanted to demonstrate this is that it is key to why header files are used. Based on the above example, we now know that the compiler can be tricked into accepting a call to a function it has no knowledge of, as long as it can see a valid declaration. (A function declaration is essentially just the signature (including return type), followed by a semicolon, much like an interface method in C#.)
So perhaps we should get creative and see if we can make the linker happy too. First, we create a second .cpp file with the following contents:
class myclass {
public:
    int f(float fl);
};
There is still no definition of f, but we’re taking it a step at a time. Now, though, we have two files containing the same definition of myclass. Of course, the compiler only sees one file at a time, so it won’t notice this, but what will the linker say? Won’t it complain about multiple definitions of the same symbol? Try compiling the project and find out.
As it turns out, we get exactly the same error as before. But we don’t get any complaints about the multiple definitions of the same class. This is actually allowed. We are allowed to create as many definitions of the same class as we like, as long as there is only one in each translation unit (the compiler will choke on it if you try to define a class you’ve already defined), and all the definitions are exactly identical. (The linker will typically not enforce the last requirement though. If the definitions are not identical, it typically manifests as weird crashes at runtime.)
This is called the One Definition Rule (ODR). Only one definition may exist. That definition may occur in multiple places, but it must be identical, it must be the same definition, every time it is encountered.
So it seems like we have a problem, doesn’t it? We’re allowed to duplicate the class definition, but we’re not allowed to modify it! So how are we supposed to add the definition of f?
Try changing your second file (the one without the main function) to the following:
class myclass {
public:
    int f(float fl);
};

int myclass::f(float fl){
    return 42;
}
and compile it. Voilà! It works. We didn’t modify the actual class definition, so we obeyed the ODR. Instead, we added the function definition afterwards, outside the actual class definition. And both the compiler and linker are happy. The linker now sees two identical definitions of the class myclass, but that’s allowed under the ODR. It also sees a call to the function myclass::f, and a single definition of the same function, so it is able to glue everything together into one single program.
Of course, having to copy/paste, and maintain duplicate code in every .cpp file is hardly ideal. Sooner or later, we’re going to modify myclass in one file, and forget to make the same modifications in all the other files. That will break the ODR, and everything will crash horribly.
That is where header files come in. We could put the shared code in a separate file, and use the #include directive mentioned earlier to automatically copy/paste the contents in! Let’s try that now. Create a new file (with the .h or .hpp extension), and place the class definition in that. Now remove the class definition from the two .cpp files we already had, and replace it with a #include referencing the header.
That is, your project should contain the following three files (I’m going to name the .cpp files main.cpp and myclass.cpp for convenience):
// myclass.h
class myclass {
public:
    int f(float fl);
};

// myclass.cpp
#include "myclass.h" // note we use quotes, not angle brackets here

int myclass::f(float fl){
    return 42;
}

// main.cpp
#include "myclass.h"

int main(){
    myclass c;
    c.f(1.0f);
}
And it seems to work. Clever.
There is one little problem though. What happens if we include our header multiple times? We probably won’t intentionally do this, but perhaps we’re going to include it, and then include another header, which also includes it. We can easily get into a situation where some headers get included many times. Think of standard headers like iostream. We’re going to end up including it fairly often. Sooner or later, we’ll end up including some of our headers twice, which breaks the ODR! We’re not allowed to have multiple definitions in the same translation unit. To test the problem, feel free to duplicate the #include statement and verify that the compiler chokes on it.
So to solve this problem include guards are used. Modify your header as follows:
#ifndef MYCLASS_H
#define MYCLASS_H

class myclass {
public:
    int f(float fl);
};

#endif
There should be nothing new in this, but the consequence might be surprising. First, we ask the preprocessor to check if the macro MYCLASS_H is defined, and only evaluate the following if it is not defined (the directive is named ifndef, or if not defined).
If we enter the if statement, the first thing we do is define the symbol MYCLASS_H, and then we evaluate the original contents of the header. Finally, we end the if-statement with an #endif. So what happens if the file gets included twice now?
For simplicity, assume the following .cpp file, containing nothing except two includes:
#include "myclass.h"
#include "myclass.h"
As the preprocessor parses this, it’ll expand both #include’s, resulting in this:
#ifndef MYCLASS_H // At this point, the macro MYCLASS_H is not defined, so we enter the following block:
#define MYCLASS_H // define the macro MYCLASS_H
class myclass { // allow this code to stay in the translation unit
public:
    int f(float fl);
};
#endif // end the if statement
#ifndef MYCLASS_H // now MYCLASS_H *is* defined, the condition is not true, and so we *skip* the if statement.
//#define MYCLASS_H // of course the preprocessor doesn't actually comment out the code, it simply removes it from the translation unit. I'm commenting it to illustrate what happens
//
//class myclass { // this time, the preprocessor *removes* all this code, because it is inside a #if statement we're skipping
//public:
// int f(float fl);
//};
//
#endif
so after the preprocessor has run, only this code actually gets inserted in our translation unit:
class myclass {
public:
    int f(float fl);
};
So it seems we’re able to handle multiple inclusions of the same header now.
So to review, we’re now able to split our code across multiple source files, and do it correctly. We don’t need to duplicate any code: all the shared code can be placed in header files, and include guards protect against accidentally including the same file twice in the same translation unit.
And now you should finally understand what it meant when we included iostream in the original Hello World example. We’re simply pasting in a lot of system code, containing declarations that get linked together with the standard library containing the full definitions.
This turned out a lot longer than I’d originally intended (I had originally, and naïvely, planned to write the entire series in one post), so let’s call it a day here. Part two will be posted very soon, and cover some actual C++, now that we’ve got the fundamentals out of the way. You needed to understand how C++ code is compiled before you’re able to write anything useful in the language.
[1] Application Binary Interface. A common ABI is required for two functions to be able to call each other. The ABI defines the memory layout of structs or classes, as well as calling conventions and basically everything you need to be able to call a function. Where should the return value go, where should parameters be placed, and so on. In practice, C has a single stable ABI on any given platform, which makes it easy to interface with.
[2] The story goes that Bjarne Stroustrup, the language designer, didn’t want to create a language where a simple hello world required multiple lines of code in the main function. Hence the special rule that main doesn’t have to have an explicit return statement.





I enjoyed this article, it does a good job at explaining a lot of beginning questions in C++.
Your writing style is a bit repetitive, but it’s extremely thorough. It reminds me a bit of the German theologian Martin Chemnitz. Mind you, that’s a compliment coming from me, I have most of his works available in English translation. You hit all the minutiae, which is great.
I look forward to reading more, ~sheepsimulator
Thanks for the feedback. About the writing, I felt that I had to be thorough to avoid misunderstandings, since this post deals with some real fundamentals that you have to get right if they’re to be useful at all. I know that also made the post extremely verbose and a bit repetitive. I’ll try to get it cleaned up a bit one of these days though.
But I’ve been sitting on it for a week or two now, and haven’t really had time to go through it properly yet, so I thought I might as well post it as is, rather than keep waiting for free time to polish it further. ;)
Glad you liked it though.