Welcome to the second installment in my guide of “what you need to know if you’re a .NET programmer who wants to be able to write C++ code and call native APIs”. It took me much longer to get this posted than I’d hoped. My work on my thesis has kept me more busy than I’d originally expected. Sorry for the delay!
In part I, I went through a minimal “Hello World” program in some detail, and attempted to explain the arcane workings of the C/C++ compilation model. Some may argue that this had no relevance to my target audience, but I think it is a necessary evil. Almost all C++ programmers get tripped up at some point by the difference between compiler and linker errors, and by what exactly the #include directive does. Hopefully, by reading part I, you’ll be able to avoid this.
With that out of the way, we can get started on the interesting part. Part II will focus on actual C++ code. We won’t consider managed interop or even the Win32 API yet, though. This part will still take place in native C++-land only. In short, the purpose of this part is to enable you to write simple C++ programs, and more importantly, to understand the C++ sample code you probably run into from time to time.
I will not cover all the idioms and techniques that “real” C++ programmers use. We’ll settle for the bare minimum required to get by in a .NET-to-Win32 interop scenario where you really just want to write enough C++ code to call some native API function. This means that we won’t get the most robust, reusable, elegant or concise C++ code. But we will be able to get the job done.
I’d love to write a more detailed series of posts about “modern C++”1 some other time, but it is beyond the scope of this series of posts.
Using C++
Before we get into the Win32 API, let’s run through some slightly bigger C++ examples than the Hello World from part I. At the very least you’re going to need to know how to define and use classes, and a few useful components in the standard library.
You already know that it is possible to define class member functions outside classes, but you haven’t yet seen a nontrivial class definition. Let us try creating one. For the purposes of demonstration, I’ll implement the simplest class I can think of; a counter. It’ll simply contain an integer, and callers will be able to increment the value, and get the current value.
class counter {
public:
counter() : i(0) {}
int current() {return i; }
void update() { ++i; }
private:
int i;
};
There, we now have a basic class. We can call it from the function below. (I use assert to indicate expected values of variables, much like you would in a unit test. Note that the asserts are pseudocode: among other things, I will access private class members with them, which obviously won’t work in reality.)
// assume that we either placed the class definition here, or have a #include for the header in which the class is defined.
int main(){
counter c;
int i = c.current();
assert(i == 0);
c.update();
assert(c.i == 1);
assert(c.current() == 1);
}
A ridiculously simple program, of course. But there are several things worth noting. In no particular order:
- Our counter object c is created without using new, and without explicitly calling a constructor. All C++ types are fundamentally similar to .NET’s value types, so c is not a reference to a counter, but instead a default-constructed instance of one, placed on the stack. If nothing else is specified, the default constructor is called when the variable is declared. (To call another constructor, assuming the class defined one with matching parameters, we could have done something like counter c(1, "hello", 2.0f);.)
- the class definition is terminated by a semicolon. This is important to remember, as forgetting it can lead to very misleading compiler errors. I won’t get into why this semicolon is necessary here though. It is a long story, and it is caused by the need for C compatibility.
- access specifiers are not applied per-member, but rather used to divide the class into sections. In a class, the default specifier is private. I tend to put my public members at the top of the class, to make it easier for readers to find the public interface. Further down, we have a private specifier, for hiding our int member. The valid access specifiers are public, private and protected, which each behave just like in C#. internal does not exist, however, since there is no notion of assemblies, and the only way to share types between files is with #includes, as mentioned in part I.
- there is no clear, common naming convention in C++. The standard library uses lower-case, and separates words by underscores, as in class_name. Many programmers, however, use a convention similar to .NET’s, naming types ClassName, and variables className. There are no fixed rules, so as long as you are consistent it’s fine with me.
- .NET has both classes and structs, and the two have very different meanings. In C++, both classes and structs exist as well, but their meaning is almost the same. The only difference between a class and a struct in C++ is that a struct defaults to public accessibility for members, where a class defaults to private. In other words, if I had defined the above as a struct, I could have omitted the public: line. (For this reason, I often find myself using structs. As I said, I tend to put the public interface at the top of the class, and add a private: section further down. However, it is not a big deal. A common rule of thumb is much the same as is used in C#: structs are simple containers of data, where classes have behavior. My own style tends to be a compromise between the two. Classes with complex behavior are made classes for this reason, but in simpler borderline cases, I tend to prefer struct, even if it has some behavior. It makes no difference to the compiler, and it saves me a line of code, because it defaults to public, which is what I want at the top of my class anyway.)
- the observant reader will probably have noticed a difference here compared to the example shown in part I. Back then we declared the member method without defining its body inside the class. Now we define the body inside the class. Both approaches are legal, and each has its pros and cons. In particular, defining the body outside the class leads to shorter class definitions, which may aid readability. On the other hand, defining functions inside the class leads to better locality: you only have to look in one place to learn all about the class. Further, the compiler is generally better able to optimize code if member methods are defined “inline”. For these reasons, people often put short functions of 2 – 3 lines or so inside the class, and define larger ones separately. There is one caveat, however. Functions defined inline (either by placing the full definition inside the class, or by marking the definition with the inline keyword) may have a definition in each translation unit. In other words, they may be placed in headers (where they’re seen by multiple translation units). Non-inline functions must be defined only once, and so generally have to be defined in a .cpp file, similar to what we did in part I.
- the constructor looks a bit different than you may be used to. The i member is initialized via the initializer list, specified after the colon. This is similar to how you would call a base class constructor in C#, although instead of base, the member name is used. Also note that we have to explicitly initialize i because, as a primitive type, it would otherwise not be initialized at all. The initializer list syntax is only legal in constructors, and should be used as much as possible. I’ll explain why in a moment.
“Special” member functions
We could have defined the constructor in a more familiar way:
counter() {
i = 0;
}
and in this simple case, it would have made no difference. In more complex classes, however, there is an important distinction: anything that happens in the constructor’s body happens after members are initialized. If a member is not specified in the initializer list, it is default-initialized. For primitive types this means that nothing happens, and they simply contain indeterminate garbage values. For classes defining a default constructor, that constructor gets called before the constructor’s body is evaluated, and only in the body do we assign the actual value we want the member to contain.
So yes, for our simple int case, we might as well have written the constructor without using the initializer list. But consider what would have happened if the member had been some complex user-defined class. Instead of simply constructing the object with the right value to begin with, we would have default-constructed it, and then executed an assignment. This would obviously have been less efficient than simply constructing the object correctly in the first place.
But just as importantly, some types can not be assigned to once they are initialized. Likewise, some types may not have a default constructor, in which case failure to use the initializer list to explicitly call another constructor will result in a compiler error! So in general, the initializer list should be preferred both from performance and correctness concerns. A side effect of the initializer list is that the actual body of constructors can often be left empty.
In .NET, there is a distinction between value and reference types, and the behavior of the assignment operator is completely different for each of the two cases. x = y for values of a reference type simply stores a reference to y into x. But if the two types are value types, a complete copy is created instead.
In C++, all variables obey value semantics. x = y will always copy the value y into x. This is why I said when discussing the constructor’s initializer list that an extra assignment may be expensive.
Since the plain value semantics as used by C# would be both inflexible and inefficient, C++ provides a number of tools for controlling the behavior of your class. In particular, you can define a copy constructor and an assignment operator to override exactly how assignment should be performed. The following demonstrates what they may look like.
class counter {
public:
counter(const counter& other) : i(other.i) {} // copy constructor
counter& operator= (const counter& other) { // assignment operator
if (this == &other) {return *this; }
i = other.i;
return *this; // the operator is declared to return counter&, so every path must return
}
....
};
Perhaps the first thing we should mention is the meaning of the & character. It is used to denote a reference, essentially an alias for a variable. It is related to pointers (see this post for a more detailed explanation of pointers), but is simpler and more limited. In particular, it can not be reseated. Once it is initialized, it is an alias for the variable it points to forever. Also, unlike pointers, there is no special syntax for using a reference:
int i; // create an integer.
int& r = i; // create a reference as an alias of i. Note that we simply assign i, unlike with pointers where we would have had to take the address of i first with the `&` operator.
r = 42; // assign 42 to whatever the reference points to. Again, no special syntax. There is nothing here to tell us that r is a reference.
int j = 13; // create another integer
r = j; // assign it to our reference. The effect of this is *not* to make r point to j (as would have happened had it been a pointer), but simply to assign the value of j to i. In other words, i will now equal 13, and r will still point to i.
Because a reference can not be reseated, it is also a nice example of a case where the constructor’s initializer list must be used. Imagine a class which has a reference member. References must point to something, so they have no default constructor. And once they are initialized to point to an object, they always point to that object. In other words, it must be constructed before the constructor’s body is executed, which means in the initializer list. Failure to do so simply won’t compile.
Now to explain the copy constructor, which is fairly simple. It is simply a constructor which takes one argument, a const reference to the type itself. Copy constructors are commonly used to initialize class members with copies of the arguments passed to the “outer” class’ constructor. In the copy constructor above, we also copy-construct i, for example. (The value of other.i is copied into our own i)
The assignment operator is a bit trickier.
The first line inside it tests for assignment to itself. (As would happen in x = x). This may not have been a problem in this simple class, but in more complicated ones, self-assignment can cause problems, as you will be reading data from the same object you’re writing to. We also note that instead of simply comparing this to other in the test, we use &other. We wish to check that this and other refer to the same object instance, not just that they contain the same value. To achieve this, we need to compare pointers. this is already a pointer2, but other is a reference, so we have to take the address of it first. Because a reference is essentially an alias for the referenced value, the address-of operator returns the address of the referenced value, not of the reference itself.
Next, note that the assignment operator does not have an initializer list, but instead performs the copying in the function body. The reason for this is obvious: It is not a constructor, so all its members are already initialized. An initializer list would not make sense, and is not allowed by the language. This also means that here, i’s assignment operator is invoked, rather than its copy constructor, as was used in the previous example. (Technically, built-in types have neither an assignment operator nor a copy constructor. However, the same syntax is allowed, it simply uses the obvious built-in operations.)
A final note about assignment operators and copy constructors is that if = is used to declare a variable, the copy constructor, and not the assignment operator, is called. As I said before, these functions are special and known to the compiler, and so, can be invoked in special cases. That is, if you are given variables c and d of type counter, then c = d calls the assignment operator on c, because c is already initialized. But if it had instead been counter c = d, then c would have been initialized as a copy of d, and so its copy constructor would have been used. The compiler ensures this, even if you use assignment syntax in the initialization of a variable.
Finally we get to another dreaded C++ construct: the destructor. This is automatically called when the object is destroyed, and can be defined thusly:
class counter {
public:
~counter(){
std::cout << i << std::endl;
}
....
};
The syntax is similar to finalizers in C#, but the effect is somewhat different. The destructor is invoked instantly when an object is deleted, and it is guaranteed to be called. In our case, we simply use it to print out the counter value.
Let’s try using these new functions and operators:
int main(){
counter c; // use the default constructor to create a counter
c.update(); // increment its value
assert(c.i == 1);
counter d(c); // use the copy constructor to create a new copy of our existing counter.
assert(d.i == 1);
d.update();
assert(c.i == 1); // our copy constructor made sure to create a *new* counter variable, so c is not affected by changes to d, and vice versa
assert(d.i == 2);
c = d; // since c has already been initialized, the assignment operator is used to copy d into c.
assert(c.i == 2);
} // at this point, both c and d go out of scope, and so their destructors are called. Destructors are always called in opposite order of construction, so d's destructor will be invoked first.
All three functions are auto-generated by the compiler, if not declared explicitly. The one exception is the assignment operator, which it may not be possible to auto-generate. If a class contains a member with no assignment operator, or a reference (which can not be reseated), the compiler will fail to generate an assignment operator, and all attempts to perform assignment will fail if one is not explicitly defined by the user.
The trio of copy constructor, assignment operator and destructor are sometimes called “the big three”, or we may speak of “the rule of three”. This is a rule of thumb that if you find yourself implementing one of these three special functions, you almost certainly should also implement the other two. The reasoning is pretty simple: The assignment operator and copy constructor are related — both are used to copy an object. If special care has to be taken when copying, then it should probably be defined for both these functions.
Further, if copying requires nontrivial handling, then it is a good bet that the class manages some kind of resource or contains data which requires special care in the destructor as well. Perhaps a pointer pointing to dynamically allocated memory, which must be deleted, or perhaps it should decrement a global counter used to count the number of live instances of the class. Or perhaps it is a file handle which must be closed. The fact that we had to implement special handling when copying is a strong hint that there will probably also be special handling required when cleaning up in the destructor.
And the converse is also true. If the destructor has to do something special, it must be because the class owns some kind of resource that must be released. And if it owns a resource, then we should ensure that the resource gets copied when the class itself does. So we should probably define copy constructor and assignment operator as well.
POD types
A final note about classes may be worth mentioning. C had no classes, only simple structs containing values, but no member functions, and without allowing inheritance or access specifiers. Since C++ was designed to be (mostly) backwards-compatible, such types have a special status in C++. In the above, I mentioned “primitive types” a few times. While an int is technically a primitive type (all built-in types are considered primitive types), the behavior I described is actually common to all POD (Plain Old Data) types. A POD type is essentially a type that would have been legal in C — in other words, it is either a built-in (primitive) type, or a class or struct where
- all members are public
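- no virtual member methods exist (strictly speaking, ordinary member methods do not affect POD status, although a type that is legal in C has none at all)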
- no constructor, copy constructor, assignment operator or destructor is defined
- no base classes exist
- All members are POD types as well
Such POD types are given special treatment in many ways. For example, they may be treated as “raw memory”. The standard-library C function memcpy, which simply copies a number of bytes from one location to another, may be used to copy POD types, but not non-POD classes. The reason for this is that non-POD types may have extra behavior that would break if this was done. As an obvious example, if we created a copy in this way, we would bypass the assignment operator/copy constructor, but we would end up with two objects, both of which would have their destructors called when deleted — so we would end up with a mismatch where the destructor is called more often than the constructors, a clear error if the class implements reference-counting, for example.
Another peculiarity of POD types is that they are not initialized unless initialization is explicitly requested. This is why we had to initialize i in our constructor above. As a POD type, i would otherwise contain whatever garbage value was found in memory. The same is true for POD structs. They too contain garbage if not explicitly initialized:
int i; // no initialization occurs
int j = int(); // explicitly request initialization -- for POD types, this is done by setting all members to zero. (Beware that "int j();" would declare a function named j rather than a variable.)
In other words, had our counter class stored a non-POD member, the initializer list would not have been necessary. Its member would automatically be default-constructed if nothing else was specified. But POD types do not have that extra behavior, so if nothing else is specified, they simply don’t get initialized.
Enough about classes
There are a few other nitty-gritty details about the language we should discuss. You may have already wondered about one or two of them. So without further ado,
- variable declaration is usually done without using new. The new operator allocates memory on the heap, and returns a pointer to the newly allocated object. Since there is no garbage collector, we have to manually call delete on this pointer to free the memory. This is the source of C++’s reputation as a playground for memory leaks. Of course the astute reader will have noticed that so far, I haven’t used new and delete even once. The truth is that these can often be avoided or hidden, which removes the possibility of leaking that memory. Any variable declared without using new is declared “locally”: if it is declared in a function, it becomes a local variable, and is destroyed when we leave the scope in which it is declared. If it is a class member, it is destroyed when the owning object is destroyed. If it is declared inside a loop, it is destroyed when we leave the loop body. In other words, local variables declared without new have “automatic storage duration”, and in fact, int i = 42 could also be written as auto int i = 42. The auto keyword indicates exactly this, that the lifetime of the variable is automatic. Since this is the default, the keyword is never actually used, but it exists, and this is what it means. (This applies to C++98 and C++03. C++11 gave auto a new meaning, type deduction, and no longer accepts auto int i.) And just to clear up any doubts, a variable with automatic storage duration is destroyed when we leave the scope it was declared in, no matter how we leave it. It doesn’t matter if we return from the function, or if an exception is thrown. In both cases, the local variable’s destructor is called.
- Just to avoid confusion, we’d better look at a quick example of using new. Consider this line of code: counter* p = new counter(). Here, we allocate an object of our counter class on the heap, with dynamic storage duration, but we also declare a local variable, the pointer p. The pointer is a local variable with automatic storage duration. In other words, the pointer itself will be freed just fine when we leave the function, but the dynamically allocated counter to which it points will not. This is how memory leaks occur. Once p gets destroyed, we no longer have a pointer to the dynamically allocated memory, so we can never free it.
- Avoiding cyclic dependencies can take a bit of work, since C++ code is read by the compiler from top to bottom. It won’t let a function or class refer to another which hasn’t been declared yet. Sometimes, this can be solved through refactoring, by splitting the code we need to refer to out into a separate class which can be declared first. But another trick is to use forward declarations. You have already seen it used for the class member method in part I. We can declare a function without specifying its body. This tells the compiler that the function exists, which means we can call it safely. So if we put such a declaration at the top of a file, we can provide the actual definition including the body at the end of the file, after whatever classes or functions we need to refer to. For classes, we can do a similar trick, and simply declare class counter;. As with the function case, this tells the compiler that counter is a class, and that it does exist. The definition just isn’t shown yet. This won’t let you access class members yet (since the compiler still doesn’t know which members it has), and you can’t declare variables of that type yet (because the compiler doesn’t know which, if any, constructor to call, and it doesn’t know the size of the class). But you can create references and pointers to the class.
- C# uses function overloading to allow for functions where some parameters may have sensible default values. If we have a function taking parameters a and b, we can create an overload which takes only a, and provides a default value for b. The same can be done in C++, but you also have the option of providing default values. The function void foo(int i = 0) {std::cout << i << std::endl; } can be called just with foo(), and will print out 0. If you are more comfortable with overloading, you may not need to use default parameters, but you may still encounter third-party code which uses them, so you should be familiar with the syntax.
The standard library
We’re nearing the end. The last thing you should know about C++ before I let you run loose is a few standard library classes. The C++ standard library is very small compared to .NET or Java’s class libraries, but it is also widely considered C++’s main saving grace. Most people consider the language an overcomplicated mess in many ways, but the standard library stands out, both as an example of C++ done right, and as a redeeming feature which transforms C++ into a powerful and elegant language3. Or more precisely, part of the standard library possesses these qualities.
In the following I’ll briefly sketch out the main parts of the standard library, and explain a few useful classes. For more general information, Microsoft has some excellent documentation for all parts of the standard library here.
The standard library has been assembled piecemeal over the years, and as such, represents several different styles and paradigms. The oldest parts of it are simple functions carried over from C’s standard library. I have already mentioned two of these, printf and memcpy, but of course many others exist.
After these came the first C++-specific additions, in the form of the iostreams library. You have also encountered a few members of this, in cout, cin and endl, as well as the operator<< used for streaming. This library is, honestly, not very nice. It does the job for simple Hello World-like applications, but it is inflexible, inefficient, overcomplicated and hard to extend. In fact, many C++ programmers stick to printf over cout despite all the disadvantages I listed in part I. Of course, iostreams also contains file streams as well as some other basic stream functionality. A related addition is the string class, and the locale facilities.
These all have one thing in common: they are very old-fashioned and are, today, considered far from ideal. The string class got some last-minute surgery when it was added to make it a bit more modern, and a few additions were made to the stream classes as well, but overall, these are relics from the era of “C with classes”.
Finally, the star of the show is the Standard Template Library, or the STL for short. This remarkable library completely changed how the language was used, and is definitely worth exploring. I won’t ramble on about it here, but I will mention that one of its characteristics is that it almost completely abandons traditional Object-Oriented programming (which iostreams used heavily), in favor of the lesser-known and almost C++-specific paradigm of Generic Programming.
The STL consists of three distinct “pillars”:
- Container classes are the equivalents of .NET’s System.Collections.Generic classes. They store sequences of data, and little else.
- Iterator classes are superficially similar to .NET’s IEnumerator. They allow traversal over a container, but where .NET only allows traversal from the beginning to the end, C++ iterators also allow reversed iteration (from end to beginning), as well as traversal over subsets of the container (from the 6th to the 12th element, for example). Pairs of iterators are often used to mark sequences for further processing. Individual iterators are often used as “markers” into a sequence.
- Algorithm functions work on iterators, or a pair of iterators, and perform almost all sequence processing. Sorting, searching, copying, foreach, accumulating values or any other algorithm involving sequences of data is implemented as an algorithm working on iterators.
The clever part about this setup is that algorithms and containers know nothing of each other. An algorithm works on iterators, wherever they come from. It works whether the iterators are pointers into an array, into a linked list, or perhaps even into a stream or a database. As long as the iterator implements the appropriate functionality, it can be used by the algorithms. This allows for a degree of reusability that would have been impossible in .NET. The same find function, for example, works on all of the standard container classes, in addition to working on any iterators you define yourself. As long as they fulfill a few basic requirements, you get find, sort and many other common operations for free.
And again unlike .NET, there is no interface you have to implement to create a new iterator type, or, for that matter, a new container class. The STL relies on a form of Duck Typing (if it looks like a duck, and walks like a duck, and quacks like a duck, it must be a duck). This means that an iterator is not “a class which implements IIterator<T>” or anything like that, but simply “a type T for which the following expressions are defined, given an object x of type T: ++x, *x, T() and a few others”. In other words, if a type defines a default constructor and a few operators, then it is an iterator, and it’ll work seamlessly with the rest of the STL. In fact, raw pointers are valid iterators as well.
In .NET, every collection class has to define its own search function, and there is no elegant way to decouple it completely. (We could define the function in a static helper class, but it would still be working on something specific like an IList, rather than just any sequence). In C++, the function std::find works on any pair of iterators.
While iterators and algorithms are key to “modern C++”, I will focus on the containers here, as they can be used with little explanation, and are almost indispensable (just like you wouldn’t want to program in C# without the List<T> class).
The equivalent of .NET’s List<T> class is the vector:
#include <cassert>
#include <vector>
int main() {
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
v.push_back(42);
v.pop_back();
// v now contains the values [1, 2, 3]
v.resize(5); // resize to contain 5 elements
// v now contains [1, 2, 3, 0, 0]
assert(v[1] == 2);
v[3] = 42;
// v now contains [1, 2, 3, 42, 0]
int& r = v[0]; // create a reference to the first element
int* p = &v[0]; // create a pointer to the first element
}
Pretty straightforward. And again, note that we’ve managed to create an arbitrary number of objects in our application, without even once having to call new. This also means that, short of bugs in the compiler or standard library, there is no way in which this application can leak memory.
There are a couple of caveats to be aware of though:
- There is typically no bounds-checking on the [ ] operator. This doesn’t mean it is legal to do v[999] above, it just means that there is no guarantee of what will happen if you do it. It is undefined behavior.
- Pointers and references to individual elements within a vector may be invalidated when we add elements to the vector. Like with C#’s List<T>, it is a dynamic array, and resizes as necessary. Each such resizing operation consists of allocating a new array, copying the contents into that, and then freeing the old array. A pointer to data in the old array is therefore no longer valid. The same applies for iterators. Any iterator pointing into a vector is invalidated if the vector is resized.
Because a vector guarantees that its data is stored contiguously, essentially as an array, we can use this class instead of an array when interfacing with old C code (which only has pointers and arrays, but no vectors). In the above, the variable p could be passed to a C function as a pointer to the beginning of an array of ints. We still have to be careful of course. The function must not be allowed to write past the end of the array.
Other container classes are the map (equivalent to .NET’s Dictionary<Key, Value>: std::map<Key, Value> in the map header), and the set (no equivalent in .NET 2.0, although HashSet<T> in 3.5 is similar; it works much like a map without the Value parameter: std::set<T> in the set header). Their use is pretty much as you would expect.
In general, I would discourage you from using arrays. Prefer vectors instead, and if an API expects a pointer to an array, pass it a pointer to the first element of the vector instead, as shown in the previous example. Vectors are safer and simpler to work with.
Strings
A final pair of classes worth mentioning are std::string and std::wstring. C++ has no built-in string type, and so to work with strings, you have to include the string header, and use these classes. A string is simply a sequence of chars, single-byte characters. A wstring is a sequence of wchar_ts, or wide characters. On Windows, these are 16 bits wide, and use the UTF-16 encoding, allowing them to be used for Unicode strings.
These classes behave much as you would expect, so I won’t discuss them further. Instead I’ll skip to a related point of confusion: C has no string type at all. Instead, char pointers (or wchar_t pointers) are used as primitive strings.
A C-string is simply a sequence of characters, terminated by a null character ('\0'). If the null character is left out, all C string functions will just assume that the string continues until a null character happens to be found. This is obviously extremely fragile and a common source of bugs, but it’s an unavoidable fact of life when interfacing with C code.
This also rears its head when working with string literals. "hello world" does not have type std::string in C++. It has type const char[12], that is, an array of 12 const characters. (Note that the string is only 11 characters long. The compiler automatically generates the terminating null, and sets aside space for this as well).
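The difference between the 12 and the 11 is easy to check for yourself. The following is a minimal sketch with two helper functions I made up for the purpose: one asks a C string function for the length, the other asks the compiler for the size of the literal’s array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of characters before the terminating null, as seen by C string functions.
std::size_t visible_length(const char* s) {
    assert(s != 0);
    return std::strlen(s);
}

// Size in bytes of the literal's array, including the terminating null.
std::size_t literal_array_size() {
    return sizeof("hello world");
}
```

Note that strlen simply walks forward until it finds a '\0', so a null character in the middle of a buffer cuts the string short as far as C functions are concerned.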
Arrays in C and C++ are very primitive and fragile things, and implicitly decay into pointers when needed. Whenever you have an array, you can assign it to a pointer, and the pointer will automatically point to the beginning of the array. Because arrays are so limited (a function cannot return an array or take an array as an argument either), arrays are often passed around as pointers, and in fact, pointers can be treated much like arrays as well. Given a pointer p, p[2] is legal, and is equivalent to *(p+2). But because it is just a pointer, the size of the array isn’t known. It is up to the programmer to keep track of that.
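Here is a small sketch of array decay in action. The function names are hypothetical; the point is that the function receiving the array only ever sees a pointer, so the element count has to travel alongside it.

```cpp
#include <cassert>
#include <cstddef>

// The callee only sees a pointer, so the length has to be passed separately.
int third_element(const int* p, std::size_t count) {
    assert(count > 2);
    assert(p[2] == *(p + 2));   // the two spellings are equivalent
    return p[2];
}

int decay_demo() {
    int numbers[4] = {10, 20, 30, 40};
    const int* p = numbers;     // the array decays into a pointer to its first element
    // sizeof gives the element count only while we still have the array itself;
    // applied to p it would give the size of a pointer instead.
    std::size_t count = sizeof(numbers) / sizeof(numbers[0]);
    return third_element(p, count);
}
```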
Getting back to strings, the way arrays can decay into pointers means that this is legal: const char* str = "hello world". The pointer str now points to the statically allocated array of characters “hello world”, and for all practical purposes, str is now a C-string.
To create a wide string literal, the string is prefixed with an ‘L’, as in const wchar_t* wstr = L"hello world".
Because C-style strings are used in most APIs, you often need to convert between them and the C++ string class. This can be done as in the following:
const char* str = "hello world";
std::string str2 = str; // an implicit conversion exists from char pointer to string, so 'std::string str2 = "hello world";' would also have worked.
const char* str3 = str2.c_str(); // the c_str() member method on the string class returns a C-style string.
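The same conversion works for wide strings, which is what you will mostly need on Windows. Below is a hedged sketch: the two c_length overloads are stand-ins I invented for C-style API functions that expect a null-terminated string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Stand-ins for C-style API functions that expect null-terminated strings.
std::size_t c_length(const char* s) { return std::strlen(s); }
std::size_t c_length(const wchar_t* s) { return std::wcslen(s); }

// Builds a std::string, then hands it to the narrow C-style function.
std::size_t narrow_round_trip() {
    std::string s = "hello";
    s += " world";
    return c_length(s.c_str());
}

// Same idea with std::wstring and a wide string literal.
std::size_t wide_round_trip() {
    std::wstring w = L"hello world";
    return c_length(w.c_str());
}
```

Keep in mind that the pointer returned by c_str() belongs to the string object. It stops being valid once the string is modified or destroyed, so don’t store it for later.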
Because string literals are C-style strings, there are a few pitfalls to be aware of when using them:
const char* str = "hello worl";
const char* str2 = str + str; // #1
str += 'd'; // #2
In line #1, we get a compile error. Because str is just a pointer, and adding two pointers together is not defined, the compiler chokes.
A related example is in #2 where we try to add a character to the string. This compiles, perhaps surprisingly, but it won’t do what you expect. Instead, the char gets converted to an int, and added to the value of the pointer. So the result is a pointer that has been moved forward by the numeric value of 'd' (100 in ASCII), which is far past the end of the string.
For these operations to work, we must have a proper C++ string:
std::string str = "hello ";
str += "worl";
std::string str2 = str + 'd';
will work as expected, and result in the string “hello world”.
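To see the two behaviours side by side, here is a small sketch with two made-up helper functions. The first shows what adding an integer to a char pointer really does (with a small, safe offset this time), the second shows the std::string version.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Adding an integer to a char pointer moves the pointer; it never appends anything.
// The caller must make sure n does not exceed the length of the string.
const char* skip_chars(const char* s, int n) {
    assert(n >= 0 && static_cast<std::size_t>(n) <= std::strlen(s));
    return s + n;
}

// With std::string, + really does concatenate.
std::string append_char(const std::string& s, char c) {
    return s + c;
}
```

Calling skip_chars("hello worl", 6) gives a pointer to "worl", while append_char("hello worl", 'd') gives the string “hello world”.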
You now know all you need to know about C++ to use it without shooting yourself in the foot too much. You also know enough to read a lot of the code snippets you’re likely to find online. And you’ve got a starting point for searching out more information should you wish to.
In the next installment, we will finally get to interfacing with the Win32 API. You may want to play around a bit with the compiler to make sure you understand pointers and C-style strings in particular, as we’re going to need those quite a bit. As I mentioned in part I, the Windows API is a C API, and an ugly, inconsistent one at that. It’s not a bad idea to make sure you’re somewhat comfortable with the basics of the language before trying to grapple with it.
-
“Modern C++” is not just a random name. It is a style of C++ programming named after Alexandrescu’s book, Modern C++ Design — there are fundamentally two ways to program in C++. One style is often, and somewhat derisively, called “C with classes” — implying that it is used in much the same way one would program in C, but with the addition of classes, member methods and public/private access specifiers. The other, superior, approach is “Modern C++”. C with classes is often what beginners encounter, and perhaps surprisingly, what Java and C# are based upon — meaning that programmers coming from these languages tend to settle on an obsolete and sub-optimal style. I often make a point of teaching newcomers “proper” modern C++, but this is not the place. The goal of this series of posts is not to teach good C++ practices, but simply to enable .NET programmers to talk to native API’s. ↩
-
Unfortunately, there is no particularly good reason for this. this should have been a reference. That would have made much more sense. However, when this was added to the language, references did not yet exist, so it had to be a pointer. And later, when references were added, changing this to a reference would have broken backwards compatibility. ↩
-
Bjarne Stroustrup, the designer of C++, once said that “Within C++, there is a much smaller and cleaner language struggling to get out”. ↩




