<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jalf.dk &#187; java</title>
	<atom:link href="http://jalf.dk/blog/tag/java/feed/" rel="self" type="application/rss+xml" />
	<link>http://jalf.dk/blog</link>
	<description>Musings and thoughts on programming and other geeky stuff</description>
	<lastBuildDate>Sun, 25 Mar 2012 09:51:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>The meaning of RAII — or why you never need to worry about resource management again</title>
		<link>http://jalf.dk/blog/2010/01/the-meaning-of-raii-or-why-you-never-need-to-worry-about-resource-management-again/</link>
		<comments>http://jalf.dk/blog/2010/01/the-meaning-of-raii-or-why-you-never-need-to-worry-about-resource-management-again/#comments</comments>
		<pubDate>Sat, 02 Jan 2010 05:00:52 +0000</pubDate>
		<dc:creator>jalf</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[.net]]></category>
		<category><![CDATA[c++]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[raii]]></category>

		<guid isPermaLink="false">http://jalf.dk/blog/?p=340</guid>
		<description><![CDATA[I tried really hard to come up with some witty title or pun to weave into the title of this post. I couldn’t. RAII is just a terrible name, and it isn’t really clever or funny. Unfortunately, it is also the single most important key to C++. It is not just an idiom but a [...]]]></description>
			<content:encoded><![CDATA[<p>I tried <em>really</em> hard to come up with some witty title or pun to weave into the title of this post. I couldn’t. RAII is just a terrible name, and it isn’t really clever or funny. Unfortunately, it is also <em>the</em> single most important key to C++. It is not just an idiom but a fundamental philosophy used to solve almost any problem in the language. So we can’t really avoid it.</p>

<p>If I had to pinpoint one thing that marked the difference between a skilled and an unskilled C++ programmer, it would be “do they understand RAII”. Many people don’t, hence this post.<span id="more-340"></span></p>

<p>RAII is, apart from being badly named, one of those deceptively simple concepts that you <em>think</em> you understand when you first hear of it, think “well duh, that’s obvious”, and then proceed to write code as usual, because you just don’t see how widely applicable it is.</p>

<p>But let’s get the name out of the way first. <a href="http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization">RAII</a> stands for “Resource Acquisition Is Initialization”. And if you’re not already familiar with the idiom, then this has told you <em>nothing at all</em>. If you did know about RAII in advance, then you can, when you stop and think about it, kind of see how the name relates to it… vaguely… sort of.</p>

<p>What it actually <em>means</em> is simple: Resources should be managed by classes. When the class is initialized, the resource is acquired (hence the name). When the class is destroyed, the resource is released. And the lifetime of the object should exactly match the desired lifetime of the resource. That sounds obvious, and many programmers will (assuming they’re working in a language that <em>has</em> classes), say that this is what they always do.</p>

<p>Often, C++ developers think this just means “smart pointers. Wrap your memory allocation in a <code>boost::shared_ptr</code> and you’re done”. I see that as one not-very-often used border case though, rather than a typical example of RAII. So let’s take a step back instead.</p>

<p>The key idea isthat any kind of resource, not just memory, but file handles, sockets, database connections, or even more abstract resources like loggers or profiling timers or textures, really <em>any</em> concept or process which has a lifetime, should be mapped to an object.</p>

<p>Unlike the typical object-oriented line of thought which goes that “everything must be an object, because then.… well, everything will be an object, and your code will be better”, here we actually have a concrete <em>reason</em>: We want to use the object to manage the lifetime of the resource.</p>

<p>When I allocate memory with <code>new</code>, I have to deallocate it again sooner or later, with <code>delete</code>. (Or in C, with <code>malloc()</code> and <code>free()</code> respectively). And I have to make sure that this is done. And I have to make sure that it is not done twice. And that the object is not accessed after this is done. There are a lot of constraints we have to obey, all related to the lifetime of the resource. And this is why unmanaged programs have a reputation of leaking memory left and right. If we allocate memory, and it is to be used by a dynamic number of objects or functions all referencing the same allocations, which of the users is responsible for deleting it? And how do we know when it is safe to delete, when no users remain?</p>

<p>Ironically, most managed languages have <em>not</em> solved the problem. They have added a garbage collector (which yes, is very useful for a wide number of reasons), but that only solves one specific instance of the problem. It takes care of avoiding memory leaks, but it doesn’t avoid resource leaks <em>in general</em>.</p>

<p>The garbage collector ensures that this code won’t leak memory:</p>

<pre><code>void foo() {
  SomeObject* obj = new SomeObject();
  bar(obj);
}
</code></pre>

<p>where without a garbage collector, we’d (at least without RAII) have to write code such as</p>

<pre><code>void foo() {
  SomeObject* obj = new SomeObject();
  try {
    bar(obj);
    delete obj;
  }
  catch(...){ delete obj; }
}
</code></pre>

<p>In the garbage collected case, we don’t know what <code>bar</code> does, and we don’t <em>need</em> to know. It doesn’t have to delete the object. And neither does the <code>foo</code> function. So we have successfully dodged the problem of managing the lifetime of memory allocations. We haven’t really <em>solved</em> the problem though. We still don’t have any good tools to <em>manage</em> the lifetime. We’re just guaranteed by the system that it’ll last <em>long enough</em>.</p>

<p>In C++, this effect can be approximated using some kind of smart pointer<sup id="fnref:1"><a href="#fn:1" rel="footnote">1</a></sup>.</p>

<p>Smart pointers allow us to write code like this:</p>

<pre><code>void foo() {
  boost::shared_ptr&lt;SomeObject&gt; ptr = new SomeObject();
  bar(ptr);
}
</code></pre>

<p>and be sure we won’t leak memory. Of course, this solution isn’t perfect — reference counting is much more expensive than a good garbage collector, and if we create cyclic references, the objects will never be deleted, as the reference counts never reach zero. It is a decent approximation, but nowhere near as good and reliable as the garbage collector in managed languages.</p>

<p>But the problem shows up again if we use another type of resource. What if we’d opened a database connection instead?
We’d have to write code such as this:
(The following Java-like pseudocode is copied almost verbatim from <a href="http://stackoverflow.com/questions/161177/does-c-support-finally-blocks-and-whats-this-raii-i-keep-hearing-about/161247#161247">this StackOverflow.com answer</a>, courtesy of <a href="http://stackoverflow.com/users/14065/martin-york">Martin York</a>.)</p>

<pre><code>void writeToDb()
{
  Db db = new Db("DBDesciptionString");
  try
  {
    // Use the db object.
  }
  finally
  {
    db.close();
  }
}
</code></pre>

<p>(And of course it gets even worse if <code>db.close()</code> can throw exceptions. Then we have to catch <em>that</em> exception, just to avoid it propagating out from the <code>finally</code> clause if we reached <code>finally</code> because of an exception being thrown in the <code>try</code> clause.)</p>

<p>The resource management problem still exists. We still have to wrap the code in exception handling just to make sure that the connection is closed as soon as we’re done with it. And we have to do this at <em>every</em> use. And it gets complicated fast.</p>

<p>Of course, .NET makes this a bit simpler:</p>

<pre><code>using (Db db = new Db("DbDescriptionString"))
{
  // use the database object.
}
</code></pre>

<p>But the onus is still on the user of the class to ensure it is closed correctly. There is no obvious way to encode into the <code>Db</code> class that “once we’re done with an object of this type, the connection must be closed immediately”.</p>

<p>And in C++, smart pointers are no longer suitable solutions, since the resource to be managed is no longer a pointer allocated with <code>new</code>.</p>

<p>Instead, a more basic flavor of RAII comes to the fore:</p>

<pre><code>void someFunc()
{
    Db db("DBDesciptionString");
    // Use the db object.
} 
</code></pre>

<p>Yes, that’s all. When the <code>db</code> object goes out of scope, at the end of the function, its destructor is called. The destructor internally calls <code>this-&gt;Close()</code> for us, so we don’t need to do it! We just have to trust the scoping rules of C++, which guarantee that destructors are called on local variables when they go out of scope, and on class members when the class is destroyed.</p>

<p>So in a sense, the key idea in RAII is simply that “resources should behave sensibly”. They should get copied safely if an assignment is made (or otherwise, assignments should be prevented), they should be available if their owning object is successfully created (if it can’t create the resource, it should throw an exception, aborting the creation of the object), and when they are no longer used, they should be cleaned up.</p>

<p>The C++ standard library class template <code>std::vector</code> is a wonderful example of RAII in action. The resources being managed by a <code>vector</code> are memory (the array allocated internally to hold the objects being contained in the vector, as well as the objects themselves. When the <code>vector</code> is destroyed, every object it holds must be destroyed too, and the array in which they were placed must be deallocated.</p>

<p>In the following examples, assume that a function <code>foo</code> is passed a vector of <code>MyClass</code> objects by value. We don’t know how many, if any, objects are stored in it, but since we are passed a copy of the original <code>vector</code>, we take ownership of it. It exists only in the function <code>foo</code>, and must be destroyed afterwards.</p>

<pre><code>void foo(std::vector&lt;MyClass&gt; vec) {
  ...
 //  when we get to the end of the function, all local variables, including vec, 
 // are automatically destroyed by having their destructors invoked.
 // So no matter how many MyClass objects were stored in the vector, it ensures that they too have their destructors called.
 // And the vector also deallocates its internal array, leaving neither of its resources alive at the end of the function
}

void foo(std::vector&lt;MyClass&gt; vec) {
  throw std::exception("Oops");
  // as above, vec is automatically destroyed when we leave the function,
  // regardless of *how* we leave it. Even if we leave it because an exception was thrown and not caught.
} 

void foo(std::vector&lt;MyClass&gt; vec) {
  // other is constructed as a copy of vec. std::vector ensures that both of vecs resources are copied as well
  std::vector&lt;MyClass&gt; other = vec;
  // we now have two vectors, each owning a dynamically allocated array and a number of MyClass objects
  // and again, at the end of the function, both are deallocated cleanly
} 

void foo(std::vector&lt;MyClass&gt; vec) {
  std::vector&lt;MyClass&gt; other; // a second, empty, vector

  // perform an assignment, setting vec to be an empty vector
  // std::vector makes sure that if you do this, the resources previously held by vec are cleanly released
  // before copies are made of the resources held by other
  vec = other;

  // and so when the function ends, the MyClass objects originally held by vec
  // have already been destroyed, so their destructors are *not* invoked now
} 
</code></pre>

<p>As the above shows, <code>vec</code> owns its resources, and manages them tightly. Whenever a change happens to <code>vec</code>, it reflects this by updating its owned resources. If it is destroyed, it destroys its owned resources. If it is copied, it copies the resources it owns. If it is assigned to hold something else, it first destroys its existing resources. And so on. Nothing you do can bring it “out of balance”. It just works. <em>That</em> is RAII. Smart pointers are just convenient adapters turning raw pointers into RAII objects. But RAII is much more than smart pointers.</p>

<p>It is the broad and general idea that <em>resources should be mapped to objects</em>, so that the object can not be created unless it succeeded in acquiring its resource, and it can not be destroyed without also releasing its resource. This effectively saves C++ programmers from having to worry about resource management.</p>

<p>Take an example that’s guaranteed to cause pain without the use of RAII: Handling exceptions being through halfway through constructors. Say you have a class with multiple members which are initialized in its constructor. After the first member has been initialized, but before all of them have been initialized, an exception is thrown. Let’s use the following contrived example:</p>

<pre><code>class Foobar {
  Foo f;
  Bar b;
  MyClass c;

public:
  Foobar() : f(42), b("hello world), c('a') {}
};
</code></pre>

<p>unfortunately, <code>b</code>’s constructor throws an exception. How to handle this? We know that in C++, partially constructed objects do not automatically have their destructors called. when the construction is aborted.</p>

<p>And since we want to avoid any resource leaks, we require that the following must happen:
– <code>a</code> must have its destructor called (because <code>a</code> was successfully initialized before the error occurrd)
– <code>b</code> must release any resources it acquired in its constructor before it threw the exception
– <code>c</code> must do nothing. Its construction was not yet begun when the error ocurred, so it would be an error to attempt any kind of cleanup of <code>c</code>.
– The <code>Foobar</code> object (the object pointed to by the <code>this</code> pointer) must ensure that the above, and nothing else, happens, and it must do so without relying on its own destructor (which won’t be called, as construction did not successfully complete).</p>

<p>And of course, pretending that only <code>b</code> can throw an exception may be a simplification over the real world. Perhaps every member could throw one from its constructor. Care to write a <code>Foobar</code> constructor which takes all this into account, providing enough <code>try</code>/<code>catch</code> blocks to correctly catch every exception that might be thrown, and release exactly the resources that have been allocated until then, and <em>nothing</em> else? A tall order, and an open invitation for bugs. And of course, it’d lead to a huge, bloated and error-prone constructor. It’d also prevent us from using the <em>initializer list</em>. We’d have to perform some kind of “safe” non-throwing default construction of both <code>a</code>, <code>b</code> and <code>c</code> before entering the constructor body, where exception handling is possible, and from there, attempt to perform assignments to bring the three members into the desired state.</p>

<p>In pseudocode, the constructor might look something like this:</p>

<pre><code>Foobar() {
  a = new Foo(42);
  try {
    b = new Bar("hello world");
  }
  catch {
    destroy a;
    throw;
  }
 try {
    c = new MyClass();
  }
  catch {
    destroy b;
    destroy a;
    throw;
  }
}
</code></pre>

<p>Note that all this complexity is only necessary because we want to handle several different resources. <code>a</code>, <code>b</code> and <code>c</code> all contain resources that must be attempted acquired, and properly released if this fails. If there’d been only one resource, the job would have been much simpler. There wouldn’t be any point at which <em>some</em> resources have been acquired, and others have not. If we succeeded in acquiring that one resource, there’d be no risk of errors occurring afterwards, so we wouldn’t need complex conditional cleanup code. And if we failed to acquire the one resource, there’d be nothing to clean up — after all, the resource was never acquired!</p>

<p>So to keep down the complexity, the only safe way to define a class is to make it own <em>at most one</em> resource. And this one-to-one mapping of resources to classes is exactly what RAII is all about. If <code>a</code>, <code>b</code> and <code>c</code> had all been RAII objects, then the above code <em>would work</em>. Regardless of which members could or couldn’t throw exceptions. According to the rules of C++, we know that in the above case,</p>

<ul>
<li>the <code>Foobar</code> destructor (<code>this-&gt;Foobar::~Foobar()</code> will not be called, as <code>*this</code> was not successfully constructed.</li>
<li>the <code>a</code> destructor will be called, as this member was fully constructed at the time of the error.</li>
<li>the <code>b</code> and <code>c</code> destructors will not be called, as these members were not fully constructed at the time of the error.</li>
</ul>

<p>So assuming that <code>b</code>’s constructor takes care of releasing any resources successfully allocated when the error occurred (the number of which, as pointed out above, should ideally be zero), we’re actually home free! What happens is exactly what we listed earlier as our goal. <code>a</code> has its destructor called, <code>c</code>’s constructor was never run in the first place, so it doesn’t have to do anything, and <code>*this</code> doesn’t have to do <em>anything</em> special in its constructor. All of its members take care of their own resources, so the number of resources managed by <code>*this</code> is zero!</p>

<p>We don’t even need to write a destructor for <code>Foobar</code> now, if all its members are RAII objects. Whether the <code>Foobar</code> object is partially or fully constructed, its members take care of themselves. That is the power of RAII. Once a resource has been mapped to a class, we can use it as much as we like, and even in very complex situations, and never have to worry about the resource being leaked. It is managed by its wrapping RAII object, and the C++ lifetime and scope rules ensure that this wrapper object gets destroyed when it goes out of scope</p>

<div class="footnotes">
<hr />
<ol>

<li id="fn:1">
<p>A smart pointer is an object which behaves as a pointer (meaning that it overloads the <code>*</code> and <code>-&gt;</code> operators, so it can be dereferenced to yield the pointed-to value), but also enforces some kind of ownership semantics on the value. A plain pointer does nothing when it goes out of scope. If it pointed to some dynamically allocated memory, nothing happens to that memory. And if no one else have a pointer to it, then that memory is lost, and can not be reclaimed.
A smart pointer does <em>something</em> when it is destroyed. Some variants simply free the memory they point to (<code>boost::scoped_ptr</code>, <code>std::auto_ptr</code> or <code>std::unique_ptr</code> all fall into this category, although with some important differences), while others implement reference counting, so that the memory is only destroyed when <em>all</em> smart pointers pointing to it have been destroyed. <code>boost::shared_ptr</code> is by far the best known implementation of this concept. <a href="#fnref:1" rev="footnote">↩</a></p>
</li>

</ol>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jalf.dk/blog/2010/01/the-meaning-of-raii-or-why-you-never-need-to-worry-about-resource-management-again/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

