Being functional in an imperative language

By now, I’ve read an awful lot of papers about STM sys­tems, and cer­tain trends are really start­ing to stand out, not so much in terms of the algo­rithms used or the clever schemes invented to make trans­ac­tions appear atomic, but in how they inter­face with the actual language.

It has really under­lined to me just how deeply entrenched most Java, C and C++ pro­gram­mers are in the imper­a­tive mind­set. When we want to per­form a trans­ac­tion, we’ll need to exe­cute some­thing like the fol­low­ing pseudocode:

for (;;) {
  tx = new Transaction();
  try {
    // ... fill in body of the transaction ...
    tx.Validate();
    tx.Commit()
  }
  catch (InvalidTxException){
    tx.Rollback();
  }
}

And sev­eral STM sys­tems actu­ally require the user to write the above when­ever a trans­ac­tion is to be executed.

Some imple­menters have real­ized that this is both ver­bose, messy and error-prone. So they try to hide it behind a macro.

#define BEGIN_ATOMIC \
  for (;;) { \
    tx = new Transaction();\
    try { \

#define END_ATOMIC \
      tx.Validate();
      tx.Commit()
    }
    catch (InvalidTxException){
      tx.Rollback();
    }
  }

and now a trans­ac­tion looks like this instead, from the user’s point of view:

BEGIN_ATOMIC
// ... fill in body of the transaction ...
END_ATOMIC

Which is obvi­ously shorter and requires less dupli­cate code. On the other hand, it relies on macros. Eeeew, yuck!

In Java, most imple­menters have tried to extend the lan­guage, to add a language-level atomic state­ment, allow­ing code like this:

atomic {
  // ... fill in body of the transaction ...
}

And yes, this obvi­ously works too, and has the poten­tial to inte­grate much more closely with the com­piler and type-checker. But it’s also a very imper­a­tive kind of think­ing. “add another type of scope inside whichever func­tion wants to per­form the transaction”.

What aston­ishes me is that no one (apart from Pey­ton Jones, who imple­mented Haskell STM) have real­ized that what we really want is noth­ing more than a higher-order func­tion. A func­tion which imple­ments the trans­ac­tion infra­struc­ture, and then calls a user-defined function.

What we’d like to do, ide­ally, is to define a func­tion atomic, as in the fol­low­ing pseudocode:

T atomic(f : Tx -> T){
  for (;;) {
    tx = new Transaction();
    try {
      T ret = f(tx);
      tx.Validate();
      tx.Commit();
      return ret;
    }
    catch (InvalidTxException){
      tx.Rollback();
    }
  }
}

Of course, if we wanted to be really func­tional, we could use recur­sion instead of the loop, but let’s not get car­ried away. Ulti­mately, what we want is just to hide all the messy infra­struc­ture of a trans­ac­tion from the user.

The above declares a func­tion atomic which takes as its para­me­ter a func­tion which, given a trans­ac­tion (type Tx), returns type T.

atomic imple­ments the messy trans­ac­tion loop, and sim­ply calls the user-supplied func­tion inside it.

And presto, the user can now cre­ate a trans­ac­tion like this:

int myTx(Tx tx) {
  // do whatever
}
int result = atomic(myTx);

Or, in a lan­guage which sup­ports lambda expressions,

int result = atomic((tx) => {/* do whatever */ });

Bit nicer than mess­ing around with user-implemented loops or macros, isn’t it? Of course, I’ve so far sought refuge within the peace­ful realms of pseudocode, where any­thing is pos­si­ble. We obvi­ously can’t do this in Java or C++, can we?

No, and yes, in that order. In Java, I agree, we’re just screwed. Of course, we could just pass in an object instead of a func­tion, and if it imple­ments some ICallable­Trans­ac­tion­Body inter­face, atomic could call its RunTx() method. That would work, but it’s not quite as elegant.

In C++, though, the sit­u­a­tion is dif­fer­ent. A large part of the stan­dard library (all the STL algo­rithms) relies on higher-order functions. .

In C++, we can define atomic as follows:

template <typename Func>
void atomic(Func f){
  for (;;) {
    tx = new Transaction();
    try {
      f(tx);
      tx.Validate();
      tx.Commit();
    }
    catch (InvalidTxException){
      tx.Rollback();
    }
  }
}

First, I should men­tion that I cheated. I removed the return value. atomic now returns void. The rea­son is that it’s not quite triv­ial to infer the return type of a func­tion in C++. TR1 does intro­duce a func­tion std::result_of but it doesn’t work in every case. C++0x will fix this once and for all, by pro­vid­ing the nec­es­sary lan­guage fea­tures to imple­ment it. But until then, we could hack around it, either by requir­ing the user to explic­itly spec­ify the return type (atomic<int>(myTx)), or by using result_of and sim­ply telling the user to avoid the cases where it doesn’t work. But to keep the above code exam­ple sim­ple, I sim­ply removed the return type instead. But it can be solved, and my pro­to­type does it.

Now, on with the show. We have now defined our atomic func­tion. How do we call it?

In one of three ways, in decreas­ing order of familiarity:

First, with a func­tion pointer. Hope­fully every C++ pro­gram­mer knows about these:

void myTx(Tx tx) {... }
atomic(myTx);

Fairly clean and sim­ple. One down­side is that because the atomic func­tion is just passed a func­tion pointer, the com­piler may not be able to inline the call to myTx. It is doubt­ful that it’d make much dif­fer­ence here, but it is nev­er­the­less a valid concern.

The sec­ond approach fixes this, as well as enabling some more flex­i­bil­ity: Func­tors, or func­tion objects, which every C++ pro­gram­mer should know about, but unfor­tu­nately, not every­one does:

struct myTx {
  void operator()(Tx tx) {...}
};
atomic(myTx());

Now we’re pass­ing in an object of type myTx, which means that the com­piler knows exactly which func­tion to call: myTx::operator()(Tx). So it can inline this triv­ially, elim­i­nat­ing the above per­for­mance concern.

Per­haps more impor­tantly, our func­tion can now hold state. It is an object, so it can be given a con­struc­tor and destruc­tor. It can be given other mem­ber func­tions to allow the user to inter­act with it. It has become a lot more versatile.

Finally, we can wait for C++0x, which intro­duces lambda expressions:

atomic([](Tx tx) { ...});

And what’s note­wor­thy is that the same atomic def­i­n­i­tion works in all three cases! So the user is actu­ally given the choice of which method to use. Our STM library sup­ports them all.

Isn’t this cleaner than the orig­i­nal approach, of requir­ing the user to imple­ment the whole loop him­self? So why have no one used it before?

Ulti­mately, I can think of two pos­si­ble reasons:

  • the writ­ers, usu­ally very clever researchers and aca­d­e­mics, were not famil­iar with higher-order func­tions, or with func­tional pro­gram­ming in gen­eral, or
  • the writ­ers did not know that this was pos­si­ble in C++

I sup­pose the sec­ond option is the most for­give­able. C++ is a big messy lan­guage, and not every­one are famil­iar with every cor­ner of it. That’s fair enough, but even then, most of this could have been done in C too, by using func­tion point­ers. It’s not that eso­teric, is it? But I sin­cerely hope it is not the first case. I could under­stand if Joe Coder at Insignif­i­cant Soft­ware Inc. had never heard of func­tional pro­gram­ming, but most of these are com­puter sci­ence professors!

Inci­den­tally, this should also pro­vide a nice exam­ple for when an imper­a­tive pro­gram­mer asks “how would it ben­e­fit me to learn a func­tional language?”

Here we have a clear exam­ple of how using a very sim­ple func­tional tech­nique can sig­nif­i­cantly clean up the code and make it much more robust1, even in an imper­a­tive lan­guage like C++.


  1. Imag­ine what would have hap­pened in the orig­i­nal loop– or macro-based ver­sions, if the user had dared to put a break or return in their trans­ac­tion… But both are safe in the higher-order func­tion ver­sion. 

Share and Enjoy: These icons link to social book­mark­ing sites where read­ers can share and dis­cover new web pages.
  • Digg
  • del.icio.us
  • StumbleUpon
  • Reddit
  • Technorati

Tags: , , , , ,

2 Responses to Being functional in an imperative language

  1. Leck says:

    You’re such a nerd :p

  2. jalf says:

    Hope­fully that doesn’t come as too much of a surprise… :)

Leave a Reply

Name and Email Address are required fields. Your email will not be published or shared with third parties.