Saturday, January 24, 2009

Refactoring Scala Actors

I've been experimenting in my spare time with the Scala actors library for about a year now. Initially it was mostly as part of my participation in the Widefinder project, but for the past several months I've been working on building more complicated abstractions using the actor library. The challenge with Widefinder, if you chose to keep your parallelism implementation generic and reusable, was to spread line-oriented input stream processing across as many threads as possible (ok, some may disagree with that, but that was my simplified take). This was relatively straight forward because the lines didn't have to be processed in any given order. Actors worked well for the task, and I was able to build a fairly generic set of classes that could be used in a very natural way for any similar problem.

I was feeling pretty good about Widefinder so I decided tackle a more complicated problem: parallelizing XML parsing (I'll write more on that later). In the process of exploring various ideas to help parallelize the parsing of XML I started butting my head against limitations of the actors library. For example, I built an "asynchronous stream" class - something like a normal Scala stream but instead of being entirely lazy, it asynchronously evaluates itself ahead of what has already been consumed, effectively parallelizing computation of the stream. This worked, except that if the stream wasn't fully evaluated the application using the stream wouldn't terminate properly because it's actor was still active. That made sense, as in many actor based applications you wouldn't want them to terminate before all the actors were finished, so I tried to make a custom scheduler that used daemon threads and therefore would allow the application to terminate without all of the actor threads terminating. This seemed like a straight forward task, except that it turned out to not be possible given some current limitations (link to enhancement about schedulers). So I set myself to fixing/enhancing the actors library to meet my needs and submit some patches.

Making fixes and enhancements to the actors library turned out to be a much bigger challenge than I expected. The external API for actors, particularly the documented part, is pretty clean and flexible. However internally, at least to a new set of eyes, there is a lot of spaghetti (count methods and variables that are defined as private[actors] so all the classes in the package can muck with them), oddly named variables and methods, and over a dozen mutable variables on the Actor trait that must all be correctly set in the right combination in order for the actor to work - and completely lacking any sort of structured means of manipulating them. I braved this, submitted some patches, and even wrote several enhancements that I did not submit, but I kept running into subtle problems that seemed almost impossible to track down. I would sprinkle assertions throughout the code only to realize that even when things seemed to work much of what I assuming was really holding. In other words I was getting lucky - or unlucky - as challenges like this are pretty common when doing parallel programming.

By this time I had a reasonable, albeit incomplete and with some inaccuracies, idea of what all those magical instance variables within Actor do and how they correspond to meaningful states. Here is my take on it:

  1. waitingFor - the partial function that the actor is using to wait for a matching message
  2. shouldExit - used to signal that the actor should exit instead of doing normal processing
  3. isSuspended - seems to indicate that the actor is waiting while blocked on a thread
  4. isDetached - the actor is waiting and is not attached to a thread
  5. isWaiting - the actor is waiting in some way
  6. mailbox - queue of messages for the actor
  7. sessions - list used as a stack to keep track of the sender of messages while they are being processed
  8. received - the message that was received, used to pass around a received message, could be replaced with local variable within functions
  9. exitReason - the reason why the actor exited, default to 'normal when the actor is initialized and can be modified at any time, so it is there even when the actor has not yet exited.
  10. onTimeout - used for keeping track of a timeout action so that it can be canceled if a message is received within the timeout period
  11. kill - a function invoked by reaction prior to invoking exit on the actor. It is used by loop and loopWhile to bypass the normal exit process of reactions
  12. trapExit - if false (the default) then the actor will exit when an actor that it is linked to exits, if it's true then it will receive an Exit message
  13. rc - a reply channel - unclear why it is actually needed because it is never used other than to be set in freshReplyChannel on every invokation
  14. links - a list of linked actors. Linked actors are notified or shutdown when the actor that they are linked to shuts down
  15. continuation - the function to be executed when the actor resumes

All those were pretty confusing, and it took me weeks to make any sense out of them - and usually the sense I made contained subtle flaws, so I decided to do something drastic. In my opinion the actors library is in desperate need of being refactored into something that encapsulates its state better and makes it much more explicit so you do not need to be an advanced adept to figure out what is happening inside the library. So I decided to do that refactoring and see where it lead me.

I distilled much of the above instance variables into the following states, which I have represented as immutable objects:

  • NotStarted - the initial state of the actor
  • Active States
    • StartScheduled - the actor has been started but has not yet been attached a thread or started executing its body
    • Running States - states where the actor is bound to a thread and is executing its body
      • Running - the actor is bound to a thread and is executing its body
      • Looping - a special case of running where the body is repeated while a specified condition holds
      • Killing - when an actor is Running it cannot be directly stopped by the library, so it is shifted into the Killing state. When the library receives control while the actor is running it checks to see if the state has been changed to Killing. If the state has been changed to Killing, then it takes actions to regain control from user code so that the actor can be cleanly shutdown.
    • Wait States
      • BlockedWait - the type of wait performed within a receive block where the actor is waiting for a message while attached to a thread. There may be a timeout specified after which the actor will wakeup.
      • DetachedWait - the type of wait performed within a react block when no matching message is available.
    • ContinuationScheduled - similar to StartScheduled, only the processing of the message the actor was waiting for when in the DetachedWait state is scheduled instead of the starting of the actor.
  • Terminal States
    • Finished - the actor teriminated normally
    • Killed - the actor was killed before it could complete execution
    • UnhandledThrowable - an exception was thrown, usually from "user code," and not handled
    • InternalError - an error occurred within the actor library

...and now there are the following instance variables:

  1. state - the state of the actor
  2. mailbox queue of messages for the actor
  3. sessions list used as a stack to keep track of the sender of messages while they are being processed
  4. trapExit - if false (the default) then the actor will exit when an actor that it is linked to exits, if it's true then it will receive an Exit message
  5. links - a list of linked actors. Linked actors are notified or shutdown when the actor that they are linked to shuts down .

Reducing the statefulness of the actor, and more importantly making the primary state explicit instead of implicit, makes the code much easier to both debug and extend.  In working with actors, I found that it's relatively common to have an actor just seem to stop.  It could be because it's waiting for a message, it could be because it's stuck in a loop, it could be any number of things.  Without explicit state, it's very difficult to tell exactly what's happening with an actor.  With explicit state, it's right there in your debugger.

I have the code in a Bitbucket repository in the "actor_state_machine" branch.  Most of the changes are in the actor.scala file and I highly encourage you to take a look at the differences.  Right now it the Scala test suite completes with the same results as the trunk as of a couple months ago, and I've added a couple tests to catch problems that it didn't.  Next I'm going to do a little more refactoring and general code cleanup, then I'm going to focus on expanding the test suite.  The current Scala testing tool, partest, uses actors heavily so it gives them a workout, but there aren't any explicit tests for actors.  Finally, I'm going to try to decouple the states from one another and try to extract their code from the Actor trait so that they can be tested as independent units and are more extensible.  I'm not sure what I'm actually going to do with the code.  I'd be happy to contribute it back to the EPFL, but given it is such a substantial departure I'm not sure they would want it.  So I'll probably end up moving it into another namespace and distributing it as a separate library.

Sphere: Related Content

6 comments:

Nick said...

Have you noticed any change in performance incurred by your better-structured version? Unless you did, I would say the Scala team should seriously consider merging with your version.

Ant thanks for sharing your understanding of the original Actor implementation. Even after reading "Actors That Unify Threads and Events" the internals are rather obscure (even in comparison with j.u.c).

Erik Engbrecht said...

Nick - I intend to do some preliminary benchmarking shortly. I suspect performance will be similar for all but one or two use cases where mine will be higher throughput because my react implementation is very different.

Rich Dougherty said...

Looks like a big job! Have you talked about any of this with Philipp Haller?

I've also been using a state variable for some concurrency primitives that I've been writing. It really made the code easy to understand (in my opinion!).

Every method ends up looking like this:

def method = synchronized {
  state match {
case A => ...
case B(x) => ...
}
}

Like you say, you don't have lots of member variables hanging around that are unused. The state object contains the minimum set of variables that are currently used. Also the names of states provide excellent documentation.

Erik Engbrecht said...

Rich - I haven't talked to Philipp yet. It was such a big refactoring I was sure I'd get through it and that it would work. But I intend to engage him after I've done so more cleanup, have some more tests, and maybe some benchmarks.

Anonymous said...

The Actor impl does have lots of little nuances, and if your impl cleans that up, I'd say that is a win.

A great next blog post would be to step us through how various usages of Actors look in your system v Scala's default impl.

muebles baratos said...

I think everybody should glance at this.