tag:blogger.com,1999:blog-197255192024-03-13T06:03:38.868-04:00Erik Engbrecht's BlogThoughts and Essays on Systems and Software EngineeringErik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.comBlogger64125tag:blogger.com,1999:blog-19725519.post-10622678450099953892011-01-16T22:42:00.003-05:002011-01-16T22:42:00.512-05:00Martin Odersky's Scala Levels<p>I just saw Martin's post on <a href="http://www.scala-lang.org/node/8610">levels of expertise in Scala</a>, and <a href="http://blog.tmorris.net/critique-of-oderskys-scala-levels/">Tony Morris's response</a>. The concept of levels is something that many in the community have called for repeatedly, in order to help ease people - especially team members who may be a little put off by some of Scala's more advanced features - into the language more gently. I understand the motivation, but it has always struck me as somewhat odd. The reason is that I think one needs a very compelling reason for such a significant change to be justified, especially a change from a mature, mainstream technology such as Java to one as new and immature as Scala. To begin, I'll recount my experience with adding <a href="http://www.python.org//">Python</a> as a staple in my programming toolbox.</p>
<h3>Learning Python</h3>
<p>I dabbled in Python a lot before really using it extensively. Much of my initial usage basically amounted to using <a href="http://www.jython.org/">Jython</a> as a REPL for interactively poking at Java libraries, especially ones that weren't particularly well documented. I thought Python was neat, but at the time most of my programming experience was in C++ and Java. I valued static typing. I valued having a compiler. When I was in college, I had a professor who said something along the lines of "The compiler is your friend. Write your code so that the compiler is more likely to catch your mistakes. Don't try to trick it or bypass it." This was in the context of C++ programming, and given all the ways that one can shoot himself in the foot using C++, it always seemed like sage advice. So giving up having a compiler to catch my mistakes, and an explicit structure clearly present in my code, seemed like an awful lot to ask. Sure, Python is more concise, but in my opinion conciseness is often more a matter of how closely the libraries you're using match the problem that you are trying to solve. About a year ago I had some experimental Python code that I had thrown together that did some regular-expression-heavy text processing and then some XML processing (I was processing the output of an application that produced invalid XML, so the first phase was to fix it so the XML parser wouldn't crash and the second was to actually process it). I can't remember why, but I decided to convert it to Java. I think it was because I wanted to integrate it with something that integrated well with Java, but not as well with Python. Anyway, the point is that the Python code and the Java code weren't that different in length. Sure, Java has a lot more boilerplate, but once your code starts being dominated by actual logic instead of getters and setters, it's the libraries that make the difference.</p>
<p>Anyway, I didn't really get into Python until I discovered and thoroughly learned <a href="http://www.python.org/download/releases/2.2.3/descrintro/">metaclasses and descriptors</a>. In my mind, they took Python from being a toy with a clean syntax to a language that opened up an entire new dimension of abstraction. A dimension of abstraction that had been missing all my life. A dimension that I had glimpsed in C++ template metaprogramming but had never quite managed to enter. All of a sudden I had this tool that enabled me to drastically and repeatedly reduce code sizes while making the business logic cleaner and more explicit. I had a means to encode my ideas such that my coworkers could use them without fully understanding them (an admittedly dangerous proposition). It was beautiful! They remain an important part of my toolbox to this day.</p>
<p>Incidentally, my first use of metaclasses and descriptors was to concisely implement immutable, lazy data structures. This was long before I knew anything about functional programming.</p>
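<p>The Python originals are long gone, but the idea translates directly to Scala (the language the rest of this post builds toward), where <code>lazy val</code> gives cached, at-most-once evaluation for free. <code>LazyPoint</code> below is a made-up illustration of the pattern, not the original code:</p>

```scala
// A rough analogue (not the original Python) of an immutable data
// structure with lazily computed, cached fields.
class LazyPoint(xThunk: => Double, yThunk: => Double) {
  lazy val x: Double = xThunk  // computed at most once, on first access
  lazy val y: Double = yThunk
  lazy val norm: Double = math.sqrt(x * x + y * y)  // derived, also cached
}

val p = new LazyPoint({ println("computing x"); 3.0 }, 4.0)
// Nothing has been computed yet; forcing norm forces x and y exactly once.
println(p.norm)  // prints "computing x" then 5.0
```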
<h3>Learning Scala</h3>
<p>I also dabbled in Scala a lot before really becoming enamored with it. I was drawn in by its conciseness relative to Java, especially the absence of getters and setters, the presence of operator overloading, and multiple inheritance using traits. But I had operator overloading and multiple inheritance in both C++ and Python. I could usually abstract away explicit getters and setters in Python, and by the time I started picking up Scala I had already made a significant shift towards immutability and laziness, so setters weren't as necessary. Scala was neat, and the compatibility with Java was a real benefit, but at first it didn't seem compelling. In fact, when I first started experimenting with Scala it seemed too buggy to really be worth it. Something worth watching, but not yet worth using. Also, by the time I started learning Scala I had already learned Common Lisp, and functional programming without full tail-call optimization seemed awkward (and still does).</p>
<p>What eventually sold me on Scala was higher-kinded types, companion objects, and objects-as-modules. You see, by the time I started picking up Scala I had already developed a habit of using a lot of generic programming. This is easy in Python, where there's no compiler or type system to tell you that you are wrong. But a lot of the patterns I commonly used couldn't be translated into Java generics (in fact, in my first foray into Java generics I crashed javac writing code that was compliant with the Java specification). Scala gave me concise programs and type safety! And with the introduction of collections that actually use higher-kinded types in 2.8.0, along with the leaps and bounds made in robustness, I think it finally reached the threshold for serious use.</p>
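<p>To illustrate the kind of generic programming that higher-kinded types enable - this is a sketch of mine, not code from back then, and <code>Functor</code> and <code>double</code> are invented names - here is a type class that abstracts over a type constructor <code>F[_]</code>, something Java generics cannot express:</p>

```scala
// A type class over a type constructor F[_], not over a plain type.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object Functor {
  // Instances live in the companion, so implicit search finds them.
  implicit val listFunctor: Functor[List] = new Functor[List] {
    def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
  }
  implicit val optionFunctor: Functor[Option] = new Functor[Option] {
    def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa.map(f)
  }
}

// Generic code that works for *any* container with a Functor instance.
def double[F[_]](fa: F[Int])(implicit F: Functor[F]): F[Int] =
  F.map(fa)(_ * 2)
```

The same <code>double</code> works on <code>List(1, 2, 3)</code> and <code>Option(21)</code> alike.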
<p>That's not to say these are the only key features. Understanding folds was a real eureka moment for me. I'm not sure how one can do effective functional programming without them, but perhaps there is a way. Implicit definitions are also key, and they are used so extensively in Scala libraries that I think writing Scala without a decent understanding of them is nothing short of voodoo programming. In short, there's a lot, and the features tend to combine in powerful ways.</p>
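<p>For the curious, here is a small illustration of both features. The examples are mine, and the names (<code>reversed</code>, <code>IntOps</code>, <code>squared</code>) are invented:</p>

```scala
val xs = List(1, 2, 3, 4)

// Several common list operations are just foldLeft with a different
// seed value and combining function.
val sum      = xs.foldLeft(0)(_ + _)                               // 10
val product  = xs.foldLeft(1)(_ * _)                               // 24
val reversed = xs.foldLeft(List.empty[Int])((acc, x) => x :: acc)  // List(4, 3, 2, 1)
val length   = xs.foldLeft(0)((n, _) => n + 1)                     // 4

// Implicit enrichment, the mechanism behind much of the "magic" in
// Scala libraries: Int gains a method it never declared.
implicit class IntOps(n: Int) {
  def squared: Int = n * n
}
val nine = 3.squared  // 9
```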
<h3>Summary</h3>
<p>I understand the motivation for establishing levels, but I personally think that if one avoids many of the features Martin has flagged as advanced, then the benefits of Scala don't justify the risk of using such a new technology and the need to retrain people. Without them, one is better off finding a better library or framework for their current stack. Also, given that advanced features are omnipresent in the libraries, and that teams will always have someone who wants to use the more powerful features, avoiding them works a lot better in theory than it does in practice. You can't not use them, unless you don't use Scala's standard library and avoid almost any library written explicitly for Scala. You can only pretend that you are not using them.</p>
<p>This is probably going to get me flamed, but reward must justify risk. In many circumstances Scala has the features to justify the risk associated with it, but the justification quickly vanishes as the more powerful features are stripped away. So while I think a programmer may be productive at Martin's lower levels, those levels alone don't justify Scala over Java.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com2tag:blogger.com,1999:blog-19725519.post-79536112826377153492011-01-12T00:00:00.001-05:002011-01-12T00:21:54.816-05:00Types of Project Failure<p>I saw this <a href="http://programmers.stackexchange.com/questions/35293/failed-project-when-to-call-it">question</a> on <a href="http://programmers.stackexchange.com/">StackExchange</a> and it made me think about all the projects I've been on that have failed in various ways, and at times been both declared a success and in my opinion been a failure at the same time (this happens to me a lot). <a href="http://www.zdnet.com/blog/projectfailures?tag=mantle_skin;content">Michael Krigsman</a> posted a list of <a href="http://www.zdnet.com/blog/projectfailures/six-types-of-it-project-failure/5981">six different types</a> a while back, but they are somewhat IT centric and are more root causes than types (not that identifying root causes isn't a worthwhile exercise - it is). In fact, if you Google for <a href="http://www.google.com/search?rls=en&q=types+of+project+failure&ie=UTF-8&oe=UTF-8">"types of project failures,"</a> the hits in general are IT centric. This may not seem surprising, but I think the topic deserves more general treatment. I'm going to focus on product development and/or deployment type projects, although I suspect much can be generalized to other areas.</p>
<p>Here's my list:</p>
<ol>
<li>Failure to meet one or more readily measurable objectives (cost, schedule, testable requirements, etc)</li>
<li>Failure to deliver value commensurate with the cost of the project</li>
<li>Unnecessary <a href="http://en.wikipedia.org/wiki/Collateral_damage">collateral damage</a> done to individuals or business relationships</li>
</ol>
<p>I think these are important because in my experience they are the types of failure that are traded "under the table." Most good project managers will watch their measurable objectives very closely, and wield them as a weapon as failure approaches. They can do this because objective measures rarely sufficiently reflect "true value" to the sponsor. They simply can't, because value is very hard to quantify, and may take months, years, or even decades to measure. By focusing on the objective measures, the PM can give the sponsor the opportunity to choose how he wants to fail (or "redefine success," depending on how you see it) without excessive blame falling on the project team.</p>
<h3>Subjective Objective Failure</h3>
<p>The objective measures of failure are what receive the most attention, because, well, they're tangible and people can actually act upon them. My definition of failure here is probably a bit looser than normal. I believe that every project involves an element of risk. If you allocate the budget and schedule for a project such that it has a 50% chance of being met - which may be a perfectly reasonable thing to do - and the project comes in over budget but within a reasonably expected margin relative to the uncertainty, then I think the project is still a success. The same goes for requirements. There's going to be some churn, because no set of requirements is ever complete. There will be some unmet requirements. There are always some requirements that don't make sense, are completely extraneous, or are aggressive beyond what is genuinely expected. The customer may harp on some unmet requirement, and the PM may counter with some scope increase that was delivered, but ultimately one can tell whether stakeholders feel a product meets its requirements or not. It's a subjective measure, but it's a relatively direct derivation from objective measures.</p>
<h3>Value Failure</h3>
<p>Failure to deliver sufficient value is usually what the customer really cares about. One common situation I've seen is where requirements critical to the system's <a href="http://en.wikipedia.org/wiki/Concept_of_Operations">ConOps</a>, especially untestable ones or ones that are identified late in the game, are abandoned in the name of meeting more tangible measures. In a more rational world such projects would either be canceled or have success redefined to include sufficient resources to meet the ConOps. But the world is not rational. Sometimes the resources can't be increased because they are not available, but the sponsor or other team members choose to be optimistic that they will become available soon enough to preserve the system's value proposition, and thus the team should soldier on. This isn't just budget and schedule - I think one of the more common problems I've seen is allocation of critical personnel. Other times it would be politically inconvenient to terminate a project: because of loss of face, because doing so would point out an inconvenient truth about some en vogue technology or process (e.g. one advocated by a politically influential group or person), because idle workers cost just as much when they're idle as when they're doing something that might have value, or because of a general obsession with <a href="http://en.wikipedia.org/wiki/Sunk_costs">sunk costs</a>.</p>
<p>This is where the "value commensurate with the cost" comes in. Every project has an <a href="http://en.wikipedia.org/wiki/Opportunity_cost">opportunity cost</a> in addition to a direct cost. Why won't critical personnel be reallocated even though there's sufficient budget? Because they're working on something more important. A naive definition of value failure would be a negative <a href="http://en.wikipedia.org/wiki/Return_on_Investment">ROI</a>, or an ROI below the <a href="http://en.wikipedia.org/wiki/Cost_of_capital">cost of capital</a>, but it's often much more complicated.</p>
<p>It's also important to note here that I'm not talking about promised ROI or other promised operational benefits. Oftentimes projects are sold on the premise that they will yield outlandish gains. Sometimes people believe them. Often they don't. But regardless, it's usually perfectly possible to deliver significant value without meeting these promises. This would be a case of accepting objective failure because the value proposition is still there; it's just not as sweet as it was once believed to be.</p>
<h3>Collateral Damage</h3>
<p>Collateral damage receives a fair amount of press. We've all heard of, and most of us have at one time or another worked on, a <a href="http://en.wikipedia.org/wiki/Death_march_(software_development)">Death March</a> project. In fact, when I first thought about failure due to collateral damage, I thought any project causing significant collateral damage would also fall under at least one of the other two categories. It would be more a consequence of the others than a type unto itself. But then I thought about it some more, and realized I'd experienced a couple of projects where I feel the damage done to people was unacceptable, despite the fact that in terms of traditional measures and value delivered they were successful. There are a couple of ways this can happen. One is that there are some intermediate setbacks from which the project ultimately recovers, but one or more people are made into <a href="http://en.wikipedia.org/wiki/Scapegoating">scapegoats</a>. Another way arises from interactions among team members that are allowed to grow toxic but never reach the point where they boil over. In my experience a common cause is an "extreme" performer, either someone who is really good or <a href="http://www.pyxisinc.com/NNPP_Article.pdf">really bad</a>, with a very <a href="http://c2.com/cgi/wiki?PrimaDonna">difficult personality</a>, particularly when the two are combined on a project with weak management.</p>
<p>Now, you might be wondering how collateral damage could ever be necessary. It's actually fairly simple. Consider a large project that involves a large subcontract. The project may honestly discover that the value contributed by the subcontract does not justify its cost, or that the deliverables of the subcontract are entirely unnecessary. In this case the subcontract should be terminated, which in turn is likely to lead to a souring of the business relationship between the contractor and the subcontractor, along with potentially significant layoffs at the subcontractor. Oftentimes a business must take action, and there will be people or other businesses who lose out through no fault of their own.</p>
<h3>Back to Trading Among Failure Types</h3>
<p>A Project Manager, along with project sponsors and other key stakeholders, may actively choose which type of failure, or balance among them, is most desirable. Sometimes this "failure" can even really be success, so long as significant value is delivered. Some of the possible trades include:</p>
<ol>
<li>Failing to meet budget and schedule in order to ensure value.</li>
<li>Sacrificing value in order to meet budget and schedule...</li>
<li>...potentially to avoid the collateral damage that would be inflicted in the case of an overt failure</li>
<li>Inflicting collateral damage through practices such as continuous overtime in order to ensure value or completing on target</li>
<li>Accepting reduced value or increased budget/schedule in order to avoid the collateral damage of the political fallout for not using a favored technology or process</li>
</ol>
<p>Ultimately some of these trades are inevitable. Personally, I strongly prefer focusing on value. Do what it takes while the value proposition remains solid and the project doesn't resemble a money pit. Kill it when the value proposition disappears or it clearly becomes an infinite resource sink. But of course I know this is rather naive, and sometimes preventing political fallout, which I tend to willfully ignore in my question for truth, justice, and working technology; has an unacceptable if intangible impact. One of my most distinct memories is having a long conversation with a middle manager about a runaway project, and at the end being told something like "Erik, I agree with you. I know you're right. We're wasting time and money. But the money isn't worth the political mess it would create if I cut it off or forcefully reorganized the project." I appreciated the honesty, but it completely floored me, because it meant not only that politics could trump money, but politics could trump my time, which was being wasted. Now I see the wisdom in it, and simply try to avoid such traps when I see them and escape them as quickly as possible when I trip over them.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com3tag:blogger.com,1999:blog-19725519.post-21240773555298537002011-01-03T12:49:00.001-05:002011-01-03T12:53:06.291-05:00StackOverflow and Scraper Sites<p>I recently noticed that Google searches were turning up a lot of sites that mirror <a href="http://stackoverflow.com">StackOverflow</a> content as opposed to the originals. It appears that I'm not alone. This morning <a href="http://www.codinghorror.com/blog/2004/02/about-me.html">Jeff Atwood</a> blogged about how they're having increasing problems with these sites receiving <a href="http://www.codinghorror.com/blog/2011/01/trouble-in-the-house-of-google.html" title="Trouble in the House of Google">higher Google rankings</a>. 
His post, and especially the comments, are filled with righteous indignation about how it's the end of the Internet as we know it. Of course, I can't remember a time when search results weren't polluted with faked-out results, so I don't understand why he's so surprised.</p>
<p>But I do think this situation is different. The difference is that in many cases the presentation is far different from my previous exposure to <a href="http://en.wikipedia.org/wiki/Link_farm" title="Link Farm - Wikipedia">link farms</a>. Historically, my impression is that pages on such sites generally have the following attributes:</p>
<ol>
<li>The recycled content is poorly formatted and often truncated</li>
<li>Unrelated content from multiple sources is lumped together in a single page (usually there is some common keyword)</li>
<li>Advertisements are extremely dominant and often unrelated to the content</li>
<li>The link-to-content ratio is often very high, and the links are to unrelated content</li>
</ol>
<p>In fact, I suspect that if one were to develop a set of quantitative metrics that could be automatically measured for pages based on the above criteria, StackOverflow would perform worse than some of the sites that mirror its content. It's rather ironic, but if you think about it, if you wanted to defeat an algorithm that was developed to find the "best" source for some common content, you'd do everything you could to make your scraper site look more legitimate than the original. Let's compare a question on <a href="http://stackoverflow.com/questions/1833762/scala-reflection-getdeclaringtrait">StackOverflow</a> with its cousin on <a href="http://efreedom.com/Question/1-1833762/Scala-Reflection-GetDeclaringTrait">eFreedom.com</a>.</p>
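<p>Purely as a thought experiment, such metrics might be sketched like this. <code>PageStats</code>, <code>spamScore</code>, and the weights are entirely hypothetical and have nothing to do with any real ranking algorithm:</p>

```scala
// Hypothetical spam-likeness signals based on the four attributes above.
case class PageStats(
  contentChars: Int,      // characters of primary content
  linkCount: Int,         // outbound links on the page
  relatedLinkCount: Int,  // links judged related to the content
  adCount: Int            // advertisements
)

// Higher score = more spam-like. The weights are invented for illustration.
def spamScore(p: PageStats): Double = {
  val linkRatio =
    if (p.contentChars == 0) 1.0
    else (p.linkCount.toDouble / p.contentChars).min(1.0)
  val unrelatedLinks = (p.linkCount - p.relatedLinkCount).max(0)
  linkRatio * 100 + unrelatedLinks * 0.5 + p.adCount * 1.0
}
```

Under these invented weights, a page with many weakly related links scores as more spam-like than one with fewer, better-targeted links - which is exactly the irony described above.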
<h3>Analysis</h3>
<p>The primary content is essentially the same with similar formatting. The two major differences are that eFreedom only directly displays the selected answer, as opposed to all of the answers with the selected answer on top, and that none of the comments are displayed. This may help avoid triggering the "unrelated content" rule, because the selected answer is probably the most cohesive with the question, and comments frequently veer off-topic. But I suspect the effect is minimal.</p>
<p>Now consider the advertisements. eFreedom has a few more, but they are more closely tied to the content (using Google AdWords, which probably helps). The advertisements on StackOverflow are for jobs identified via geolocation (I live in Maryland), and the words in them don't correlate particularly well to the primary content, even though they are arguably more relevant to StackOverflow users than run-of-the-mill AdWords ones.</p>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSG80okGdZPd3KQFitM62C7_qk99W4dYRF0fPD7DjAdT6N_ZlMBfQmz0dycIHh0t6idmtKtpCXFWbP-UtNSxqXmAK6YjilICt9zrvb30uA4mpMzX0yi0iPuUSj9MY3Q8T-9I7a/s1600/so_ads.png"><img style="cursor:pointer; cursor:hand;width: 151px; height: 320px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjSG80okGdZPd3KQFitM62C7_qk99W4dYRF0fPD7DjAdT6N_ZlMBfQmz0dycIHh0t6idmtKtpCXFWbP-UtNSxqXmAK6YjilICt9zrvb30uA4mpMzX0yi0iPuUSj9MY3Q8T-9I7a/s320/so_ads.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5558018112538181794" /></a>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKGUGSMzvXrXAcSOSW65bZ1i-e60SHX_IKLS41sjwLrr27_wt5evXoZ_CMW5Hd3pYXj1WN45Q1583dRViY9HPt8UZVXpQkdh-1qjrUsA3XV5hM0H1VsWBQJ_Yyf-8Vo_ckLDPj/s1600/efreedom_ads.png"><img style="cursor:pointer; cursor:hand;width: 320px; height: 42px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKGUGSMzvXrXAcSOSW65bZ1i-e60SHX_IKLS41sjwLrr27_wt5evXoZ_CMW5Hd3pYXj1WN45Q1583dRViY9HPt8UZVXpQkdh-1qjrUsA3XV5hM0H1VsWBQJ_Yyf-8Vo_ckLDPj/s320/efreedom_ads.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5558018390312062018" /></a>
<p>Now let's consider the related links. StackOverflow has links to ~25 related questions in a column along the side. The only content from those questions is the title, and the majority seem to be related based on matching a single tag from the question. eFreedom, on the other hand, has links to 10 related questions (they appear to be matching on both tags), puts them inline with the main content, and includes a brief summary of each. As a human being I think the StackOverflow links are much noisier and less useful. If I try to think like an algorithm, what I notice is that StackOverflow has a higher link-to-content ratio, and the links are to more weakly related content.</p>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH80ukn9zmPBp6Jfn6f18lHqq-DrZGldz5F01s5v2e0jUotzdwCdyYwSujSe8YHpmPc8PXOcHMXuCuQi2mlnh7V8v7bNbDn5fJ-2g8yQwD2LY2T42eHECBGa9mMlkzVI1oPCGs/s1600/so_related.png"><img style="cursor:pointer; cursor:hand;width: 213px; height: 174px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH80ukn9zmPBp6Jfn6f18lHqq-DrZGldz5F01s5v2e0jUotzdwCdyYwSujSe8YHpmPc8PXOcHMXuCuQi2mlnh7V8v7bNbDn5fJ-2g8yQwD2LY2T42eHECBGa9mMlkzVI1oPCGs/s320/so_related.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5558018782236151714" /></a>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq9_UmaPUFzWndnFF9pnuZjaUwUZOyVY42ziI-6MoR5QAi7f3iLP0_x9DMlQRXS-xy4f9vdm3sUJ2wM3iKhJFwAtWPBQD6kGHefxDzvF-VMvplT2zpiRmCflYLFDQyYzNSaRMe/s1600/efreedom_related.png"><img style="cursor:pointer; cursor:hand;width: 320px; height: 63px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhq9_UmaPUFzWndnFF9pnuZjaUwUZOyVY42ziI-6MoR5QAi7f3iLP0_x9DMlQRXS-xy4f9vdm3sUJ2wM3iKhJFwAtWPBQD6kGHefxDzvF-VMvplT2zpiRmCflYLFDQyYzNSaRMe/s320/efreedom_related.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5558018566498959986" /></a>
<p>
The only other major difference is that eFreedom has a "social network bar" on the page. I'm not sure how this would affect ranking. It probably helps with obtaining links.
</p>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZLDzKPjFzf949wyGSBTIWyM51bpW7vnRwreey_mBabhCwil8qIBMypPJd4hqiSpWvJZ2tIB6Qy60f-6MPx4q1DErPsOrpiF_vb5lxJmQgRwLZDOcd3Qde9n-56G4L4JGIZqFL/s1600/efreedom_social.png"><img style="cursor:pointer; cursor:hand;width: 320px; height: 26px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZLDzKPjFzf949wyGSBTIWyM51bpW7vnRwreey_mBabhCwil8qIBMypPJd4hqiSpWvJZ2tIB6Qy60f-6MPx4q1DErPsOrpiF_vb5lxJmQgRwLZDOcd3Qde9n-56G4L4JGIZqFL/s320/efreedom_social.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5558019017138532562" /></a>
<p>
If you look at the HTML for each page, both use Google Analytics, both use what I'm assuming is a CDN for images, and StackOverflow appears to have a second analysis service. On casual inspection, neither appears to be laden with external content for tracking purposes, although without deeper inspection there's no way to be sure. But I presume having a large number of tracking links would make a site look more like spam, and having few of them makes it look more legit.
</p>
<h3>Conclusion</h3>
<p>I don't think either site looks like spam, but between the two, StackOverflow has more spam-like characteristics. eFreedom's content and links are considerably less noisy than StackOverflow's. Is eFreedom being a leech? Yes, certainly, but it, and I believe some of the other sites replicating StackOverflow's content, don't look like traditional link-spam sites. In fact, for a person who is just looking for content, as opposed to looking to participate in a community, eFreedom is at least as good, if not slightly better. There may be a moral argument that the original source should be given priority over replications, but from a pure content quality perspective StackOverflow and its copiers are essentially identical, and computer algorithms aren't generally in the business of making moral judgements. Also, there are many forums and mailing list archives out there that have atrocious user interfaces, where the casual searcher is likely better off being directed to a content aggregator than to the original source, so I don't think a general rule giving preference to original sources would be productive. Ultimately, I think open community sites like StackOverflow are going to have to compete with better SEO, and perhaps better search and browsing UIs for non-contributors, rather than relying upon search engines to perform some miracle, because the truth is that from a content consumption perspective the replicated sites are just as good.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com3tag:blogger.com,1999:blog-19725519.post-85308130394743432852010-12-28T18:54:00.003-05:002010-12-28T19:09:23.762-05:00Easier object inspection in the Scala REPL<p>
If you're like me, you spend a fair amount of time poking at objects of poorly documented classes in a REPL (Scala or otherwise...). This is great compared to writing whole test programs solely for the purpose of seeing what something does or what data it really contains, but it can be quite time consuming. So I've written a utility for use on the Scala REPL that will print out all of the "attributes" of an object. Here's some sample usage:
</p>
<pre>
scala> import replutils._
import replutils._
scala> case class Test(a: CharSequence, b: Int)
defined class Test
scala> val t = Test("hello", 1)
t: Test = Test(hello,1)
scala> printAttrValues(t)
hashCode: int = -229308731
b: int = 1
a: CharSequence (String) = hello
productArity: int = 2
getClass: Class = class line0$object$$iw$$iw$Test
</pre>
<p>
That looks fairly anticlimactic, but after spending hours typing objName.&lt;tab&gt; to see what's there, and poking at methods, it seems like a miracle. Also, one neat feature is that if the class of the object returned is different from the class declared on the method, it prints both the declared class and the actual returned class.
</p>
<p>
Code and further usage instructions are available on BitBucket <a href="https://bitbucket.org/eengbrec/scala-repl-utils/wiki/Home">here</a>.
</p>
<h3>So what exactly is an attribute?</h3>
<p>
I haven't quite figured that out yet. Right now it is any method that meets the following criteria:
<ol>
<li>the method has zero arguments</li>
<li>the method is public</li>
<li>the method is not static</li>
<li>the method's name is not on the exclude list (e.g. wait)</li>
<li>the method's return type is not on the exclude list (e.g. Unit/void)</li>
<li>the method is not a "default argument" method</li>
</ol>
Please post any feedback in the comments here, or better yet in the <a href="https://bitbucket.org/eengbrec/scala-repl-utils/issues?status=new&status=open">issue tracker</a> on Bitbucket.
</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com39tag:blogger.com,1999:blog-19725519.post-31121600710118463462010-12-27T21:25:00.003-05:002010-12-27T22:12:18.369-05:00Playing with Scala Products<p>
Have you ever wanted something like a case class that magically provides decent (for some definition of decent) implementations of methods like equals and hashCode, but without all of the restrictions? I know I have. Frequently. And recently I ran into an answer on <a href="http://stackoverflow.com/questions/4526706/what-code-is-generated-for-an-equals-hashcode-method-of-a-case-class">StackOverflow</a> that gave me an idea for how to do it. The trick is to play with Products. I'm not sure how good an idea it is, but I figure the Internet will tell me. Here's the basic idea:
</p>
<pre>
import scala.runtime.ScalaRunTime
trait ProductMixin extends Product {
  def canEqual(other: Any) = toProduct(other) ne null
  protected def equalityClass = getClass
  protected def equalityClassCheck(cls: Class[_]) = equalityClass.isAssignableFrom(cls)
  protected def toProduct(a: Any): Product = a match {
    case a: Product if productArity == a.productArity && equalityClassCheck(a.getClass) => a
    case _ => null
  }
  override def equals(other: Any) = toProduct(other) match {
    case null => false
    case p if p eq this => true
    case p => {
      var i = 0
      var r = true
      while (i < productArity && r) {
        if (productElement(i) != p.productElement(i)) r = false
        i += 1
      }
      r
    }
  }
  override def productPrefix = try {
    getClass.getSimpleName()
  } catch {
    // workaround for #4118, so classes can be defined in the REPL that extend ProductMixin
    case e: InternalError if e.getMessage == "Malformed class name" => getClass.getName()
  }
  override def hashCode = ScalaRunTime._hashCode(this)
  override def toString = ScalaRunTime._toString(this)
}
</pre>
<h3>Basic Usage</h3>
<p>
There are a couple of ways to use ProductMixin. The first, and probably the simplest, is to extend one of the ProductN traits and provide the appropriate defs:
</p>
<pre>
class Cat(val name: String, val age: Int) extends Product2[String, Int] with ProductMixin {
  def _1 = name
  def _2 = age
}
</pre>
<p>
The second way is to define the Product methods yourself:
</p>
<pre>
class Dog(val name: String, age: Int) extends ProductMixin {
  def productArity = 2
  def productElement(i: Int) = i match {
    case 0 => name
    case 1 => age
    case _ => throw new IndexOutOfBoundsException(i.toString)
  }
}
</pre>
<p>Either way, the result is something that has many of the benefits of case classes but allows for more flexibility. Here's some sample usage:
</p>
<pre>
scala> val c1 = new Cat("Felix", 2)
c1: Cat = Cat(Felix,2)

scala> val c2 = new Cat("Alley Cat", 1)
c2: Cat = Cat(Alley Cat,1)

scala> val c3 = new Cat("Alley Cat", 1)
c3: Cat = Cat(Alley Cat,1)

scala> c2 == c3
res0: Boolean = true

scala> c1 == c2
res1: Boolean = false
</pre>
<h3>Dealing with Inheritance</h3>
<p>
Equality in the presence of inheritance is very tricky. But, possibly foolishly, ProductMixin makes it easy! Let's say you have a hierarchy, and you want all the classes under some root class to be able to equal each other. It's done by overriding equalityClass so that it returns the root of the hierarchy (using a not-so-good example):
</p>
<pre>
scala> abstract class Animal(val name: String, val age: Int) extends Product2[String, Int] with ProductMixin {
     |   override protected def equalityClass = classOf[Animal]
     |   def _1 = name
     |   def _2 = age
     | }
defined class Animal

scala> class Dog(name: String, age: Int) extends Animal(name, age)
defined class Dog

scala> class Cat(name: String, age: Int) extends Animal(name, age)
defined class Cat

scala> val c = new Cat("Felix", 1)
c: Cat = Cat(Felix,1)

scala> val d = new Dog("Felix", 1)
d: Dog = Dog(Felix,1)

scala> c == d
res0: Boolean = true
</pre>
<p>
The reverse can also be accomplished by overriding equalityClassCheck such that it checks that the classes are equal instead of using isAssignableFrom. That would mean two objects could be equal if and only if they are the same class.
</p>
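<p>
For example, strict per-class equality looks roughly like this (a sketch; the class name is illustrative):
</p>
<pre>
// two objects are equal only when they are instances of exactly the same class
class StrictCat(val name: String, val age: Int) extends Product2[String, Int] with ProductMixin {
  override protected def equalityClassCheck(cls: Class[_]) = equalityClass == cls
  def _1 = name
  def _2 = age
}
</pre>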
<h3>Wrapup</h3>
<p>
I don't know to what extent this is a good idea. I've only tested it a bit in the REPL. It's neat, but there is one problem that comes to mind: performance. Any primitive members that are elements of the product will end up being boxed before use in the hashCode and equals methods, the extra layers of indirection aren't free, and neither is the loop. That being said, case classes use the exact same method for hashing, so in that respect they pay the same penalty, and unless code is really hashing- and equality-heavy it probably wouldn't be noticeable.
</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com0tag:blogger.com,1999:blog-19725519.post-14521767070848333222010-08-26T07:44:00.003-04:002010-08-26T08:29:16.706-04:00Scala is for VB programmers<p>
Normally I don't go for flame bait titles. But I haven't finished my morning coffee yet so I can't help myself. There's once again a debate raging across the internet about whether Scala is more or less complex than Java, along with the more nuanced argument that yes, it is more complex but only framework and library developers have to contend with this complexity. This argument usually happens when someone <a href="http://java.dzone.com/articles/scala-complex-yes-and">posts a little bit of code like this</a>:
</p>
<pre>
def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That
</pre>
<p>
And then someone responds <a href="http://warpedjavaguy.wordpress.com/2010/08/02/the-scala-is-too-complex-conspiracy-1/#comment-400">like this:</a>
</p>
<blockquote>
Why do not people realize that Java is too difficult for the average programmer? That is the true purpose of Scala, to escape the complexity of Java code!
Framework code in Scala, with heavy use of implicit keywords and all kinds of type abstractions, is very difficult. This is correct, but this code is not meant for innocent eyes. You do not use that sort of code when you write an application.
</blockquote>
<p>
I've seen this type of thinking before. A few years ago I had a bout of insanity and led an ASP.NET project using ASP.NET 2.0. I had no .NET experience prior to this project. The project failed, although the reasons for that failure are legion and unimportant here. But I noticed something about ASP.NET developers: they have no clue how the technology works. It's a black box. Do you know why? Because it <b>is</b> a black box. I searched and searched and couldn't even find a good illustration of the lifecycle of an ASP.NET page that uses events. This type of information is put front and center in the Java world. It's absolutely buried in the Microsoft world. Or at least in the parts of it that target the hordes of VB programmers who are undyingly loyal to MS. The framework is magic dust created by the great wizards in Redmond so that you can build solutions for your customers. Do not question the dust. Think about VB. Or, no, don't, it might damage your brain. My coffee hasn't quite kicked in, so I should have some immunity, so I'll do it for you. VB (well, at least old-school VB6) is a black box. It was designed to allow people who do not know how to really program, and who will probably never know how to program, to create applications. It's a completely flat, opaque abstraction. The dichotomy between the application developer and the framework developer is as high and as thick as the gates of Mordor.
</p>
<p>
There are many people in the Scala community who claim Scala's complexity can be hidden from the application programmer. I don't believe them, but there's a chance that they are right. It's technically feasible, and I can see how it could happen if Scala started attracting lots of VB programmers. I can't see how it's going to attract lots of VB programmers, but apparently many people in the Scala community think Scala is for VB programmers. So we'll just have to wait and see...
</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com44tag:blogger.com,1999:blog-19725519.post-39266209884280951942010-08-22T21:41:00.003-04:002010-08-22T22:28:45.220-04:00Scala Actors: loop, react, and schedulers<p>
One of the unfortunate aspects of many of the "published" (meaning blogged) <a href="http://www.scala-lang.org/node/242">Scala Actor</a> benchmarks out there is that they rarely pay much attention, if any, to the effects of seemingly idiomatic patterns on performance. Some of the main culprits are:
</p>
<ol>
<li>react versus receive (event-based versus threaded)</li>
<li>loop/react versus recursive react</li>
<li>loop/receive versus receive/while</li>
<li>tweaking (or failing to tweak) the scheduler</li>
</ol>
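<p>
For reference, the receive/while variant from the list above looks roughly like this (a sketch; Msg and Stop stand in for your message types):
</p>
<pre>
// threaded receive driven by a plain while loop; it avoids the
// control-flow exceptions of loop/react but ties up a thread
var done = false
while (!done) {
  receive {
    case Msg(m) => // do stuff
    case Stop => done = true
  }
}
</pre>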
<p>
I've been working on setting up a "benchmarking framework" in conjunction with experimenting with modifications to the underlying thread pool so that all the possible permutations are automatically tested. What I have right now is a classic "ring" benchmark setup to permute the schedulers and loop/react versus recursive react. The loop/react pattern is more idiomatic (or at least more common), but has higher overhead, and it looks something like this:
</p>
<pre>
loop {
  react {
    case Msg(m) => // do stuff
    case Stop => exit()
  }
}
</pre>
<p>
The reason it is high-overhead is that both loop and react raise control-flow exceptions that result in the creation of new tasks for the thread pool when they are hit, so for each iteration, two exceptions are raised and two tasks are executed. There's overhead in both operations, especially raising the exceptions. The recursive react pattern avoids the extra exception and task, and looks like this:
</p>
<pre>
def rloop(): Unit = react { // this would be called by the act() method
  case Msg(m) => {
    // do stuff
    rloop()
  }
  case Stop => // just drop out, or call exit()
}
</pre>
<p>
Using loop instead of recursive react effectively doubles the number of tasks that the thread pool has to execute in order to accomplish the same amount of work, which in turn makes it so any overhead in the scheduler is far more pronounced when using loop. Now, I should point out that the overhead really isn't that large, so if the actor is performing significant computations it will be lost in the noise. But it's fairly common to have actors do fairly little with each message. Here's some results from the ring benchmark using 10 rings of 10,000 actors passing a token around them 100 times before exiting. I'm using multiple rings because otherwise there is no parallelism in the benchmark. These are being run on my dual core Macbook.
</p>
<table border="1">
<tr>
<th>Scheduler</th><th>ReactMethod</th><th>Time (sec)</th>
</tr>
<tr>
<td>ManagedForkJoinScheduler</td><td>LoopReact</td><td>45.416058</td>
</tr>
<tr>
<td>ManagedForkJoinScheduler</td><td>RecursiveReact</td><td>25.509482</td>
</tr>
<tr>
<td>ForkJoinScheduler</td><td>LoopReact</td><td>65.268584</td>
</tr>
<tr>
<td>ForkJoinScheduler</td><td>RecursiveReact</td><td>45.85605</td>
</tr>
<tr>
<td>ResizableThreadPoolScheduler</td><td>LoopReact</td><td>98.084794</td>
</tr>
<tr>
<td>ResizableThreadPoolScheduler</td><td>RecursiveReact</td><td>53.379757</td>
</tr>
</table>
<p>
The fork/join schedulers are faster than the ResizableThreadPoolScheduler because rather than having all of the worker threads pull tasks off of a single, shared queue, each thread maintains its own local dequeue, onto which it can place tasks directly if they are generated while it is running a task. This creates a kind of "fast path" for the tasks that involves much less overhead.
</p>
<p>
I believe the primary reason ManagedForkJoinScheduler is faster is that ForkJoinScheduler does not always leverage the "fast path," even when in theory it could be used. I'm unsure about some of the rationale behind it, but I know some of the time the fast path is bypassed probabilistically in order to reduce the chances of starvation causing deadlock in the presence of long-running or blocking tasks. ManagedForkJoinScheduler escapes this particular issue by more actively monitoring the underlying thread pool, and growing it when tasks are being starved. The second reason, and I'm somewhat unsure of the actual degree of its effect, is that ForkJoinScheduler configures the underlying thread pool so that the threads work through the local dequeues in FIFO order, while ManagedForkJoinScheduler configures the pool such that the local dequeues are processed in LIFO order. Processing in LIFO order allows the pool to take advantage of locality with regard to the tasks, basically assuming that the last task generated is the most likely to use data that's currently in cache, and thus reduce cache misses.
</p>
<p>
The benchmark outputs a lot more information than I captured in the above table. If you'd like to run it, you can obtain the code <a href="https://bitbucket.org/eengbrec/managedforkjoinpool">here</a>. The project uses <a href="http://code.google.com/p/simple-build-tool/">sbt</a>, so you'll need to have it working on your computer. After you run update in sbt to download all of the dependencies, you can run the ring benchmark as follows:
</p>
<pre>
$ sbt
[info] Building project ManagedForkJoinPool 1.0 against Scala 2.8.0
[info] using ManagedForkJoinPoolProject with sbt 0.7.4 and Scala 2.7.7
> ringbenchmark
[info]
[info] == compile ==
[info] Source analysis: 1 new/modified, 0 indirectly invalidated, 0 removed.
[info] Compiling main sources...
[info] Compilation successful.
[info] Post-analysis: 79 classes.
[info] == compile ==
[info]
[info] == copy-resources ==
[info] == copy-resources ==
[info]
[info] == ringbenchmark ==
[info] RingBenchmark ManagedForkJoinScheduler LoopReact 2 ....output truncated...
</pre>
<p>
You can tweak the benchmarks by modifying the sbt project file. If you do run them, I'm very interested in the results.
</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com0tag:blogger.com,1999:blog-19725519.post-44289242524543713332010-08-21T15:09:00.006-04:002010-08-21T17:10:19.100-04:00Concurrency Benchmarking, Actors, and sbt tricks<p>
Have you ever noticed that other people's <a href="http://en.wikipedia.org/wiki/Benchmark_(computing)">microbenchmarks</a> tend to be hard to run and often impossible to duplicate? And are frequently caveated to the hilt? When it gets down to it, a benchmark is really an experiment, and ideally a scientific experiment. That means all factors that are relevant to the results should be clearly recorded, and the tests should be easy for others to duplicate.
</p>
<h3>Custom sbt actions for benchmarks</h3>
<p>
In order to test and run benchmarks on the work I'm doing around creating a managed variant of the <a href="http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W9">JSR-166y</a> <a href="http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/jsr166y/ForkJoinPool.html">ForkJoinPool</a> along with supporting infrastructure for use with <a href="http://www.scala-lang.org/node/242">Scala Actors</a>, I'm creating a <a href="https://bitbucket.org/eengbrec/managedforkjoinpool/src/tip/src/main/scala/actorbench/TestHarness.scala">test harness</a> that captures <a href="https://bitbucket.org/eengbrec/managedforkjoinpool/src/1cf3a55d2aa0/src/main/scala/actorbench/TestHarness.scala#cl-107">a host of environmental factors</a> about how it was run, and writing <a href="http://code.google.com/p/simple-build-tool/wiki/CustomActions">sbt actions</a> to make it easy to run the benchmarks and automatically permute the variables.
</p>
<p>
It still needs a lot of work, but I had some trouble figuring out a really basic task so I thought I'd share it. Basically I wanted to build a <a href="http://simple-build-tool.googlecode.com/svn/artifacts/latest/api/sbt/TaskManager.Task.html">Task object</a> that consists of several tasks based on information in the project definition and permuted parameters. It's actually pretty easy, as you can see in the snippet below from my project definition:
</p>
<pre>
/** this task executes the PingPong benchmark using each available scheduler */
lazy val pingpongbench = pingpongTaskList

/** produces a sequence of run tasks using all the available schedulers */
def pingpongTaskList = {
  val pairs = 100
  val messagesPerPair = 10000
  val tasks = for (sched <- schedulers) yield pingpongTask(sched, pairs, messagesPerPair)
  tasks.reduceLeft((a, b) => a && b)
}
</pre>
<p>You can see the whole file <a href="https://bitbucket.org/eengbrec/managedforkjoinpool/src/1cf3a55d2aa0/project/build/ManagedForkJoinPool.scala">here</a>. Basically <code>Task</code> has an <code>&&</code> operator that essentially allows you to concatenate one task with another task. This allows you to build a whole chain of tasks. In the example above, I'm having it run the benchmark once for each scheduler configuration. Soon, I'm going to make it permute other parameters. But right now my test harness isn't playing nicely with the schedulers included in the Scala distribution, so first things first.</p>
<p>
There's also one other little customization, which is documented, but I think it's important for benchmarking. By default, sbt runs your code in its own process. This can cause <a href="http://code.google.com/p/simple-build-tool/wiki/RunningProjectCode">problems</a> with multithreaded code, especially if it doesn't terminate properly. It also means the next benchmark to run has to contend with any junk that the previous benchmark left around. So I configured sbt to <a href="http://code.google.com/p/simple-build-tool/wiki/Forking">fork</a> new processes. It just required one line:
</p>
<pre>
override def fork = forkRun
</pre>
<h3>Important variables</h3>
<p>
Here's what I'm capturing for each run right now so that the results can all be dumped into a big spreadsheet for analysis. I'd like to capture more information about the host machine, such as more information about the CPUs and the loading when the benchmark is being run, but haven't got that far yet. Currently these are all captured from within the benchmark process, mostly using <a href="http://download.oracle.com/javase/6/docs/api/java/lang/System.html#getProperties()">system properties</a> and the <a href="http://download.oracle.com/javase/6/docs/api/java/lang/Runtime.html">Runtime</a> object.
</p>
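<p>
The capture itself is mostly one-liners against system properties and the Runtime object; for example (a sketch, not the harness's actual field names):
</p>
<pre>
// sketch: capturing environmental factors from within the benchmark process
val env = Map(
  "java.vm.name"  -> System.getProperty("java.vm.name"),
  "java.version"  -> System.getProperty("java.version"),
  "scala.version" -> scala.util.Properties.versionString,
  "os.name"       -> System.getProperty("os.name"),
  "os.arch"       -> System.getProperty("os.arch"),
  "processors"    -> Runtime.getRuntime.availableProcessors.toString,
  "free.memory"   -> Runtime.getRuntime.freeMemory.toString,
  "total.memory"  -> Runtime.getRuntime.totalMemory.toString
)
</pre>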
<ol>
<li>Test Name - obviously needed so that results from multiple benchmarks can be stored in the same file</li>
<li> Scheduler - this is my primary variable right now, I want to run each benchmark with each scheduler while holding everything else constant</li>
<li># of Cores/Processors - essential so that anyone looking at the results has an idea about the hardware used</li>
<li>Java VM Name - different VMs can perform quite differently</li>
<li>Java VM Version - performance characteristics change from version to version (usually getting better)</li>
<li>Java Version - same reason as above, but this is probably the more publicly known version number</li>
<li>Scala Version - this could be important in the future, as it becomes more common for different projects to be on different version of Scala</li>
<li>OS Name and version - again, it can affect performance</li>
<li>Processor Architecture</li>
<li>Approximate Concurrency (number of simultaneously alive actors) - this allows us to examine concurrency levels versus resource consumption, more concurrency does not necessarily mean that more cores or threads would be helpful</li>
<li>Approximate Parallelism (number of simultaneously runnable actors) - this measures how many cores/threads the benchmark can really keep busy</li>
<li>Approximate Total Messages - this estimates the amount of activity that takes place during the benchmark, generally the benchmarks I'm looking at contain very little logic because they are intended to measure overhead introduced by the framework</li>
<li>Total Wall Clock Time (seconds) - as measured using nanoTime within the benchmark process</li>
<li>Initial Thread and Maximum Observed Thread Count - used to examine automatic expansion of the thread pool</li>
<li>Initial Free Memory and Minimum Observed Free Memory - threads use a fair amount of memory, so performance impacts may show up as pressure on the GC as well as contention for the CPU</li>
<li>Initial and Maximum Observed Total Memory - threads use a lot of memory, so it's important to track usage</li>
<li>Verbose - debugging output pretty much invalidates any of these tests</li>
</ol>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com0tag:blogger.com,1999:blog-19725519.post-47958950984518573722010-08-08T18:33:00.007-04:002010-08-10T21:24:28.721-04:00Improving Schedulers for High-level, Fine-grained Concurrency Frameworks<p>
Quite a while ago, <a href="http://biosimilarity.blogspot.com/">Greg Meredith</a> made a <a href="http://scala-programming-language.1934581.n4.nabble.com/Re-scala-user-Profiling-Actor-based-server-Lots-of-time-in-FJTaskRunner-td2008446.html#a2008446">comment</a> that really made me stop and think, and that has been lingering in the back of my mind ever since:
<blockquote>
Erik, Alex, John, et al,
i'm loathe to hijack this thread -- which is a good one -- but the experience with the lock up described below is really just the tip of the iceberg. Unless specific scheduling contracts and semantics can be worked out, a pluggable scheduling architecture is just asking for disaster. Worse, it means that a whole host of compiler support that could be provided to apps that use actor-based concurrency is null and void because those solutions (be they model-checking or types-based) will critically depend on specific ranges of scheduling semantics.
Actors and other higher level concurrency constructs are a major appeal of languages like scala. If they prove themselves to be "not ready for prime time" the impact on perception might not be contained to just that usage of the language.
Best wishes,
--greg
</blockquote>
That wasn't the <a href="http://scala-programming-language.1934581.n4.nabble.com/scala-Actors-and-scheduling-td1997822.html#a1997825">first time</a> that Greg made a comment along those lines, and it certainly wasn't the last. At one point he <a href="http://www.scala-lang.org/node/2825">offered to let a few individuals look at some older code</a> that he believed could help. Greg's a really smart guy. If he's worried, we should all be worried.
</p>
<h3>Empirical Evidence of the Problem</h3>
<p>
Contrary to what many may think, this is hardly an academic problem. A significant portion of the issues people new to Scala Actors have trace back to the fact that they expect the scheduler to be making guarantees that it simply does not make. Here's a sampling:
<ol>
<li><a href="http://scala-programming-language.1934581.n4.nabble.com/Actor-not-creating-enough-threads-td1935577.html">Actors not creating enough threads</a></li>
<li><a href="http://stackoverflow.com/questions/2288723/scala-actors-different-behavior-on-jre-1-5-and-1-6/">Scala Actors different behavior on JRE 1.5 and 1.6</a></li>
<li><a href="http://comments.gmane.org/gmane.comp.lang.scala.internals/2971">Actor starvation problem</a> (related to previous)</li>
<li><a href="http://thread.gmane.org/gmane.comp.lang.scala.user/27602/">Actor as new thread</a></li>
</ol>
There are others, including cases where people have both rightly and wrongly blamed the default scheduler for serious problems with their programs. Most of the above can be traced back to the fact that the default scheduler for Scala Actors uses a slightly modified version of the JSR166y ForkJoinPool, which has issues described below (<a href="http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W9">source</a>):
<blockquote>
Like all lightweight-task frameworks, ForkJoin (FJ) does not explicitly cope with blocked IO: If a worker thread blocks on IO, then (1) it is not available to help process other tasks (2) Other worker threads waiting for the task to complete (i.e., to join it) may run out of work and waste CPU cycles. Neither of these issues completely eliminates potential use, but they do require a lot of explicit care. For example, you could place tasks that may experience sustained blockages in their own small ForkJoinPools. (The Fortress folks do something along these lines mapping fortress "fair" threads onto forkjoin.) You can also dynamically increase worker pool sizes (we have methods to do this) when blockages may occur. All in all though, the reason for the restrictions and advice are that we do not have good automated support for these kinds of cases, and don't yet know of the best practices, or whether it is a good idea at all.
</blockquote>
</p>
<h3>The Scope of the Issue</h3>
<p>
The issue here is neither limited to Scala nor to actor based concurrency. The general consensus in the concurrency community is that locks and threads are not the right abstractions for concurrency in application code (or hardly any code, for that matter), but there isn't any consensus on what the right abstractions are, or if there is consensus it is that the right abstractions are problem specific. There's no one-size-fits-all concurrency abstraction.
</p>
<p>
The one trait that all high-level or higher-order concurrency abstractions have in common is that under the hood they rely on some sort of managed thread pool and/or scheduler. For purposes here, when I say "scheduler," I mean the layer with which a concurrency framework interacts directly and likely contains framework-specific logic. When I say "thread pool," I mean the layer that directly manages the threads and is concurrency framework agnostic and may even be shared by several schedulers (sharing will be problematic, but I think ultimately necessary). This isn't a hard line, and often times they may be lumped into one. However I think it's a useful dichotomy. I'm also assuming the scheduler and thread pool rely on cooperative thread sharing, and that preemptive multitasking is left to the virtual machine and/or operating system.
</p>
<p>
The point is, any concurrency abstraction where the user can submit arbitrary code to be executed can ultimately run into serious issues if that user code does something the framework does not expect. For example, the fork-join scheduler from JSR-166y that the Scala Actors library uses (I think Clojure uses it as well, but it may not) doesn't automatically enlarge its pool in the presence of unmanaged blocks (including normal blocking IO) or simple long-running computations. This results in the majority of the problems I previously cited, because blocking operations are extremely common tasks, and thus it's a very leaky abstraction.
</p>
<h3>Key Characteristics of a Good Scheduler</h3>
<p>
Unfortunately my command of type theory is rudimentary at best, so while Greg could probably describe a well-behaved scheduler using types, I'm going to have to resort to plain English:
<ol>
<li>The user should not have to think about choosing a scheduler.</li>
<li>The user should not have to think about the size of the thread pool managed by the scheduler.</li>
<li>The user should be able to write code naturally without worrying about any assumptions the scheduler may make.</li>
<li>The user should not have to worry about starting or stopping the scheduler.</li>
<li>Multiple flavors of concurrency abstraction should be able to share the same thread pool.</li>
<li>The scheduler should impose minimal overhead.</li>
</ol>
There's probably others, but I think this is a good start.
</p>
<h3>The Performance Problem</h3>
<p>
So why hasn't this problem already been solved? Perhaps it has been already and I just don't know about it. If someone has, please let me know. But as far as I can tell it's because there's a perceived nasty performance trade. Here's a quote from <a href="http://scala-programming-language.1934581.n4.nabble.com/Scala-Actors-Starvation-td2281657.html">Philipp Haller</a>:
<blockquote>
The starvation detection mechanism in Scala 2.7 was too expensive to support in the new default scheduler. However, there is a starvation detection mechanism for thread-blocking methods of the `Actor` object (`receive` etc.).
</blockquote>
Similar answers have been provided when people have questioned the wisdom of using the FJ framework over the classes included with the JDK. I've tested it a bit and it's true, the FJ framework does yield a significant improvement for certain classes of tasks over the JDK classes (although I believe using exceptions for control flow imposes a far larger <a href="http://erikengbrecht.blogspot.com/2009/02/profiling-exceptionblob.html">penalty</a>). Basically it appears that the overhead associated with known solutions to the problem outweighs the benefits introduced by fine-grained concurrency.
</p>
<h3>ForkJoinPool in a nutshell</h3>
<p>
What makes <a href="http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/jsr166y/ForkJoinPool.html">ForkJoinPool</a> so fast (other than highly optimized code) is that it uses a two-level architecture for its task queues. There's one queue for the pool, and one dequeue per thread. Most of the data associated with the dequeues is only updated from the owning thread, so that tasks can be added and removed without synchronization and with minimal volatile and CAS operations. The worker threads also steal work from one another in order to spread out the load. When a worker is deciding the next task to execute, it performs the following checks, using the first one to return a task:
<ol>
<li>The thread's own local dequeue</li>
<li>The dequeues of other threads in the pool, scanned in a psuedo-random pattern with guiding hints</li>
<li>The common task queue</li>
</ol>
Threads are only added to the pool if a <a href="http://download-llnw.oracle.com/javase/7/docs/api/java/util/concurrent/ForkJoinPool.ManagedBlocker.html">ManagedBlocker</a> that the pool can see is used to block, causing the target concurrency not to be met. The default target concurrency for the ForkJoinPool is set to the number of available processors. In Scala Actors this is overridden to be twice the number of available processors. The idea behind this is simple: if you're keeping all your processors busy, then adding more threads will just slow you down. As we've seen, problems arise when a higher level of concurrency is needed due to unmanaged blocks, long-running computations, or simply the required concurrency being inherently higher than the default. This happens because as long as a worker is busy or has something in its local dequeue, it won't look elsewhere for tasks. If all the workers are in this state, tasks can simply accumulate, resulting in starvation.
</p>
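<p>
For reference, telling the pool about a blocking section looks roughly like this (a sketch against the ManagedBlocker interface; the package name and exact managedBlock signature vary between jsr166y revisions and the JDK):
</p>
<pre>
import jsr166y.ForkJoinPool

// sketch: wrapping Thread.sleep in a ManagedBlocker so the pool
// knows it may compensate by activating or adding a worker thread
class SleepBlocker(millis: Long) extends ForkJoinPool.ManagedBlocker {
  @volatile private var done = false
  def isReleasable = done
  def block(): Boolean = {
    Thread.sleep(millis)
    done = true
    true
  }
}

ForkJoinPool.managedBlock(new SleepBlocker(1000))
</pre>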
<h3>A Crack at Solving the Problem</h3>
<p>
As you might have guessed by now, I'm trying to solve <a href="http://bitbucket.org/eengbrec/managedforkjoinpool/wiki/Home">this problem</a>. Right now I wouldn't use the code for anything beyond testing it, but despite the fact that I'm still wrestling with the internals of ForkJoinPool, I've experienced good results so far. My approach is simple: I added a <a href="http://bitbucket.org/eengbrec/managedforkjoinpool/src/tip/src/main/java/jsr166y/ManagedForkJoinPool.java">manager thread</a> that monitors the changes in size of the various queues in the pool and whether they've been flushed since the last check, and grows the pool if tasks appear to be starved. The overhead imposed on each worker is minimal, as it only has to update a single additional volatile field when it clears out its queue. The overhead of the manager thread is something, but in the benchmarking I've done so far it doesn't seem to add noticeable overhead. Ironically, Scala Actors already maintain a <a href="https://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/actors/scala/actors/scheduler/ForkJoinScheduler.scala">dedicated thread for the scheduler</a>; it's just that the heuristic checks on the size of the pool were removed in 2.8 (although honestly they never worked that well).
</p>
<p>
I have two simple tests/micro-benchmarks using Scala Actors with a <a href="http://bitbucket.org/eengbrec/managedforkjoinpool/src/081cf027383e/src/main/scala/scalax/concurrent/ManagedForkJoinScheduler.scala">custom scheduler</a> built on my ManagedForkJoinPool. The <a href="http://bitbucket.org/eengbrec/managedforkjoinpool/src/081cf027383e/src/main/scala/scalax/concurrent/SleepyActor.scala">SleepyActor</a> benchmark performs extremely poorly if the scheduler doesn't significantly grow the pool, because each actor sleeps on the thread. The <a href="http://bitbucket.org/eengbrec/managedforkjoinpool/src/081cf027383e/src/main/scala/scalax/concurrent/PingPong.scala">PingPong</a> benchmark spawns a bunch of pairs of actors that simply message each other back-and-forth. It is the type of use case where the ForkJoinScheduler shines, at least in terms of throughput. The results on my MacBook are as follows (these are NOT scientific):
<table border="1">
<tr><th>Benchmark</th><th>Default Scheduler</th><th>ManagedForkJoinScheduler</th><th><a href="https://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/actors/scala/actors/scheduler/ResizableThreadPoolScheduler.scala">ResizeableThreadPoolScheduler</a></th></tr>
<tr><th>SleepyActor</th><td>92 sec</td><td>30 sec</td><td>not tested</td></tr>
<tr><th>PingPong</th><td>59.49 sec</td><td>47.17 sec</td><td>66.02 sec</td></tr>
</table>
As you can see, performance actually improved with my scheduler. This is because the default scheduler for Scala Actors does not always use the "fast path" and put new tasks on the dequeue of the thread creating them; it only does so about half the time. So for the PingPong test you can see how the thread-local queues impact performance.
</p>
<h3>Conclusions</h3>
<p>
It's too early to draw solid conclusions, but based on what I've done so far I think I can develop a solid heuristic for managing the thread pool size, and that the overhead will be negligible. The key is to not impose any more overhead on the worker threads, and to keep the computational overhead of the manager thread as low as possible. This means that the heuristic needs to be simple, and that the wait times between checks should be as long as possible, probably dynamically sized based on how smoothly the pool seems to be running.
</p>
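<p>To make that concrete, here is one minimal way the sizing and check-interval rules could look. This is purely a sketch of the approach described above, not code from ManagedForkJoinPool, and all of the names are made up:</p>

```scala
object PoolManagerSketch {
  // Sizing rule: keep enough live threads that blocked (e.g. sleeping)
  // threads don't starve the pool of its target parallelism.
  def targetPoolSize(parallelism: Int, blockedThreads: Int, maxSize: Int): Int =
    math.min(parallelism + blockedThreads, maxSize)

  // Check-interval rule: back off exponentially while the pool runs
  // smoothly, and snap back to a short interval right after a resize.
  def nextCheckDelayMs(currentDelayMs: Long, justResized: Boolean,
                       maxDelayMs: Long): Long =
    if (justResized) 10L
    else math.min(currentDelayMs * 2, maxDelayMs)
}
```

<p>A manager thread would then loop: sample the pool, resize toward targetPoolSize, and sleep for nextCheckDelayMs. The worker threads never pay for any of this.</p>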
<p>
What I need right now are more tests and benchmarks that cover "real world" scenarios. I also need to test on machines with more cores. My MacBook isn't exactly a powerful system. If anyone has suggestions, please post them!
</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com7tag:blogger.com,1999:blog-19725519.post-60143727110562181812010-08-01T08:55:00.004-04:002010-08-01T13:11:54.978-04:00Is the Market for Technical Workers a Market for Lemons?<p>
I saw this <a href="http://behind-the-enemy-lines.blogspot.com/2010/07/mechanical-turk-low-wages-and-market.html">article</a> about <a href="https://www.mturk.com/mturk/welcome">Amazon Mechanical Turk</a> this morning, and it struck me that the market for technical talent, especially for software engineering and IT talent, is a <a href="http://en.wikipedia.org/wiki/The_Market_for_Lemons">market for lemons</a>. For a little more critical (and hopeful!) look at the idea, take a look at this <a href="http://mises.org/daily/801">post</a> at the <a href="http://mises.org/">Mises Institute</a>.
</p>
<h3>What's a Lemon Market?</h3>
<p>
The basic idea behind a market for lemons is that the seller has significantly more information about the quality of a product than the buyer (i.e. there is an <a href="http://en.wikipedia.org/wiki/Asymmetric_information">information asymmetry</a>). The consequence is that buyers are unwilling to pay more than the perceived average value of the goods on the market, because a buyer does not know whether he is going to get a cherry (a high quality good) or a lemon (a low quality good). This creates disincentives to sell cherries, because buyers will not pay their market value, and incentives to sell lemons, because they can be sold for more than they are worth. The result is a vicious cycle in which high quality goods are withdrawn from the market, lowering the average quality of the available goods and consequently the prices buyers are willing to pay. The ultimate result is a race to the bottom in terms of prices, with high quality goods vanishing as collateral damage.
</p>
<p>
There are five criteria for a lemon market (paraphrased and reordered from Wikipedia):
</p>
<ol>
<li>Asymmetry of information such that buyers cannot accurately assess the quality of goods and sellers can.</li>
<li>There is an incentive for sellers to pass off low quality goods as high quality goods.</li>
<li>There is either a wide continuum in the quality of goods being sold, or the average quality is sufficiently low such that buyers are pessimistic about the quality of goods available.</li>
<li>Sellers have no credible means of disclosing the quality of the goods they offer.</li>
<li>There is a deficiency of available public quality assurance.</li>
</ol>
<h3>The Lemon Market for Technical Talent</h3>
<p>
The market for technical talent is similar. The information available about most job candidates for technical positions is superficial at best, and completely inaccurate at worst. Furthermore, even when information is available, the buyers often do not have the knowledge required to evaluate it. This even extends to existing, long-term employees and contractors. Finally, the quality of an employee is often highly contextual - environmental and cultural factors that significantly boost one person may dampen another. So let's run down the different factors:
</p>
<ol>
<li>Information Asymmetry: Resumes are short, shallow, and often deceptive. Technical skills are very difficult to assess during an interview, and cultural fit can be almost impossible to assess.</li>
<li>Incentives for sellers to deceive buyers: Resumes are deceptive for a reason. It's often stated that you need to pad your resume, because the person looking at it is automatically going to assume it is padded. Furthermore, for almost two decades now technology has been a source of high paying jobs that can be obtained with marginal effort (just turn on the radio and listen for advertisements for schools offering technology training).</li>
<li>Continuum in the quality of goods: This exists in all professions.</li>
<li>Sellers have no credible means of disclosing quality: This is largely but not entirely true. Most paid work that technology professionals do cannot be disclosed in detail, and even when it is, there is reason for buyers to doubt its credibility. Employment agreements may also place restrictions on the disclosure of closely related work, even if it is done on one's own time with one's own resources.</li>
<li>Deficiency of public quality assurance: Employment laws and potentials for litigation (at least here in the U.S.) make the disclosure of employee or even outsourcer performance very risky. Individual workers cannot effectively provide guarantees, and those provided by outsourcers are generally meaningless.</li>
</ol>
<p>All this really amounts to is: Sellers have no good way of providing information regarding their high quality products, and buyers have no good way of obtaining reliable information about a seller's products. The problem centers entirely around making information available and ensuring its accuracy.
</p>
<h3>What Technology Professionals can do</h3>
<p>
Technology Professionals need to build up public reputations. We need to make samples of our work, or at least proxies of samples, publicly available. We need to build more professional communities with broader participation and more local, face-to-face engagement.
</p>
<p>
I think broader participation is the key. If you look at other professions (yes, I know, I'm lumping all "technology professionals" together and I frequently rant against such lumping), members are much more active in various groups and associations. In many professions it's expected and even explicitly required by employers. Sure, there are tons of groups on the internet. There are numerous, oftentimes enormous, technology conferences. There are countless technology blogs and community websites. But I think the conferences are generally too big to be meaningful, because most attendees essentially end up being passive receptacles for sales pitches (that's the purpose of these conferences) and maybe do a little networking and learning on the side. I think even the most active community sites really represent very small slices of the professionals they are intended to serve. Most professionals are passive participants in them at best, just like with conferences.
</p>
<p>
But there are millions of engineers, software developers, and IT professionals out there. Millions! How many of them do you find actively participating in professional communities of any kind? Not many, when you really think about it. This is a problem because as long as the vast majority of technology professionals have no public, professional identity, the vast majority of employers aren't going to see professional reputation as a useful search criterion or even a measure when considering candidates.
</p>
<h3>What Employers can do</h3>
<p>
Employers can do one of two things:
</p>
<ol>
<li>Expect applicants for new positions to provide meaningful evidence of their skills and take the time to evaluate such evidence</li>
<li>Just outsource everything to the cheapest company possible. Information on quality is largely unavailable and unreliable, but information on price is readily available and relatively accurate.</li>
</ol>
<p>You can see which one of those strategies is dominating the marketplace. One requires effort and involves going against the tide. The other is easy (at least to start, it's not easy to make work), and involves running along with the herd.
</p>
<p>Except in the limited number of cases where employers, and the hiring managers who work for them, genuinely believe they can achieve a competitive advantage by actively seeking out candidates with demonstrated qualifications (as opposed to a padded resume and fast talk during an interview), I don't think employers are going to seriously consider reputation and objective evidence of skills until such information is easily obtainable and fairly reliable.</p>
<h3>Is there any hope?</h3>
<p>Yes!</p>
<p>
We live and work in an information economy where new forms of information become available every day. There is no reason to believe that, just because today employers mostly hire on faith and expect luck to carry them through, in the future there won't be much more information available. I'm sure there are several companies working on aggregating such information right now. The market is huge. While companies will rarely put much effort into obtaining information and aggregating it into a useful form, they will often pay large quantities of money for it. The key is to make sure the information is there for them to use.
</p>
<p>
Also, if your resume makes it through all the wickets to a real hiring manager, and you provide an easy way for him to find more good information about you, he'll probably use it. Interviewing people is a lot of work. Deciding among candidates can be very hard. Extra information will almost certainly be used. It just has to be easy to obtain. People are lazy.
</p>
<h3>But what about old fashioned networking?</h3>
<p>
I think the numbers involved are too large, and relying on old-fashioned networks tends to yield poor results. Recommendations and referrals are certainly useful and lead to many, many hires. But they tend to be made more on the basis of personal relationships than real qualifications, and therefore need to be checked. Schmoozing isn't a real technical skill. That being said, it could quite likely get you a job. It's just that in general I don't want to work with such people in a technical capacity, so I'm not going to recommend people go do it in order to obtain technical jobs. I'm selfish that way.
</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com5tag:blogger.com,1999:blog-19725519.post-69375065916242310622010-07-27T21:35:00.009-04:002010-07-29T08:40:55.819-04:00Higher-Level versus Higher-Order Abstraction<p>
Engineering in general, and software engineering in particular, is all about abstraction. The creation and utilization of abstractions is a core part of the daily activities of every good engineer, and many engineers are on a never ending quest to increase their level of abstraction. Abstraction both increases productivity and increases the complexity of problems that can be tackled. Few would argue that increased abstraction is a bad thing.
</p>
<p>
But people do argue about abstraction, and often condemn abstractions that they view as unfit or too complex. It is common to hear a software developer praise the merits of one means of abstraction and then demean another in the same breath. Some abstractions are too general. Some aren't general enough. Some are simply too complex or, ironically, too abstract. So while engineers almost universally agree that abstraction is good and that more abstraction is better, they often disagree fiercely about what genuinely constitutes "better."
</p>
<h3>What is an abstraction?</h3>
<p>
Simply put, an abstraction is something that hides the details involved in realizing a concept, and thus frees the engineer to focus on the problem at hand rather than the details of some other concern. This can take many, many forms. The simplest and most common form, so common that the term is often used to describe almost all "good" abstractions, is the higher-level abstraction.
</p>
<h3>Higher-level Abstractions</h3>
<p>
Higher-level abstractions are fairly simple: they encapsulate details so that they can be used without knowledge or concern about the details. Let's consider a simple example in <a href="http://www.scala-lang.org/node/7009">Scala 2.8</a> (you can copy and paste these examples directly into the Scala 2.8 REPL):
</p>
<pre>
scala> def sum(data: Array[Double]): Double = {
     |   var i = 0
     |   var total = 0.0
     |   while(i < data.length) {
     |     total = total + data(i)
     |     i = i + 1
     |   }
     |   total
     | }
scala> def mean(data: Array[Double]): Double = sum(data) / data.length
</pre>
<p>
So we have two functions: one that computes the sum of an array of doubles, and one that computes the mean. Pretty much any programmer (professional or otherwise) would feel confident both writing and using such functions. They are written in a modern multi-paradigm programming language yet I bet if you went back in time and showed them to programmers in some of the very first high-level languages they would recognize exactly what they do. They clearly encapsulate the details of performing certain computations. But what's more interesting about them is what's missing from them:
</p>
<ol>
<li><a href="http://en.wikipedia.org/wiki/Polymorphism_in_object-oriented_programming">Polymorphism</a></li>
<li><a href="http://en.wikipedia.org/wiki/Closure_(computer_science)">Closures</a> or <a href="http://en.wikipedia.org/wiki/First-class_function">first-class functions</a></li>
<li>Usage or creation of <a href="http://en.wikipedia.org/wiki/Higher-order_function">higher-order functions</a></li>
<li>Other Scala magic such as <a href="http://dibblego.wordpress.com/2007/05/23/the-power-of-type-classes-with-scala-implicit-defs/">type classes</a> and <a href="http://www.scala-lang.org/node/114">implicit parameters</a></li>
</ol>
<p>
In other words, they provide a simple, layered, hierarchical abstraction in a very bounded way. If we step away from software for a minute, you can imagine a digital designer placing a symbol for an adder or a register on a schematic without worrying about the arrangement of transistors that will be required to realize them in an actual circuit, or a mechanical engineer selecting a standard screw or clamp. These are parts that can be simply built, tested, and composed into larger devices.
</p>
<h3>Higher-Order Abstraction Take 1: The Mathematics of Fruit</h3>
<p>
Imagine I have some apples. But unfortunately I'm here, off in the internet, so I can't show them to you. I need an abstraction to tell you about them. For example, I could say I have two apples. Two is a number, and numbers are abstractions. I can use the same number two to describe the number of <a href="http://en.wikipedia.org/wiki/McIntosh_(apple)">McIntoshes</a> in my refrigerator or the number of Apple Macintoshes I own. Now let's say I also want to talk about my strawberries and blueberries. I have 16 strawberries and 100 blueberries. How many pieces of fruit do I have? 118! How did I figure that out? I used arithmetic, which is an abstraction of higher order than numbers. Now let's say I want to know how many days it will be before I have eaten all my strawberries. I can write an equation such as: current_fruit - pieces_per_day * number_of_days = fruit_after_number_of_days. I can do even more with this by solving for different variables. This is algebra, and it is a higher-order abstraction than arithmetic. Now let's say I want to build upon that so that I can study the dynamics of the amount of fruit in my refrigerator. I purchase fruit and put it in my refrigerator. I take fruit out and eat it. I forget about fruit and it gets moldy, then I remove it and throw it away. I can capture all of these as a system of <a href="http://en.wikipedia.org/wiki/Differential_equation">differential equations</a>, and using <a href="http://en.wikipedia.org/wiki/Calculus">calculus</a> describe all sorts of things about my fruit at any given point in time. Calculus is a higher-order abstraction than algebra. In fact, abstractions similar to the ones built with calculus are what I mean when I say "higher-order abstraction."
</p>
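<p>The algebra step can be made concrete with a toy calculation (mine, purely for illustration). Solving current_fruit - pieces_per_day * number_of_days = fruit_after_number_of_days for the number of days gives (current_fruit - fruit_after) / pieces_per_day:</p>

```scala
object FruitAlgebra {
  // Rearranging currentFruit - piecesPerDay * days = fruitAfter
  // for days gives (currentFruit - fruitAfter) / piecesPerDay.
  def daysUntil(currentFruit: Int, piecesPerDay: Int, fruitAfter: Int): Int =
    (currentFruit - fruitAfter) / piecesPerDay
}
```

<p>So eating 4 strawberries a day, my 16 strawberries last 4 days.</p>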
<p>
At each step up the chain of mathematics both the generality and number of concepts that can be conveyed by a single abstraction increased. In the case of calculus it becomes essentially infinite, and that's the essence of higher-order abstractions: they deal with the infinite or near-infinite. Also, observe that almost everyone from relatively small children on up understand numbers and arithmetic, most adolescents and adults can stumble through applying algebra, and only a very small portion of the population knows anything about calculus, much less can effectively apply it. Also observe that the functions I defined earlier are just below algebra in terms of their order of abstraction.
</p>
<h3>Higher-Order Abstraction Take 2: Programming in Scala</h3>
<p>
Let's see if we can sprinkle some higher-order abstraction into our previously defined functions:
</p>
<pre>
scala> def sum(data: Traversable[Double]) = data.foldLeft(0.0)(_ + _)
</pre>
<p>
This definition of sum exposes one higher-order abstraction (polymorphism - it now can use any Traversable[Double] instead of just an Array[Double]), and it uses higher-order functions in its implementation to perform the summation. This new definition is both much shorter and much more general, but it's still relatively approachable, especially for use. Calling it with an Array[Double] works as before, and now it can be used with any number of collections, so long as the collections contain doubles. But forcing the collections to contain doubles is very limiting, so let's see if we can do better:
</p>
<pre>
scala> def sum[T](data: Traversable[T])(implicit n: Numeric[T]): T = {
     |   import n._
     |   data.foldLeft(zero)(_ + _)
     | }
</pre>
<p>
Ahhh, that's better! Much more general. Not only will this work for any numeric type defined in the Scala standard library, but for any numeric type for which the Numeric type class has been defined! It doesn't even need to be in the same class hierarchy! In order to introduce this new level of generality, we've also introduced the following higher-order abstractions:
</p>
<ol>
<li>Classic polymorphism (Traversable instead of Array)</li>
<li>Parametric polymorphism (the type parameter T for various classes)</li>
<li>Higher-order functions and closures (foldLeft and its argument that does addition)</li>
<li>Type classes (well, Scala's flavor of type classes, e.g. Numeric)</li>
</ol>
<p>
Now, this is the point where people start screaming that abstraction has gone too far. Many professional programmers would look at it and think "WTF?" They could still guess what it does, but the mechanics are quite elusive for anyone who doesn't know Scala reasonably well. That being said, the code is still far more compact than the original imperative version and is extremely general (to which someone replies "Yes, but it's slow as heck compared to the original!"). At this point I would say the order of abstraction has gone from being just below algebra to being on par with integral calculus. Just like with mathematics, we see a significant drop off in the number of people who readily understand it.
</p>
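<p>For completeness, the mean function from the opening example can be generalized in the same way. This is my sketch rather than anything from the post's examples: division isn't part of Numeric, so it needs the Fractional type class instead (and I've written it over Iterable, which serves the same purpose here as Traversable):</p>

```scala
object MeanSketch {
  // mean needs division, so we require Fractional[T] (a Numeric[T]
  // that also provides /); fromInt converts the collection size to T.
  def mean[T](data: Iterable[T])(implicit f: Fractional[T]): T = {
    import f._
    data.foldLeft(zero)(_ + _) / fromInt(data.size)
  }
}
```

<p>Calling MeanSketch.mean(List(1.0, 2.0, 3.0)) behaves like the original, and any type with a Fractional instance gets mean for free.</p>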
<h3>Higher-Order Abstraction Take 3: Conclusions</h3>
<p>
Let's consider a short, incomplete list of higher-order abstractions, means of abstraction, and fields that rely upon higher-order abstractions:
</p>
<ol>
<li>Higher-order functions</li>
<li>Polymorphism and parametric polymorphism</li>
<li>Metaclasses such as in <a href="http://www.ibm.com/developerworks/linux/library/l-pymeta.html">Python</a> and <a href="http://www.ifi.uzh.ch/richter/Classes/oose2/05_Metaclasses/02_smalltalk/02_metaclasses_smalltalk.html">Smalltalk</a></li>
<li>Rich macros such as in <a href="http://www.apl.jhu.edu/~hall/Lisp-Notes/Macros.html">Lisp</a> or <a href="http://www.cplusplus.com/doc/tutorial/templates/">C++ Templates</a></li>
<li>Calculus and all the mathematics built upon it (and various other forms of advanced math)</li>
<li><a href="http://en.wikipedia.org/wiki/First-order_logic">First-order Logic</a> and its <a href="http://en.wikipedia.org/wiki/Common_logic">kin</a>, and all the <a href="http://en.wikipedia.org/wiki/Higher-order_logic">higher-order logics</a></li>
<li>Any physics or other sciences that are built upon calculus</li>
<li>Any engineering or other applications that are built upon physics</li>
</ol>
<p>
Higher-order abstractions tend to exhibit one or more of the following traits:
</p>
<ol>
<li>They deal in infinities (e.g. integration and differentiation in calculus, universal and existential quantification in first-order logic, polymorphism and type classes in programming)</li>
<li>They deal in transformations (e.g. integration and differentiation in calculus, metaclasses and macros in programming)</li>
<li>They rely upon composition rules (e.g. <a href="http://mathworld.wolfram.com/ChainRule.html">chain rule</a> and <a href="http://mathworld.wolfram.com/IntegrationbyParts.html">integration by parts</a> in calculus, higher-order functions in programming)</li>
<li>Learning to use any given form of higher-order abstraction requires significant effort and discipline, therefore they tend to be the tools of specialists</li>
<li>They form the foundation of our modern, technological society.</li>
</ol>
<p>
The last two can be converted into conclusions:
</p>
<ol>
<li>Higher-order abstractions are critical for most interesting forms of science and technology</li>
<li>You can't expect a colleague or customer from another field to understand the higher-order abstractions in your field, even if his own field is totally immersed in higher-order abstraction</li>
</ol>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com3tag:blogger.com,1999:blog-19725519.post-54478499963981558872010-07-24T20:32:00.002-04:002010-07-24T21:30:54.995-04:00Windows 7 Upgrade (so far)A few weeks ago my wife's Lenovo laptop caught a really nasty virus. Symantec antivirus couldn't clean it up. But I had recovery media (although it was Vista instead of XP), so I used Knoppix to low-level format the HD and flashed an updated BIOS (I've had previous experience with viruses that could be stopped by nothing less). Unfortunately the restore media didn't work...it would boot up, think a while, and then reboot. I also tried an XP install using an old XP disk I had laying around, but the XP installer doesn't get along with the new SATA controller in the laptop, so it BSODs. Working around this problem seems to require creating a special driver disk, which seems to require Windows, which creates a problem because all I have are Leopard (about 2 crashes in almost 3 years, and no viruses, versus what seems like a plague on my wife's laptop every 6 months), and Knoppix.
So, being a person who values his time more than his money, I went out and bought a Windows 7 Professional upgrade. All the Microsoft related blogs assured me this would work, and indeed I have a working install that went almost as smooth as butter. The "almost" is the activation process. I have yet to successfully activate. Going into this my assumption was that I would have to call Microsoft in order to activate. What I didn't realize is that their human systems don't seem any smarter than their automated systems.
So first I called the phone number that Microsoft's support website told me to call. Several layers deep this told me to call a special activation line. I want to commend Microsoft at this point for avoiding such advanced technology as call transfers. You really can't trust technology that's decades old. It's better to let people transfer themselves.
So I call the line, and speak to a very friendly automated system. It's asking me for my "installation id," but there isn't one. There isn't one because the computer is connected to the internet, and some genius decided that if one has the internet one would never use the phone system, which would normally be true if some brilliant licensing policy maker hadn't decided that the only way to activate a "clean install" is via the phone system. Anyway, the nice automated system transfers me to a human being.
Guess what the human asks me for? An installation id. I tell him I don't have one. I read the various things on my screen to him. He then asks if I am connected to the internet. I say yes. He tells me to disconnect. I disconnect and start over, ultimately ending up in the same place with no installation ID. The support person gives up and tells me I have a deep technical problem, which will require technical support, not activation support (I'm now imagining myself setting up a conference call with tech support, activation support, and the nice automated system lady). Tech support is of course only a US working hours affair, not a Saturday evening affair.
It turns out the automated support guy just wasn't persistent enough. After hanging up, I reboot the computer and try again without internet. It appears to get me to the same place, except when I hit the magic "next" button, instead of receiving a message telling me that my product key is not valid, I receive an installation ID and am told to call the activation line (which is a 3rd phone number, but gets me to the same place as the first).
I want to take a moment to point out some huge flaws here:
<ol>
<li>The automated system just assumed I would have an installation ID, and had no explanation of how I was supposed to obtain one or why I might not see one, while the activation software is explicitly designed to not provide one to someone connected to the internet</li>
<li>The human being had a vague idea that being connected to the internet was a problem, although he didn't realize it until repeating his requests half a dozen times, and even though he had a vague idea how to induce Windows 7 into providing one, he didn't actually know and gave up quickly</li>
<li>About 2 minutes' worth of poking, after 30 minutes of bouncing around on the phone, rewarded me with an installation ID</li>
</ol>
Ok, so let's try again. This time I provide the installation id to the automated system, and the automated system informs me that it is invalid, and sends me to another human being. After 5 or 10 minutes the human being informs me that I need to have a Microsoft OS already installed in order to use an upgrade, and that I cannot use a clean install. He says it is impossible that Windows XP would refuse to install, and thinks it's completely reasonable that I would install one OS just to install another. I rant at him for a few minutes about how my product packaging says nothing about not being able to do a clean install (in fact, it's the only way to upgrade from XP according to the included instructions, although maybe it does some magic if the installer reformats the HD instead of the user already doing it).
Anyway, I explain this all to him while trying to remain calm. He puts me on hold. Then when he comes back he tells me that he gave me wrong information because a server is down, and that I need to call back in 30 minutes when the server is up.
So I did that. This time I read my installation ID to the support person, who told me it was invalid. He told me to exit activation and enter my product key again. I dutifully did this, then he put me on hold while he waited for my product key to come up. This of course would never happen, because I wasn't connected to the internet, and one in fact cannot obtain an installation ID while connected to the internet. After a few minutes on hold, I was transferred to Microsoft technical support, which of course was closed.
So let's recap so far:
<ol>
<li>None of the Microsoft activation people or systems know what you need to do in order to obtain an installation id, all of them expect you to just have one</li>
<li>In order to obtain an installation id, as far as I can tell you can't be connected to the internet (certain options only appear if you are not connected)</li>
<li>Some support people seem to assume that you are connected to the internet, even though the information they request will never come up if you are not connected to the internet</li>
<li>Support people will transfer you to a line that will not answer, without even telling you that you are being transferred</li>
<li>That line seems to only be open during US business hours, which would mean they seem to assume that either you professionally support Microsoft products or are unemployed.</li>
</ol>
If I can't make this work tomorrow, I think I'm going to obtain an updated version of MS Office for my Mac and give it to my wife, then just install Ubuntu on her laptop.Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com6tag:blogger.com,1999:blog-19725519.post-35542406473588903872010-01-16T10:31:00.000-05:002010-01-16T10:32:19.487-05:00Changing Tastes<p>When I was a kid, I hated onions, green peppers, and mushrooms. I used to tell people I was allergic to mushrooms so they wouldn't try to make me eat them. I hated any sort of chunky sauce or really textured meat. I think I wanted everything to have either the consistency of a chicken nugget or ketchup. My parents used to tell me that when I was older my tastes would change. That I liked crappy food, disliked good food, and eventually I would realize it. They were right.</p><p>So kids like chicken nuggets and ketchup. Wow, huge revelation. What does this have to do with technology? I'm on the steering committee for an internal conference on software engineering that my employer is holding. I'm the most junior person on the committee, and most of the members are managers who have more managers reporting to them. Our technical program committee (separate, more technical people on it, but all very senior) just finished abstract selection and we've been discussing topics and candidates for a panel discussion. During this process I have been absolutely shocked by how my tastes differ from those of my colleagues.</p><p>I've noticed that within the presentations selected topics on process, architecture, and management are over-represented. On the other side, many of the abstracts that I thought were very good and deeply technical fell below the line. I can't quite say there was a bias towards "high level" topics, because I think they were over-represented in the submissions. Given the diversity of technology that's almost inevitable. 
A guy doing embedded signal processing and a guy doing enterprise systems would most likely submit very different technical abstracts, but ones on management or process could be identical. It's almost inevitable that topics that are common across specialties will have more submissions.</p><p>There's a similar story with the panel discussion. I wanted a narrower technical topic, preferably one that is a little controversial so panelists and maybe even the audience can engage in debate. My colleagues were more concerned with who is going to be on the panel than what they would talk about, and keeping the topic broad enough to give the panelists freedom.</p><p>What's clear is that my colleagues clearly have different tastes in presentation content than me. I think they are genuinely trying to compose the best conference they can, and using their own experiences and preferences as a guide. I think their choices have been reasonable and well intentioned. I just disagree with many of them. If I had been the TPC chair, I would have explicitly biased the selection criteria towards deeper, technical topics. Those are the topics I would attend, even if they are outside my area of expertise. I would use my preferences as a guide. But that leaves me wondering, in another five years or ten years are my tastes going change? My tastes have certainly changed over the past decade, so I have no reason to believe they won't change over the next. Will I prefer process over technology and architecture over implementation? Will I stop thinking "show me the code!" and "show me the scalability benchmarks!" when see a bunch of boxes with lines between them? I don't think so, but only time will tell. 
When I was a kid, I would have never believed that I would ever willingly eat raw fish, much less enjoy it, but today I do.</p><br class='final-break' />Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com3tag:blogger.com,1999:blog-19725519.post-70153833620323257492010-01-05T22:59:00.001-05:002010-01-05T23:00:41.006-05:00Your Company's App<p><a href="http://www.tbray.org/ongoing/">Tim Bray</a> just posted a blog about how Enterprise IT is <a href="http://www.tbray.org/ongoing/When/201x/2010/01/02/Doing-It-Wrong">doing it wrong</a>.<span class="Apple-converted-space"> </span>I can't really argue with that. He goes on to explain that Enterprise IT needs to learn from those companies building <a href="http://oreilly.com/web2/archive/what-is-web-20.html">Web 2.0</a>, because they deliver more functionality in less time and for a whole lot less money. This is where his argument breaks down. The problem is the type of <a href="http://en.wikipedia.org/wiki/Enterprise_system">Enterprise Systems</a> he's talking about aren't <a href="http://twitter.com">Twitter</a>, they're <a href="http://stuffthathappens.com/blog/2008/03/05/simplicity/">your company's app</a>.</p>
<p class="p2">I work at a pretty big, stodgy, conservative company. I'm pretty sure, as far as things like ERP and <a href="http://en.wikipedia.org/wiki/Product_lifecycle_management">PLM</a> are concerned, my employer is exactly the type of company Tim is talking about, and like I said - he's probably right about the problem. But based on my own experiences and observations at my one lonely company, I think he's wrong in his suggestions. Comparing <a href="http://en.wikipedia.org/wiki/Enterprise_resource_planning">ERP</a> and <a href="http://www.facebook.com/">Facebook</a> is like comparing <a href="http://www.bestapples.com/">apples</a> and <a href="http://www.applejacks.com/healthymessage/index.html">Apple Jacks</a>.</p>
<p>The reason I think this is that, when it comes to deploying <a href="http://en.wikipedia.org/wiki/Enterprise_social_software">Enterprise 2.0</a> applications, my employer has done fairly well. We have...</p>
<ul class="ul1">
<li>A <a href="http://en.wikipedia.org/wiki/Wiki">wiki</a></li>
<li>A "business networking" site, kind of like <a href="http://www.linkedin.com/">LinkedIn</a></li>
<li>A <a href="http://en.wikipedia.org/wiki/Microblogging">microblogging</a> site</li>
<li>A question &amp; answer board similar to <a href="http://stackoverflow.com/">Stack Overflow</a></li>
<li>An on-demand cloud computing environment, kind of like <a href="http://aws.amazon.com/ec2/">Amazon EC2</a></li>
</ul>
<p>...and probably more stuff that I'm not aware of yet. Some of the above were bought, some were built, some were cobbled together with stuff from a variety of sources. I think most of them were built and deployed, at least initially, at costs about as competitive as possible with Web 2.0 startups and in reasonable timeframes. Of course, this means intranet scale at a <a href="http://money.cnn.com/magazines/fortune/fortune500/2009/full_list/">Fortune 100</a> company (or in some cases a subset of it), not successful internet scale.</p>
<p>The problems the above face are much like the problems your typical startup faces: attracting users, keeping investors (sponsors) happy, dealing with the occasional onslaught of visitors after some publicity. But these problems are very different from the problems a traditional Enterprise Application faces. There is no <a href="http://en.wikipedia.org/wiki/Sarbanes%E2%80%93Oxley_Act">SOX</a> compliance. There are no entrenched stakeholders. There are no legacy systems, or if there are they aren't that important. If only 10% of the potential user community actually uses the application, it's a glowing success. Negligible adoption can be accepted for extended periods while the culture adjusts, because the carrying cost of such applications is low.</p>
<p>But enterprise systems need to deal with SOX. They have more entrenched stakeholders than you can count. These days there's always at least one legacy system, and often several due to disparate business units and acquisitions. If these systems fail, business stops, people don't get paid, or the financials are wrong. If only 90% of your buyers use your ERP system to process purchase orders, it's an abject failure and possibly endangering the company.</p>
<p>A year or two ago someone "discovered" that a very common, important record in one of our internal systems had 44 (or something like that) required fields, and decided that this was ridiculous. A team was formed to streamline the processes associated with this record by reducing the number of fields. A detailed process audit was conducted. It turned out that every single one of them played a critical role in a downstream process. All the fields remained, and some old timers smiled knowingly.</p>
<p>As many humorless commenters pointed out on Eric Burke's blog, your company's app is your company's app for a reason, and often it's a good reason. These applications aren't complicated because of the technology. Internet applications are complex due to extrinsic forces - the sheer number of users and quantity of data that can deluge the application at any given moment. Enterprise systems tend to be the opposite. Their complexity is intrinsic, arising from the diversity of the processes they support. The technology complexity (most of which is accidental) is secondary. Tim's suggestions provide a means of addressing technological complexity, and of building green field non-business-critical applications in the face of uncertain requirements. They don't provide a means for dealing with critical systems laden with stakeholders, politics, and legacy.</p>
<p>I think the solution, or at least part of the solution, to the problem with enterprise systems lies in removing much of the functionality from them. There are things that must be right all of the time, but most of business (and engineering, etc.) exists on a much fuzzier plane. The problem comes from coupling the precise (e.g. general ledger) with the imprecise (e.g. CM on an early stage design), and thus subjecting the imprecise to overly burdensome controls and restrictions. Only after this separation has been carefully implemented can functionality evolve in an agile fashion.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com7tag:blogger.com,1999:blog-19725519.post-76601205499194457612009-06-21T21:20:00.002-04:002009-06-21T21:22:07.543-04:00Pondering Actor Design Trades<p>There's been a lot of discussion of the Scala actors library lately, much of it critical, and a recent flurry of alternate implementations. The alternate implementations (except my languishing state-based one ;-) all have one thing in common: they are several orders of magnitude simpler. Writing a basic actor implementation is actually pretty trivial, especially given the <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html">java.util.concurrent</a> classes that provide a decent chunk of the functionality in Scala actors, all for free on JDK5+. So this raises a few questions:</p>
<ol>
<li>Why is the standard Scala actor implementation so complex when others have done it in a such simpler fashion?</li>
<li>Is it better to have one, big actor library that supports a wide variety of use cases, or a bunch of smaller ones targeted at specific niches and programming styles?</li>
<li>If there are to be a bunch, should they just be conceptually similar (e.g. all based on the actor model), or should there be interoperability among them?</li>
</ol>
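<p>To make the "pretty trivial" claim above concrete, here is a rough sketch of an event-based actor built directly on java.util.concurrent. This is my own hypothetical code (MiniActor and its members are names I made up, not part of any of the libraries discussed): a lock-free mailbox plus a handler, drained by tasks submitted to a shared thread pool.</p>

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, ExecutorService}
import java.util.concurrent.atomic.AtomicBoolean

// A minimal event-based actor: a lock-free mailbox plus a handler,
// drained by tasks submitted to a shared thread pool.
class MiniActor(handler: PartialFunction[Any, Unit], pool: ExecutorService) {
  private val mailbox = new ConcurrentLinkedQueue[Any]
  private val scheduled = new AtomicBoolean(false)

  // Enqueue a message and make sure a drain task is scheduled.
  def !(msg: Any): Unit = { mailbox.add(msg); trySchedule() }

  // At most one drain task runs at a time, guarded by the CAS on `scheduled`.
  private def trySchedule(): Unit =
    if (!mailbox.isEmpty && scheduled.compareAndSet(false, true))
      pool.execute(new Runnable {
        def run(): Unit = {
          var msg = mailbox.poll()
          while (msg != null) {
            if (handler.isDefinedAt(msg)) handler(msg) // unmatched messages are dropped
            msg = mailbox.poll()
          }
          scheduled.set(false)
          trySchedule() // re-check for messages that raced in during the drain
        }
      })
}
```

<p>What it omits is exactly where the hard design questions live: reply channels, suspension, flow-control structures, and scheduler pluggability.</p>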
<p>I'm not going to answer these questions now. Instead, I'm going to try to start laying out some of what I believe to be the key characteristics of an actor implementation, and how they detract from or reinforce one another. So here it goes:</p>
<ol>
<li>Guarantees</li>
<li>Expressivity</li>
<li>Extensibility</li>
<li>Performance</li>
<li>Scalability</li>
</ol>
<h3>Guarantees</h3>
<p>The purpose of a concurrency framework is to make concurrency easier. Concurrency is hard largely because it is extremely difficult to reason about, and thus concurrent code tends to be hard to write, laden with bugs, and subject to various odd pitfalls. By providing various guarantees, a concurrency framework makes it easier to reason about concurrent code. Actors are intended to free the programmer from worrying about things like locks, semaphores, thread management, etc. by encapsulating all that complexity behind a simple interface, assuming the programmer follows some basic rules like "no shared mutable state among actors."</p>
<p>The problem with guarantees is that they tend to break down in the presence of limited CPU and memory resources.</p>
<h3>Expressivity</h3>
<p>Expressivity is difficult to define. For purposes here, I'm going to define it as the degree to which a concise, natural expression of the programmer's intent is supported, and illustrate it by comparing <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/actors/scala/actors/Actor.scala">Scala Actor</a> to <a href="http://github.com/dpp/liftweb/blob/e1568d144d95443b96bf5d6f49a5b4a984caa032/lift-actor/src/main/scala/net/liftweb/actor/LiftActor.scala">Lift Actor</a>. Scala Actors allow you to execute logic independent of message processing (note: this is a violation of the <a href="http://yangtze.cs.uiuc.edu/~thati/festschrift.pdf">theoretical model for actors</a>) by simply placing it in the act method. Lift Actors, on the other hand, are only triggered when they receive a message (this is consistent with the theoretical model). For example, this means that Scala Actors can perform possibly costly setup operations in their own thread before they start listening for messages. In order to accomplish this in the Lift model, the programmer must create the actor and then send it some sort of "init" message. The same effect can be achieved with both implementations, but it is more naturally supported by Scala Actors. Of course there is a tradeoff here, as deviating from the theoretical model potentially weakens any guarantees that the model may provide. The Scala Actor way also implies that an Actor has an explicit lifecycle, which as we'll see later has other significant implications.</p>
<p>Another example is what I'll call the "nested react pattern." It is relatively common to want an actor to take on a different behavior after processing a message, thus altering which messages are ignored and how the received messages are processed.</p>
<pre>
loop {
  react {
    case 'foo => {
      // do some stuff...
      react {
        case 'bar => // do some other stuff...
      }
    }
  }
}
</pre>
<p>The code above alternates between processing <code>'foo</code> messages and <code>'bar</code> messages. This can be done with Lift Actor as well, but the expression is a little less natural:</p>
<pre>
class MyActor extends LiftActor {
  private val fooMode: PartialFunction[Any, Unit] = {
    case 'foo => {
      // do some stuff
      mode = barMode
    }
  }
  private val barMode: PartialFunction[Any, Unit] = {
    case 'bar => {
      // do some other stuff...
      mode = fooMode
    }
  }
  private var mode = fooMode
  protected def messageHandler = mode
}
</pre>
<p>Finally, Lift Actors exclusively use an event-based model and have no support for blocking on a thread while waiting for a message, and thus lose the ability to express patterns such as the following:</p>
<pre>
loop {
  react {
    case 'convertToNumber => {
      val i: Int = receive {
        case 'one => 1
        case 'two => 2
        case 'three => 3
      }
      reply(i)
    }
  }
}
</pre>
<h3>Extensibility</h3>
<p>For purposes here, I'm going to use "extensible" to mean that a piece of software is extensible if capabilities can be added without modifying the core or breaking its semantics, in an amount of effort proportional to the size of the extension. This is narrower than the traditional definition of extensibility, which also covers the ability of a system to evolve internally. A good example of extensibility is the ability of both Scala Actors and Lift Actors to allow the user to specify a custom <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/actors/scala/actors/IScheduler.scala">scheduler</a>. Other examples could include adding control structures or using a different data structure for the mailbox.</p>
<p>The challenge with extensibility is that in order to enable it, what could otherwise be treated as the internal bits of the library must instead have well defined interfaces for components along with appropriate hooks for inserting them. For example, a while ago I did some <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/14e33a6ea1f8/src/actors/scala/actors/MessageQueue.scala">work</a> to make the <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/actors/scala/actors/MessageQueue.scala">MessageQueue</a> used for the mailbox overrideable (it has temporarily been overcome-by-events due to other changes). This is a small example, but it shows how extensibility requires a greater degree of forethought.</p>
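<p>To illustrate the kind of extension point just described - an overrideable mailbox - here is a hypothetical sketch. The trait and class names are mine, not the actual MessageQueue API: once the mailbox sits behind a small trait, a specialized implementation such as a priority mailbox can be dropped in without touching the core.</p>

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, PriorityBlockingQueue}

// Hypothetical extension point: the mailbox behind a trait, so an actor
// implementation can swap in its own data structure.
trait Mailbox[A] {
  def put(msg: A): Unit
  def poll(): Option[A] // None when the mailbox is empty
}

// Default FIFO mailbox backed by a lock-free queue.
class FifoMailbox[A] extends Mailbox[A] {
  private val q = new ConcurrentLinkedQueue[A]
  def put(msg: A): Unit = { q.add(msg); () }
  def poll(): Option[A] = Option(q.poll())
}

// A drop-in alternative: highest-priority message first, no core changes needed.
class PriorityMailbox[A](ord: Ordering[A]) extends Mailbox[A] {
  private val q = new PriorityBlockingQueue[A](11, ord)
  def put(msg: A): Unit = { q.add(msg); () }
  def poll(): Option[A] = Option(q.poll())
}
```

<p>The cost of enabling this is exactly the forethought mentioned above: the core can no longer assume anything about the mailbox beyond the trait's contract.</p>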
<p>Extensibility also benefits substantially from simplicity. Scala Actors are almost impossible to extend from outside the <code>scala.actors</code> package because of their heavy reliance on package-private methods and state (mostly fixed <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/9da337a0db7f/">here</a>, but I broke remote actors in the process so no patch yet). Lift Actors, on the other hand, are very extensible, at least within the bounds of their design (purely event-based actors with no explicit lifecycle). Many of the flow control mechanisms could be implemented on top of the baseline approach.</p>
<p>At this point we see that extensibility has an interesting relationship with expressivity. I previously claimed that Scala Actors were more expressive because of the wide variety of control structures they provide (and I didn't even touch on some of the <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/9da337a0db7f/src/actors/scala/actors/ActorDSL.scala">DSL-like functionality</a> that enables all sorts of interesting things). However, given Lift Actors' far simpler and more extensible foundation, there is much more opportunity to create custom control structures as extensions to Lift Actors without modifying the core. Thus, if you are willing to do some lower-level programming, it could be argued that Lift Actors are in reality more expressive due to their extensibility.</p>
<h3>Performance and Scalability</h3>
<p>For purposes here, I'm going to treat performance as the rate at which an actor can receive and process messages with a relatively small, fixed number of simultaneous actors. This means that improving performance is largely a matter of reducing the time it takes from when a message is initially sent to when user code within the actor begins processing the message, including minimizing any pause between when an actor finishes processing one message and is available to start processing the next. For moderate numbers of actors, performance is often maximized by having one thread per actor, and having the actor block while waiting for a message. Given enough actors, the memory requirements of using a thread for each actor will eventually cause more slowdown than the cost of scheduling a new reaction for each message. This is illustrated in Philipp Haller's paper, "<a href="http://lamp.epfl.ch/~phaller/doc/haller07actorsunify.pdf">Actors that Unify Threads and Events</a>," in the following graph:</p>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmUTib_8fIb2wElGabQf1M7o1-5qIqzh7IBjykLTJNFFpg3D1GeYpxVh1V6_ocs4a9s6HxfQ3fE0o-OKFrLKT5OVj3-r5baKbXAeN26nqxr2qvgDZ5WJK2EABuT384WVJe7PUC/s1600-h/actor_scale_graph.png"><img style="cursor:pointer; cursor:hand; width: 400px; height: 290px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmUTib_8fIb2wElGabQf1M7o1-5qIqzh7IBjykLTJNFFpg3D1GeYpxVh1V6_ocs4a9s6HxfQ3fE0o-OKFrLKT5OVj3-r5baKbXAeN26nqxr2qvgDZ5WJK2EABuT384WVJe7PUC/s400/actor_scale_graph.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5349955865218361106" /></a>
<p>Note that the above graph covers a microbenchmark running a simple, non-memory-intensive task, and that the thread line is not a measurement of thread-bound actors, but rather of a simple threaded implementation. However, my own benchmarking has shown that receive-based actors (ones that block on a thread) compare to event-based actors in almost the same way as threads compare to event-based actors in the above graph. Also, remember that in a real application, where heap space is needed for things besides the stacks of thousands of threads, the point where the JVM throws an <a href="http://java.sun.com/javase/6/docs/api/java/lang/OutOfMemoryError.html">OutOfMemoryError</a> will be much farther to the left. There are also more subtle issues. One of my first experiences with the Scala Actors library was creating a deadlock. I created more thread-bound actors than the scheduler wanted to create threads, and thus actors were stuck blocking on threads, waiting for messages from an actor that hadn't started yet because there were no available threads. In other words, blocking can lead to deadlock, starvation, and simply extreme forms of unfairness with respect to how much CPU time is allocated to each actor. These all go against highly desirable guarantees that an actor library should provide outside of extreme circumstances.</p>
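<p>For reference, a receive-based (thread-bound) actor is essentially the following hypothetical sketch (the names are mine, not any library's): one JVM thread per actor, blocking on its mailbox. Each instance pays for a full thread stack, which is exactly the memory cost discussed above.</p>

```scala
import java.util.concurrent.LinkedBlockingQueue

// A receive-based (thread-bound) actor sketch: each actor owns a dedicated
// thread that blocks on take(). Cheap to write and fast for a handful of
// actors, but each instance costs a full thread stack.
class ThreadActor[A](handler: A => Unit) {
  private val mailbox = new LinkedBlockingQueue[A]
  private val thread = new Thread(new Runnable {
    def run(): Unit = while (true) handler(mailbox.take())
  })
  thread.setDaemon(true) // don't keep the JVM alive for idle actors
  thread.start()

  def !(msg: A): Unit = mailbox.put(msg)
}
```

<p>Spawn a few thousand of these and the stack memory alone starts crowding out the heap - which is why the thread-per-actor approach stops scaling long before an event-based one does.</p>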
<p>Ultimately event-based actors make the better model. For one, part of the reason why event-based Scala Actors are so expensive is that they suspend by throwing an exception to return control from user code to the library. While exceptions have been heavily optimized in the JVM, especially in recent versions, they are still substantially slower than normal return paths. Scala Actors' need to use exceptions to suspend is a consequence of their expressivity. Basically, because the library has little or no knowledge of what an actor is doing within a reaction, it cannot rely on traditional returns without introducing special control structures (see the reactWhile numbers in one of my <a href="http://erikengbrecht.blogspot.com/2009/02/scala-actors-versus-exceptionblob.html">previous blogs</a>). Lift Actors, on the other hand, do not need to use exceptions for control flow because the message processing cycle is essentially fixed - user code cannot intersperse weird (or even not-so-weird) patterns within it, or mix blocking receives in with event-based ones. Another potential optimization of event-based actors is to have them block if there are plenty of threads available, and then release the thread they are on if the scheduler needs it. To my knowledge this optimization is not implemented anywhere, but I think it would be relatively straightforward. The only problem is that the actor becomes more tightly bound to its scheduler.</p>
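<p>The suspension-by-exception trick described above looks roughly like this. It's a simplified, standalone sketch (the names are mine; the real scala.actors machinery is considerably more involved): react saves the handler and throws a pre-allocated control-flow exception to unwind the stack back to the scheduler.</p>

```scala
// A control-flow exception: it never needs a stack trace, and skipping
// fillInStackTrace avoids most of the cost of throwing.
object Suspend extends Exception {
  override def fillInStackTrace(): Throwable = this
}

class Suspendable {
  private var continuation: PartialFunction[Any, Unit] = PartialFunction.empty

  // react never returns normally: it saves the handler and throws,
  // unwinding the stack back to the scheduler loop.
  def react(f: PartialFunction[Any, Unit]): Nothing = {
    continuation = f
    throw Suspend
  }

  // "Scheduler" side: run user code, treating Suspend as a clean exit.
  def run(body: => Unit): Unit =
    try body catch { case Suspend => () }

  // Deliver a message by re-entering the saved continuation.
  def resume(msg: Any): Unit = run(continuation(msg))
}
```

<p>Because the exception is a singleton with no stack trace, throwing it is cheap by exception standards - but still slower than an ordinary return, which is the cost discussed above.</p>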
<h3>Parting Thoughts</h3>
<p>Ultimately, time and community willing, I'd like to evolve what is here, plus solid treatment of a lot of lower-level details, into a <a href="http://www.scala-lang.org/node/233">Scala Improvement Document (SID)</a>. There are a lot of subtle trades involved, and I think producing a general-purpose actors library is at least an order-of-magnitude more difficult than producing a special-purpose one. I also believe that if an actor implementation is part of the standard library, then it should provide the necessary extension points so that when users need something special-purpose they can create it and still leverage components of the standard library and interoperate with other actors. In other words, I think it should define both the interface portion of an API and provide a solid implementation. I don't think we'll even get there without a clear and common understanding of the various considerations involved.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com14tag:blogger.com,1999:blog-19725519.post-79406248152814444102009-05-25T22:42:00.001-04:002009-05-25T22:44:09.254-04:00I'm on Twitter!For those of you with ADD or its internet-induced equivalents, I've started posting on <a href="http://twitter.com/ErikEngbrecht">Twitter</a>. I long avoided it because I feel like the last thing people need is yet another half-baked information stream, but people seem to like it, so I'm giving it a shot. I'll post links to bugs, patches, and other comments regarding my efforts (and those of others) with Scala actors...along with other less important matters.
http://twitter.com/ErikEngbrechtErik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com0tag:blogger.com,1999:blog-19725519.post-5431326706549614352009-05-25T22:32:00.002-04:002009-05-25T22:33:53.735-04:00Refactoring Scala Actors: Progress Update<p>It's been a while since I've posted, so I thought I'd give everyone a status update. This post covers several different semi-disjoint topics at a fairly high level. I plan on diving into some of the issues later this week and beyond, but for now...</p>
<h2>State-machine Based Actors</h2>
<p>A while back Philipp Haller, the original author and current maintainer of the Scala actors library, contacted me and basically said that he found the changes I was making really interesting, but that he really needed a smaller, more gradual set of patches. It's a perfectly reasonable request, as I had pretty much completely ripped apart his library. I had rethought my approach anyway, so I went about moving my state-machine based actor implementation into its own package and rewiring some of the pieces so that they could share common base traits and common infrastructure, and interoperate with one another as if they were the same library. So I shoved my code into scalax.actors, and started hacking insertion points for my code into the main library.</p>
<p>The first thing I thought I needed was a base trait that defines the basic structure and operations of an actor, so I created a <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/24c0eb1f167b/src/actors/scala/actors/BaseActor.scala">BaseActor</a> in between <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/24c0eb1f167b/src/actors/scala/actors/AbstractActor.scala">AbstractActor</a> and <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/24c0eb1f167b/src/actors/scala/actors/Actor.scala">Actor</a> (as well as my own <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/24c0eb1f167b/src/actors/scalax/actors/StateActor.scala">StateActor</a>):</p>
<pre>
trait BaseActor extends AbstractActor {
  def react(f: PartialFunction[Any, Unit]): Nothing
  def reactWithin(msec: Long)(f: PartialFunction[Any, Unit]): Nothing
  def receive[A](f: PartialFunction[Any, A]): A
  def receiveWithin[R](msec: Long)(f: PartialFunction[Any, R]): R
  /*def loop(body: => Unit): Nothing */
  /*def loopWhile(cond: => Boolean)(body: => Unit): Nothing */
  protected[actors] def mailbox: MessageQueue[Message[Any]]
  private[actors] final def mailboxForChannel: MessageQueue[Message[Any]] = mailbox
  def mailboxSize: Int
  def send(msg: Any, replyTo: OutputChannel[Any]): Unit
  def forward(msg: Any): Unit
  def reply(msg: Any): Unit
  /*protected[actors]*/ def sender: OutputChannel[Any]
  def ? : Any
  def start(): AbstractActor
  def freshReplyChannel: Channel[Any] = new Channel[Any](this)
  def scheduler: IScheduler //TODO: restrict access to scheduler??
}
</pre>
<p>I don't think BaseActor is going to be a permanent fixture because its contents probably belong in AbstractActor instead, but for now it serves its purpose. One of the first things you should notice is that way too much stuff in there is public. Most of it should be protected, or perhaps somewhere else entirely (like an <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/24c0eb1f167b/src/actors/scala/actors/InputChannel.scala">InputChannel</a> encapsulated by the actor).</p>
<h2>Reworking MessageQueue</h2>
<p>There's also the issue of the mailbox, which is rather important and a den of mutable data passed all around with private[actors] qualifiers. Basically my change separates the <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/14e33a6ea1f8/src/actors/scala/actors/Message.scala">Message</a> from the elements within the <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/14e33a6ea1f8/src/actors/scala/actors/MessageQueue.scala">MessageQueue</a>, so that the MessageQueue can keep its internal structure private, which makes it easier to turn it into a trait so that an actor can provide its own specialized implementation. I was about to submit a patch for the change, but a fix for a <a href="http://lampsvn.epfl.ch/trac/scala/ticket/1801">memory leak</a> in <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/14e33a6ea1f8/src/actors/scala/actors/FJTaskRunner.java">FJTaskRunner</a> came about that relied on clearing mutable fields in the message when a task is done processing. I have an alternative fix by changing pieces of <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/14e33a6ea1f8/src/actors/scala/actors/FJTaskScheduler2.scala">FJTaskScheduler2</a>, but schedulers in general and FJTaskScheduler2 in particular are in flux right now due to bugs (<a href="http://lampsvn.epfl.ch/trac/scala/ticket/1999">here</a> and <a href="http://lampsvn.epfl.ch/trac/scala/ticket/1965">here</a> and probably elsewhere), and I want to tweak the design a bit, so I'm holding off.</p>
<h2>Fixing Schedulers</h2>
<p>Which brings me to schedulers... Problems with plugging in <a href="http://lampsvn.epfl.ch/trac/scala/ticket/1405">custom schedulers</a> (mostly <a href="http://lampsvn.epfl.ch/trac/scala/changeset/16970/scala/trunk/src/actors/scala/actors/Actor.scala">fixed</a>) are what originally caused me to dive into the guts of the actor library. Closely related to schedulers is <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_4_final/src/actors/scala/actors/ActorGC.scala?rev=17815">ActorGC</a>, which is absolutely essential to actors (almost) transparently abstracting threads, but can also be problematic due to its fundamentally non-deterministic nature (it relies on the garbage collector for some of its more advanced capabilities). That being said, now in trunk <a href="http://lampsvn.epfl.ch/trac/scala/ticket/2012">ActorGC is optional</a>, so environments that don't require an implicit shutdown of the actor worker threads can avoid the added complexity. I intend to cover the details of ActorGC very soon. There should also be a default scheduler with daemon semantics coming, which has a number of use cases.</p>
<h2>Closing Matter</h2>
<p>There's a lot more going on. Some recent flare-ups on the <a href="http://thread.gmane.org/gmane.comp.lang.scala.internals/453">Scala</a> <a href="http://thread.gmane.org/gmane.comp.lang.scala.internals/505">Internals</a> <a href="http://thread.gmane.org/gmane.comp.lang.scala.internals/517">mailing</a> list, despite being a tad melodramatic, brought a welcome focus on actors for the next release of Scala. The issue has also given rise to two minimalistic actor implementations, one in <a href="http://github.com/dpp/liftweb/blob/b59fd205fa812ba527172be7250d12cf5921b776/lift-actor/src/main/scala/net/liftweb/actor/LiftActor.scala">Lift</a> and the other in <a href="http://code.google.com/p/scalaz/source/browse/trunk/src/main/scala/scalaz/concurrent/Actor.scala">Scalaz</a>. They both make interesting data points for design and potential interoperability (remember: one of my primary goals is an actor implementation that lets you plug in what you need). There are <a href="http://lampsvn.epfl.ch/trac/scala/ticket/1794">issues</a> around <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_4_final/src/actors/scala/actors/ActorProxy.scala?rev=17815">ActorProxy</a> that I think will be a little hairy to sort out, but I'm confident they will be. And finally, there's the omnipresent <a href="http://lampsvn.epfl.ch/trac/scala/ticket/2009">issue</a> of ensuring actors really make the guarantees that they claim (right now I think they do, but I wouldn't place money on it until I have tests to prove it).</p>
<p>That's it for now. I'm going to try to take the time to blog about many of the above issues and more in depth in the coming weeks, and hopefully gain some insights from out in the cloud.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com2tag:blogger.com,1999:blog-19725519.post-48524432134277889832009-04-21T22:39:00.002-04:002009-04-21T22:40:46.243-04:00McKinsey and Cloud Computing<p>McKinsey has created a tempest-in-a-teapot by <a href="http://uptimeinstitute.org/images/stories/McKinsey_Report_Cloud_Computing/clearing_the_air_on_cloud_computing.pdf">denouncing the economics</a> behind both in-the-cloud-cloud such as <a href="http://aws.amazon.com/ec2/">Amazon E2C</a> and behind-the-firewall clouds for large enterprises. At a high level I think their analysis is actually pretty good, but the conclusions misleading due to a semantic twist. They use Amazon E2C as a model, and their conclusions go something like this:</p>
<ol>
<li>Amazon EC2 virtual CPU cycles are more expensive than real, in-house CPU cycles</li>
<li>You waste 90% of those in-house CPU cycles</li>
<li>You'll waste almost as many of those virtual cloud CPU cycles, only they cost more, so they are a bad deal</li>
<li>You stand a decent shot at saving some of those real CPU cycles through virtualization, so you should aggressively virtualize your datacenter</li>
<li>You're too inept to deliver a flexible cloud behind-the-firewall, so don't even try</li>
</ol>
<p>I'll let you ponder which of the above statements is misleading while I address some related topics.</p>
<p>The goals of cloud computing are as old as computing itself. They are:</p>
<ol>
<li>Reduce the time it takes to deploy a new application</li>
<li>Reduce the marginal cost of deploying a new application over "standard" methods</li>
<li>Reduce the marginal increase to recurring costs caused by deploying a new application over "standard" methods</li>
</ol>
<p>Back in the days of yore, when programmers were real men, the solution to this was <a href="http://en.wikipedia.org/wiki/Time-sharing">time sharing</a>. Computers were expensive and therefore should be run at as high a utilization as possible. While making people stand in line and wait to run their batch jobs was a pleasing ego trip for the data center operators, the machines still wasted CPU time while performing slow I/O operations, and waiting in line generally made users unhappy. Thus time sharing was born, and with it, in a quite real sense, the first cloud computing environments, because in many cases a large institution would purchase and host the infrastructure and then lease it out to smaller institutions or individuals.</p>
<p>The problem here is that the marginal cost equations end up looking like a stair-step function. If you had a new application, and your enterprise / institution had excess mainframe capacity, then the marginal cost of letting you run your application was near zero. But if there was no spare capacity - meaning the mainframe was being efficiently utilized - then the marginal cost was high because either someone else had to be booted off or you needed an additional mainframe.</p>
<p>Now fast-forward a couple decades to the PC revolution. Somewhere along the way the cost curves for computers and people crossed, so it became appropriate to let the computer sit idle waiting for input from a user rather than having a user sit idle while waiting for a computer. Now you could have lots of computers with lots of applications running on each one (although initially it was one application at a time, but still, the computer could run any number of them). This smoothed out the non-recurring marginal cost curve, but as PCs proliferated it drove up recurring costs through sheer volume.</p>
<p>Unfortunately this had problems. Many applications didn't work well without centralized backends, and some users still needed more compute power than could be reasonably mustered on the desktop. So the new PCs were connected to mainframes, minicomputers, and eventually servers. Thus client-server computing was born, along with increasingly confusing IT economics. PCs were cheap, and constantly becoming cheaper, but backend hardware remained expensive. The marginal non-recurring cost becomes completely dependent on the nature of the application, and recurring costs simply begin to climb with no end in sight.</p>
<p>Now fast forward a little more. Microsoft releases a "server" operating system that runs on souped-up PCs and convinces a whole bunch of bean counters that they can solve their remaining marginal non-recurring cost problems with Wintel servers that don't cost much more than PCs. No more expensive servers. No more having to divide the cost of a single piece of hardware across several projects. Now if you want to add an application you can just add an inexpensive new Wintel server. By this time the recurring cost equation had already become a jumbled mess, and the number of servers was still dwarfed by the PC on every desk, so there was no holding back the ever-increasing recurring costs. This problem was then further exacerbated by Linux giving the Unix holdouts access to the same cheap hardware.</p>
<p>Thus began the era of one or more physical servers per application, which is where we are today, with McKinsey's suggestion for addressing it: virtualization behind the firewall. The problem with this suggestion is that, for a large enterprise, it isn't really that different from the cloud-in-the-cloud solution that they denounce as uneconomical. One way is outsourcing a virtualized infrastructure to Amazon or similar, and the other is outsourcing it to their existing IT provider (ok, not all large enterprises outsource their IT, but a whole lot do).</p>
<p>Virtualization, in the cloud or otherwise, isn't the solution because it doesn't address the root cause of the problem - proliferation of (virtual) servers and the various pieces of infrastructure software that run on them, such as web servers and databases. Hardware is cheap. Software is often expensive. System administrators are always expensive. Virtualization attacks the most minor portion of the equation.</p>
<p>Virtualization is the right concept applied to the wrong level of the application stack. Applications need to be protected from one another, but if they are built in anything resembling a reasonable way (that's a big caveat, because many aren't) then they don't need the full protections of running in a separate OS instance. There's even a long standing commercially viable market for such a thing: <a href="http://en.wikipedia.org/wiki/Shared_hosting">shared web hosting.</a></p>
<p>It may not be very enterprisey, but shared web site/application hosting can easily be had for about $5 per month. The cost quickly goes up as you add capabilities, but still - companies are making money by charging arbitrary people $5 per month to let them run arbitrary code on servers shared by countless other customers running arbitrary code. How many enterprise IT organizations can offer a similar service at even an order-of-magnitude greater cost?</p>
<p>Not many, if any. Yet do we see suggestions pointing out that Apache, IIS, Oracle, SQL Server, and countless other pieces of infrastructure can relatively easily be configured to let several applications share compute resources and expensive software licenses? Nope. They suggest you take your current mess, and virtualize it behind the firewall instead of virtualizing it outside the firewall.</p>
<p></p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com9tag:blogger.com,1999:blog-19725519.post-44623283900528967242009-02-10T20:17:00.002-05:002009-02-10T20:17:56.448-05:00Refactoring Scala Actors: Rethinking the Approach<p>When I started refactoring Scala's actor library, I really had several goals:</p>
<ol>
<li>Reduce the coupling within the library so that the implementation can be more easily extended and customized</li>
<li>Create a more transparent and "programmer friendly" actor implementation</li>
<li>Improve performance of various use cases</li>
<li>Make actors that interact better with their environment, particularly non-actor code</li>
<li>Maintain API compatibility with the existing library to the maximum extent practical</li>
</ol>
<p>Thus far I've done my work by completely overhauling the <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/tip/src/actors/scala/actors/Actor.scala">Actor</a> <a href="http://www.scala-lang.org/node/126">trait</a> directly in the library. While I have plans for how this will reduce coupling, they haven't come to fruition yet. I think my state machine model is considerably more transparent and programmer friendly than the current implementation. The state an actor is in is always clear, transitions are reasonably well defined, and performing correct locking and unlocking is now pretty straightforward. I've substantially improved the performance of event-based actors for the common case where an actor loops while it receives messages until some termination criterion is reached. I haven't done anything with making them interact better with their environment yet, as I believe <a href="http://lamp.epfl.ch/~phaller/">Philipp Haller</a> is in the process of incorporating some changes for which I submitted patches several months back that will help considerably (he doesn't appear to be using the patches directly, but the changes I've seen are close enough).</p>
<p>A few days ago <a href="http://www.drmaciver.com/">David MacIver</a> asked me a couple interesting questions on IRC:</p>
<ol>
<li>Have you considered using <a href="http://kilim.malhar.net/">Kilim</a>?</li>
<li>Have you looked at the <a href="http://lampsvn.epfl.ch/trac/scala/browser/compiler-plugins/continuations/trunk">continuations support</a> that appears to be emerging as a compiler plugin?</li>
</ol>
<p>Using Kilim would almost certainly disqualify the changes I'm making from incorporation into the standard library because of the external dependency on a prerelease framework that uses bytecode manipulation, and I don't think the fledgling continuation support is mature enough to experiment with yet (or maybe just not well documented enough yet, who knows...). That being said, both of these would be interesting avenues of development. Event based actors rely heavily on continuations, and the performance of those continuations has a substantial effect on the performance of the actors. Ultimately a properly decoupled implementation would allow someone to build an API compatible actor implementation on either of these technologies, or something else entirely.</p>
<p>I also received a recent nudge on the mailing list, when someone pointed out that it would be easier to experiment with my library if it was in its own package. I somewhat disagree with that. It would be easier to experiment with it if you didn't have to do an entire build of the Scala distribution that has my code in it, and unfortunately for the foreseeable future I'm going to be depending on both code that is being committed into the trunk and on the ability to modify said code. Also, the way I have it set up today, if someone wanted to test their Scala application/library/framework against my state-based actors, they would just have to pull my <a href="http://www.selenic.com/mercurial/">Mercurial</a> <a href="http://bitbucket.org/eengbrec/scala-enhancements/">repository</a>, build it, and rebuild against the resulting distribution. On the downside, it's a pain to do side-by-side comparisons between the two implementations, because you have to switch distributions and rebuild every time.</p>
<p>That being said, decoupling is one of my primary goals, and Martin & Company have already set the precedent of doing major work in a separate package with their efforts on <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/library/scalax/collection">redesigning the Scala collections</a> library. So as soon as I finish up some of my immediate tasks and have a good build again (I'm in the middle of redesigning the control flow pattern to minimize the stack size when exceptions are thrown), I'm going to push most of the public and protected interface on <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/tip/src/actors/scala/actors/Actor.scala">Actor</a> into a new trait that will be between Actor and <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/tip/src/actors/scala/actors/AbstractActor.scala">AbstractActor</a> as abstract methods, move my code out of <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/src/actors/scala/actors">scala.actors</a> and into scalax.actors in my own directory. I'll definitely keep it within the same repository as the Scala distribution code, and will probably just make another directory for it. This means I'll have to mess with <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/build.xml">Sabbus</a> to add my portion to the build, which won't be fun, but shouldn't be too hard. I'm sure there's going to be a lot more to do to extract it out, so I'll be adding items to my issues list as I think of them.</p>
<p>The end result should be my state based actors and the existing actors being able to live side-by-side and interact with one another in the same program. Assuming I can at least get some patches into the main library, it will also mean that the future of my work will not be dependent on being incorporated into the standard distribution. If it is, great, if not, I can distribute it separately. I would have done this from the start, but initially I was highly dependent on the existing code and infrastructure in order to get something working reasonably quickly, and to be able to smoke it out to see if it indeed worked. Back in December I was basically breaking <a href="http://lampsvn.epfl.ch/trac/scala/browser/scala/trunk/test">partest</a> with every change. But now things are reasonably stable, I rarely break partest, and I don't think it will take much for my code to be able to stand on its own.</p>
<p>So what does everyone think? Is this a good direction to head in?</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com10tag:blogger.com,1999:blog-19725519.post-24594883120110804402009-02-06T22:37:00.002-05:002009-02-06T22:38:04.091-05:00Profiling the ExceptionBlob<p>Last week on the Scala IRC channel (#scala) I chatted with several people about the need to substantially reduce the use of exceptions for flow control within the Scala Actors library. My remarks were met with a certain amount of healthy skepticism, as significant progress has been made on improving the performance of Java exceptions, as stated by <a href="http://blogs.sun.com/jrose/">John Rose</a> in his blog <a href="http://blogs.sun.com/jrose/entry/longjumps_considered_inexpensive">Longjumps Considered Inexpensive</a>.</p>
<p>I highly suggest you go read the post. I don't doubt anything stated in it, and in fact noticed a substantial performance improvement when I stopped benchmarking against Java 5 and started doing it against <a href="https://jdk6.dev.java.net/">Java 6</a> (I usually work on a Mac, and Java 5 is the default VM), and expect even more improvements with <a href="https://jdk7.dev.java.net/">Java 7</a>. Once you're done, I'd like to call your attention to a couple fragments of it (additional emphasis mine):</p>
<blockquote>
<p>The Hotspot JIT can optimize a throw of a preallocated exception into a simple goto, if the <strong>thrower and catcher are together in the same compilation unit</strong> (which happens because of inlining).</p>
<p>Here is a concrete example of non-local return, implemented efficiently via a cloned exception. I have observed the Hotspot server compiler emitting a machine-level goto for the throws. With current server VM optimizations of this code, the cost of a non-local return is <strong>less than three times the cost of a plain (local) return</strong>; with the client VM the ratio is ten. See the code example and benchmark for details: NonLocalExit.java (javadoc).</p>
</blockquote>
<p>These statements have a couple implications:</p>
<ol>
<li>Reducing a throw to a long jump requires a certain amount of HotSpot magic to take place to ensure that the throw and catch are in the same compilation unit. The more complex the code is, the less likely this is to happen.</li>
<li>Long jumps are still substantially slower than normal returns</li>
</ol>
<p>I think John's post received a lot of attention, and people (including myself) saw it, didn't read it carefully, and assumed that HotSpot now performed miracles with cached exceptions. What we missed was a <a href="http://mail.openjdk.java.net/pipermail/mlvm-dev/2008-May/000114.html">post</a> to the <a href="http://openjdk.java.net/projects/mlvm/">MLVM</a> development <a href="http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev">mailing list</a> about a year later by <a href="http://blog.headius.com/">Charles Nutter</a>, who had noticed that his non-local returns in <a href="http://jruby.codehaus.org/">JRuby</a> were not being compiled down into long jumps, and a <a href="http://mail.openjdk.java.net/pipermail/mlvm-dev/2008-May/000115.html">subsequent response</a> from John Rose. There's a lot of very good technical discussion in there, but I think the gist is that HotSpot's ability to optimize throws into long jumps is still limited, it often needs help to do so, and very smart people are working at improving it.</p>
<p>Of course that all still leaves the question: Just how much slower are cached exceptions than a normal return? I can't really answer that in the general case, but I did put together a couple microbenchmarks in Java that somewhat simulate the type of stack creation that happens within my state-based actors. The benchmark is pretty simple. It recursively descends, incrementing a member variable, and then on the way back up it decrements another member variable. In the case of the exception-based return, it does the decrements within a finally block. This admittedly is rather pathological code, but it is also simple and easy to understand.</p>
<p>Here's the normal return version:</p>
<pre name="code" class="java">
public class NormalReturn {
    public static void main(String[] args) {
        NormalReturn m = new NormalReturn();
        m.doTest(10000000);
    }

    public void doTest(int times) {
        for (int i = 0; i < times; i++) {
            down = 0;
            up = 0;
            deepRecurse(1000);
        }
        System.out.println(down);
        System.out.println(up);
    }

    private int down = 0;
    private int up = 0;

    public void deepRecurse(int i) {
        down++;
        if (down <= i) {
            deepRecurse(i);
            up--;
        }
    }
}
</pre>
<p>...and here's the version that uses a cached exception:</p>
<pre name="code" class="java">
public class TryFinally {
    private static class Signal extends RuntimeException {
        @Override
        public Signal fillInStackTrace() { return this; }
    }

    private Signal signal = new Signal();

    public static void main(String[] args) {
        TryFinally m = new TryFinally();
        m.doTest(10000000);
    }

    public void doTest(int times) {
        for (int i = 0; i < times; i++) {
            try {
                down = 0;
                up = 0;
                deepRecurse(1000);
            } catch (Signal s) {
                // do nothing
            }
        }
        System.out.println(down);
        System.out.println(up);
    }

    private int down = 0;
    private int up = 0;

    public void deepRecurse(int i) {
        try {
            down++;
            if (down == i) throw signal;
            else deepRecurse(i);
        } finally {
            up--;
        }
    }
}
</pre>
<p>With profiling turned off and -server on Java 6 on my Mac, the normal return version takes about 60 seconds to complete, while the TryFinally version takes over 10 minutes. That's roughly the 10x client-VM ratio John Rose cited, rather than the 3x he observed for the server VM, and it's still an order-of-magnitude worse than a normal return. One interesting thing about the benchmark is that if you decrease the recursion depth and increase the number of times deepRecurse() is executed the total time stays relatively constant.</p>
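<p>The observation that total time stays constant when depth and iteration count trade off suggests the cost is proportional to the total number of frames unwound. A scaled-down, parameterized variant of the TryFinally benchmark (my own sketch, not the code timed above) makes that explicit by counting the frames unwound for two configurations with the same depth × iterations product:</p>

```java
public class UnwindCost {
    private static class Signal extends RuntimeException {
        // suppress stack capture so only the unwind itself is exercised
        @Override
        public Signal fillInStackTrace() { return this; }
    }

    private static final Signal SIGNAL = new Signal();
    private static int down, up;

    // unwind `depth` frames via the cached exception, `times` times over
    static long run(int depth, int times) {
        long framesUnwound = 0;
        for (int i = 0; i < times; i++) {
            down = 0;
            up = 0;
            try {
                deepRecurse(depth);
            } catch (Signal s) {
                // expected: the cached exception signals the non-local return
            }
            framesUnwound += -up; // each finally block decremented up once
        }
        return framesUnwound;
    }

    static void deepRecurse(int depth) {
        try {
            down++;
            if (down == depth) throw SIGNAL;
            else deepRecurse(depth);
        } finally {
            up--;
        }
    }

    public static void main(String[] args) {
        // same total frames unwound, so (per the observation above)
        // roughly the same total time spent in ExceptionBlob
        System.out.println(run(1000, 10));
        System.out.println(run(10, 1000));
    }
}
```

If the proportionality hypothesis holds, timing these two calls should give similar results even though the second throws one hundred times as many exceptions.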
<p>So with this in mind, I'm going to work hard to refactor the actors library to keep exception usage to a minimum, and where it must be used, to minimize the depth of the stack.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com2tag:blogger.com,1999:blog-19725519.post-65952387010582132062009-02-06T18:03:00.003-05:002009-02-06T18:07:29.631-05:00Scala Actors versus the ExceptionBlob<p>I've begun some very preliminary benchmarking of my state-based actors library against the current standard actors library and seen some rather interesting results. The test I'm running spawns four pairs of actors and sends about 40 million messages between each pair. The only logic in the actors is to count how many messages they have received, ensure they are received in order, and occasionally send back a "More" message. This is so the mailbox doesn't overfill and cause out-of-memory errors.</p>
<p>The benchmark has three different incarnations, each testing a different way of processing messages until some termination condition occurs. The versions are:</p>
<ul>
<li>loop</li>
<li>recursive react</li>
<li>reactWhile (only available with the State-based actors, but relatively easy to port to "classic" actors)</li>
</ul>
<p>The code for the loop version is below:</p>
<pre name="code" class="scala">
class LoopPingSender(val totalPings: Int, val pingReceiver: LoopPingReceiver) extends Actor {
  private var pingsSent = 0

  def act() {
    Actor.loop {
      pingReceiver ! Ping(pingsSent)
      pingsSent += 1
      if (pingsSent == totalPings) {
        println("Sender Done")
        exit("sender completed successfully")
      } else if ((pingsSent % 1000) == 0) react {
        case More =>
      }
    }
  }
}

class LoopPingReceiver(val totalPings: Int) extends Actor {
  private var pingsReceived = 0

  def act() {
    Actor.loop {
      react {
        case Ping(n) if n == pingsReceived => {
          pingsReceived += 1
          if (pingsReceived == totalPings) {
            println("Receiver Done")
            exit("receiver completed successfully")
          } else if ((pingsReceived % 1000) == 0) {
            reply(More)
          }
        }
        case Ping(n) => {
          println("error")
          exit("invalid n value: " + n)
        }
      }
    }
  }
}
</pre>
<p>The non-scientifically gathered results are as follows, all times in seconds:</p>
<table border="1">
<tr>
<th>Version</th>
<th>Standard Actors</th>
<th>State-based Actors</th>
</tr>
<tr>
<td>loop</td>
<td>298.22</td>
<td>96.84</td>
</tr>
<tr>
<td>Recursive react</td>
<td>90.25</td>
<td>113.57</td>
</tr>
<tr>
<td>reactWhile</td>
<td>N/A</td>
<td>40.63</td>
</tr>
</table>
<h3>Standard Actor <code>loop</code></h3>
<p>Let's consider the loop version first. loop in the standard actor implementation is rather convoluted. Rather than directly supporting looping, the standard actor implementation provides a package private hook for a function that is executed when the actor is terminating. loop uses this hook to subvert the termination process by throwing an exception and subsequently restarting execution of the actor. This makes it so that two exceptions are thrown per message processed rather than the normal one. Here's a small snippet from the profiling results:</p>
<pre>
Compiled + native Method
40.0% 5148 + 0 ExceptionBlob
13.9% 658 + 1133 scala.actors.Scheduler$$anon$2.run
13.1% 391 + 1299 scala.actors.Actor$.loop
3.4% 441 + 0 scala.actors.FJTaskRunner.run
--- snip ---
73.7% 6705 + 2771 Total compiled
</pre>
<pre>
Stub + native Method
20.6% 0 + 2654 java.lang.Thread.isInterrupted
</pre>
<p>What you can see is that the <a href="http://docjar.org/docs/api/sun/jvm/hotspot/code/ExceptionBlob.html">ExceptionBlob</a> is dominating the runtime, followed by Thread.isInterrupted. <a href="http://docjar.org/docs/api/sun/jvm/hotspot/code/ExceptionBlob.html">ExceptionBlob</a> is a class in the JVM that observes the unwinding of the stack. As far as I could discern through some separate microbenchmarking (which I'll discuss another day, after I've spent some time in HotSpot code), the amount of time consumed by <a href="http://docjar.org/docs/api/sun/jvm/hotspot/code/ExceptionBlob.html">ExceptionBlob</a> is proportionate to the depth of the stack being unwound by exceptions. In other words, throwing fewer exceptions doesn't do any good if you use them to unwind deeper stacks, and unwinding shallower stacks doesn't do any good if you throw more exceptions. Caching exceptions doesn't help, either, but I digress. The bottom line is that loop is pretty inefficient in the standard actors implementation.</p>
<h3>State-based Actor <code>loop</code></h3>
<p>In my state-based actors, there are two states to handle looping, both of which are subclasses of <code>Running</code>. The first is <code>Looping</code>, which is initially entered when the actor starts looping. The second is <code>ResumeLooping</code>, which is used to resume looping after the actor has been suspended. Both run the body of the react in a tail-recursive loop until there are no more messages to process, at which time they suspend the actor. This means that many messages can potentially be processed before any exception-based signaling is used. The following is a snippet from a profile:</p>
<pre>
Compiled + native Method
67.5% 3233 + 0 ExceptionBlob
6.4% 306 + 0 scala.actors.Actor$class.withLock
4.9% 44 + 189 actorbench.LoopPingSender$$anonfun$act$1.apply
4.8% 97 + 135 scala.actors.Actor$$anonfun$send$1.apply
4.5% 116 + 99 scala.actors.Actor$Running.tr$1
3.8% 36 + 145 scala.actors.Actor$ResumeLooping.enter
--- snip ---
Stub + native Method
0.2% 0 + 9 java.lang.Thread.isInterrupted
</pre>
<p>Again, <a href="http://docjar.org/docs/api/sun/jvm/hotspot/code/ExceptionBlob.html">ExceptionBlob</a> dominates the execution time, although the total time it consumes is lower. Unfortunately, despite using significantly fewer exception signals, my state-based actors tend to build up significantly deeper stacks than the actors in the standard library.</p>
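<p>The batching idea behind <code>Looping</code> and <code>ResumeLooping</code> can be sketched in plain Java (a hypothetical illustration of the pattern, not the library's actual code): drain the mailbox in an ordinary loop, and fall back to the expensive, exception-based suspension path at most once per batch rather than once per message.</p>

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

public class BatchingLoopDemo {
    // Process every queued message in a plain loop; the real implementation
    // would only suspend the actor (the costly path) once the mailbox is empty.
    static int drain(Queue<String> mailbox, Consumer<String> body) {
        int processed = 0;
        String msg;
        while ((msg = mailbox.poll()) != null) { // no exception per message
            body.accept(msg);
            processed++;
        }
        // exception-based suspension signaling would happen here, once per batch
        return processed;
    }

    public static void main(String[] args) {
        Queue<String> mailbox = new ArrayDeque<>();
        for (int i = 0; i < 5; i++) {
            mailbox.add("ping" + i);
        }
        // the Consumer stands in for the react body
        System.out.println(drain(mailbox, m -> { }));
    }
}
```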
<h3>Standard Actor recursive <code>react</code></h3>
<p>Recursive react seems to be the best performing looping pattern for the standard actor library. It's a pretty simple pattern:</p>
<pre name="code" class="scala">
class PingSender(val totalPings: Int, val pingReceiver: PingReceiver) extends Actor {
  def act() {
    def recurse(i: Int): Unit = {
      if (i > 0) {
        pingReceiver ! Ping(i)
        if ((i % 1000) == 0) {
          react {
            case More => recurse(i - 1)
          }
        } else {
          recurse(i - 1)
        }
      } else {
        println("Sender Done")
      }
    }
    recurse(totalPings)
  }
}
</pre>
<p>As you can see, the function containing the <code>react</code> block is simply recursively called as needed. Note that despite the fact that the recursive call appears to be in a tail position, it is not a tail recursive call and will not be optimized by the compiler. The <code><a href="http://www.scala-lang.org/docu/files/api/scala/PartialFunction.html">PartialFunction</a></code> defined within the <code>react</code> block is actually a separate object with an apply method, so while <code>recurse</code> and the <code>PartialFunction</code> are mutually recursive, they are not tail recursive.</p>
<pre>
Compiled + native Method
19.1% 456 + 0 ExceptionBlob
13.0% 89 + 221 scala.actors.Reaction.run
8.6% 53 + 152 scala.actors.Actor$class.send
7.9% 61 + 127 scala.actors.Actor$class.react
5.3% 59 + 67 scala.Iterator$class.toList
--- snip ---
68.1% 862 + 765 Total compiled
Stub + native Method
26.4% 0 + 631 java.lang.Thread.isInterrupted
--- snip ---
26.9% 0 + 643 Total stub
</pre>
<p>This time <code>Thread.isInterrupted</code> dominates, with <code>ExceptionBlob</code> not too far behind.</p>
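<p>The earlier point about the recursive call not being in tail position can be illustrated with a Java analogue (hypothetical, and not from the actors library): because the handler is a separate object, each message processed adds frames for both the recursive method and the handler's invocation, so the stack grows steadily.</p>

```java
import java.util.function.IntConsumer;

public class MutualRecursionDemo {
    static int maxDepth = 0;

    static void recurse(int i) {
        // record the deepest stack observed so far
        maxDepth = Math.max(maxDepth, Thread.currentThread().getStackTrace().length);
        if (i > 0) {
            // the handler object stands in for the PartialFunction: calling
            // back into recurse through it is mutual recursion, so neither
            // call can be optimized as self tail recursion
            IntConsumer handler = n -> recurse(n - 1);
            handler.accept(i);
        }
    }

    public static void main(String[] args) {
        recurse(50);
        // 50 mutual-recursion steps leave well over 100 frames on the stack
        System.out.println(maxDepth > 100);
    }
}
```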
<h3>Actor States recursive <code>react</code></h3>
<p>It turns out that the "hybrid trampoline" I implemented to facilitate recursive reacts without giving up the thread provides pretty bad performance.</p>
<pre>
64.9% 3714 + 0 ExceptionBlob
11.4% 568 + 85 scala.actors.Actor$class.withoutLock
8.4% 478 + 1 scala.actors.Actor$class.withLock
5.1% 90 + 201 scala.actors.Actor$$anonfun$reactWithin$1.apply
4.2% 115 + 125 scala.actors.Actor$$anonfun$send$1.apply
--- snip ---
97.8% 5006 + 595 Total compiled
Stub + native Method
0.1% 0 + 6 java.lang.Thread.currentThread
--- snip ---
0.2% 0 + 12 Total stub
</pre>
<p>The problem is that when I wrote it, I thought that the cost of throwing exceptions was primarily proportionate to the number of exceptions thrown rather than the amount of stack that is unwound. Consequently, I made the trampoline only bounce once every <em>n</em> invocations in order to reduce the number of exceptions being thrown. As it turns out, what I should do is find a place where the stack is likely to be the shallowest, and throw an exception every time.</p>
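<p>That fix can be sketched as follows (a hypothetical Java illustration of the idea, not the actual trampoline): stash the continuation and throw the cached exception at the shallowest possible point, so that every throw unwinds only a single frame back to a driver loop, instead of letting <em>n</em> frames accumulate between bounces.</p>

```java
public class TrampolineDemo {
    private static class Bounce extends RuntimeException {
        @Override
        public Bounce fillInStackTrace() { return this; }
    }

    private static final Bounce BOUNCE = new Bounce();
    private static Runnable next; // continuation to resume after unwinding

    static void countDown(int i, int[] calls) {
        calls[0]++;
        if (i > 0) {
            next = () -> countDown(i - 1, calls);
            throw BOUNCE; // thrown where the stack is shallowest: one frame deep
        }
        next = null; // done, let the driver loop exit
    }

    public static void main(String[] args) {
        int[] calls = {0};
        next = () -> countDown(10000, calls);
        // the driver catches each bounce with only a single frame to unwind
        while (next != null) {
            Runnable step = next;
            try {
                step.run();
            } catch (Bounce b) {
                // resume with the stored continuation
            }
        }
        System.out.println(calls[0]); // 10000 bounces plus the final call
    }
}
```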
<h3>Actor States <code>reactWhile</code></h3>
<p><code>reactWhile</code> addresses the pitfalls presented in the previous versions and provides roughly twice the performance of the next best performer, recursive <code>react</code> on the standard actor library.</p>
<pre>
Compiled + native Method
30.0% 171 + 331 scala.actors.Actor$Running.tr$1
24.4% 263 + 146 actorbench.ReactWhileReceiver.withLock
12.4% 26 + 181 actorbench.ReactWhileSender.sendPings
8.5% 142 + 0 scala.actors.Actor$class.withoutLock
--- snip ---
79.0% 660 + 661 Total compiled
Stub + native Method
15.1% 0 + 252 java.lang.Thread.isInterrupted
0.9% 0 + 15 java.lang.Thread.currentThread
16.0% 0 + 267 Total stub
</pre>
<p><code>ExceptionBlob</code> has completely disappeared from the profile. That is because <code>reactWhile</code> makes minimal use of exceptions for signaling. Consequently, its performance is dominated by some of the complicated logic of my (worthless) hybrid trampoline, my completely unoptimized lock implementation, and calls to <code>Thread.isInterrupted</code> that are made when the bound thread is returned to the pool when the actor suspends. I'll write a blog on how <code>reactWhile</code> works at a later date, after I clean up some of the other problems that I've listed.</p>
<h3>Parting Matter</h3>
<p>It really annoys me when people post benchmark results without providing details of their setup along with all of the code required to duplicate them, and that's just what I've done. I've also benchmarked a very narrow set of behavior, failed to warm up the JVM, and not run multiple trials. Developing a <a href="http://bitbucket.org/eengbrec/scala-enhancements/issue/14/make-a-simple-framework-for-benchmarking">lightweight framework</a> for benchmarking actor performance, along with a more comprehensive set of benchmarks is on my <a href="http://bitbucket.org/eengbrec/scala-enhancements/issues/?status=new&status=open">to-do list</a>. Once I have that, I'll start doing more rigorous benchmarking. But I don't think I quite need it yet because bottlenecks are showing up quickly and consistently in my profiling, and in most cases they seem to have reasonably straightforward solutions. All the above profiles against my library were done against <a href="http://bitbucket.org/eengbrec/scala-enhancements/changeset/49658b3475e2/">revision 49658b3475e2</a> in my <a href="http://bitbucket.org/eengbrec/scala-enhancements/overview/">bitbucket repository</a>, so you can at least access that code.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com6tag:blogger.com,1999:blog-19725519.post-63700081836202080562009-01-31T10:16:00.005-05:002009-01-31T10:21:21.312-05:00Google is Paranoid<p>Apparently Google has recently become extremely paranoid about what may harm my computer. Here's a screenshot of some search results:</p>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtQM3g4BI9f8frueXMjv_H2Xbwa-naBJkZt9apTfe1PKsxjmeB6jgUu9IvitxzMfAhFmgT8bA8JDm_qgpEotNnpDHOoLRR7uCxvZHalTs7edU8VkmboQCg3EZk4mZPlenF694Q/s1600-h/search_results.png"><img style="cursor:pointer; cursor:hand;width: 370px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhtQM3g4BI9f8frueXMjv_H2Xbwa-naBJkZt9apTfe1PKsxjmeB6jgUu9IvitxzMfAhFmgT8bA8JDm_qgpEotNnpDHOoLRR7uCxvZHalTs7edU8VkmboQCg3EZk4mZPlenF694Q/s400/search_results.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5297477446516080866" /></a>
<p>Looks like every result is dangerous...now here's what happens when I click on one of these dangerous links:</p>
<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEzXSB_Jp4VJq4gwKnv0jY6FCYf9tIlD-WYPY3Q1APqzohYRqIITnIQxQwSqqDPBfwOmDEFk0mDaTOhIGDkFe2rd8HUngDcMz9LrqXdRq-YI-R0Uxiu-_6JQOT1fMk3LPYcOg8/s1600-h/click_link.png"><img style="cursor:pointer; cursor:hand;width: 400px; height: 274px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiEzXSB_Jp4VJq4gwKnv0jY6FCYf9tIlD-WYPY3Q1APqzohYRqIITnIQxQwSqqDPBfwOmDEFk0mDaTOhIGDkFe2rd8HUngDcMz9LrqXdRq-YI-R0Uxiu-_6JQOT1fMk3LPYcOg8/s400/click_link.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5297477747489438530" /></a>
<p>Must be Apple's developer website is out to get me...</p>
<p>Note: The behavior seems to have stopped. Quick failed Google experiment?</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com1tag:blogger.com,1999:blog-19725519.post-82407310045281130022009-01-24T15:52:00.002-05:002009-01-24T15:53:41.512-05:00Refactoring Scala Actors<p>I've been experimenting in my spare time with the Scala actors library for about a year now. Initially it was mostly as part of my participation in the <a href="http://erikengbrecht.blogspot.com/search/label/widefinder">Widefinder</a> project, but for the past several months I've been working on building more complicated abstractions using the actor library. The challenge with Widefinder, if you chose to keep your parallelism implementation generic and reusable, was to spread line-oriented input stream processing across as many threads as possible (ok, some may disagree with that, but that was my simplified take). This was relatively straightforward because the lines didn't have to be processed in any given order. Actors worked well for the task, and I was able to build a fairly generic set of classes that could be used in a very natural way for any similar problem.</p>
<p>I was feeling pretty good about Widefinder so I decided to tackle a more complicated problem: parallelizing XML parsing (I'll write more on that later). In the process of exploring various ideas to help parallelize the parsing of XML I started butting my head against limitations of the actors library. For example, I built an "asynchronous stream" class - something like a normal Scala stream but instead of being entirely lazy, it asynchronously evaluates itself ahead of what has already been consumed, effectively parallelizing computation of the stream. This worked, except that if the stream wasn't fully evaluated the application using the stream wouldn't terminate properly because its actor was still active. That made sense, as in many actor-based applications you wouldn't want them to terminate before all the actors were finished, so I tried to make a custom scheduler that used daemon threads and therefore would allow the application to terminate without all of the actor threads terminating. This seemed like a straightforward task, except that it turned out to not be possible given some current limitations (link to enhancement about schedulers). So I set myself to fixing/enhancing the actors library to meet my needs and submit some patches.</p>
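<p>For what it's worth, the daemon-thread part of the idea is straightforward with a plain <code>java.util.concurrent</code> pool (a sketch of the general technique, not the actor library's scheduler API): threads marked as daemons don't keep the JVM alive, so an idle scheduler can't prevent the application from terminating.</p>

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

public class DaemonPoolDemo {
    public static void main(String[] args) throws Exception {
        // a ThreadFactory that marks every worker thread as a daemon
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        };
        ExecutorService pool = Executors.newFixedThreadPool(2, daemonFactory);
        Future<String> result = pool.submit(() -> "done");
        System.out.println(result.get());
        // no pool.shutdown() required: daemon threads let the JVM exit
    }
}
```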
<p>Making fixes and enhancements to the actors library turned out to be a much bigger challenge than I expected. The external API for actors, particularly the documented part, is pretty clean and flexible. However internally, at least to a new set of eyes, there is a lot of spaghetti (count methods and variables that are defined as private[actors] so all the classes in the package can muck with them), oddly named variables and methods, and over a dozen mutable variables on the Actor trait that must all be correctly set in the right combination in order for the actor to work - with no structured means of manipulating them. I braved this, submitted some patches, and even wrote several enhancements that I did not submit, but I kept running into subtle problems that seemed almost impossible to track down. I would sprinkle assertions throughout the code only to realize that even when things seemed to work, much of what I was assuming wasn't really holding. In other words I was getting lucky - or unlucky - as challenges like this are pretty common when doing parallel programming.</p>
<p>By this time I had a reasonable, albeit incomplete and with some inaccuracies, idea of what all those magical instance variables within Actor do and how they correspond to meaningful states. Here is my take on it:</p>
<ol>
<li><code>waitingFor</code> - the partial function that the actor is using to wait for a matching message</li>
<li><code>shouldExit</code> - used to signal that the actor should exit instead of doing normal processing</li>
<li><code>isSuspended</code> - seems to indicate that the actor is waiting while blocked on a thread</li>
<li><code>isDetached</code> - the actor is waiting and is not attached to a thread</li>
<li><code>isWaiting</code> - the actor is waiting in some way</li>
<li><code>mailbox</code> - queue of messages for the actor</li>
<li><code>sessions</code> - list used as a stack to keep track of the sender of messages while they are being processed</li>
<li><code>received</code> - the message that was received, used to pass around a received message, could be replaced with local variable within functions</li>
<li><code>exitReason</code> - the reason why the actor exited; defaults to <code>'normal</code> when the actor is initialized and can be modified at any time, so it is set even when the actor has not yet exited</li>
<li><code>onTimeout</code> - used for keeping track of a timeout action so that it can be canceled if a message is received within the timeout period</li>
<li><code>kill</code> - a function invoked by reaction prior to invoking exit on the actor. It is used by loop and loopWhile to bypass the normal exit process of reactions</li>
<li><code>trapExit</code> - if false (the default) then the actor will exit when an actor that it is linked to exits, if it's true then it will receive an Exit message</li>
<li><code>rc</code> - a reply channel - it is unclear why it is actually needed, because it is never used other than being set in <code>freshReplyChannel</code> on every invocation</li>
<li><code>links</code> - a list of linked actors. Linked actors are notified or shutdown when the actor that they are linked to shuts down</li>
<li><code>continuation</code> - the function to be executed when the actor resumes</li>
</ol>
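To make the fragility concrete, here is a hypothetical illustration (not the real library code) of the implicit-state style this list describes: several boolean flags that every operation must update in exactly the right combination, with "states" existing only as unwritten conventions about which combinations are legal.

```scala
// Hypothetical sketch of implicit state via mutable flags.
// Nothing enforces that the flag combinations stay meaningful.
class FragileActor {
  private var isWaiting = false
  private var isSuspended = false
  private var isDetached = false

  // Each transition must touch the right subset of flags;
  // forgetting one silently produces an inconsistent combination.
  def blockedWait(): Unit  = { isWaiting = true;  isSuspended = true;  isDetached = false }
  def detachedWait(): Unit = { isWaiting = true;  isDetached = true;   isSuspended = false }
  def resume(): Unit       = { isWaiting = false; isSuspended = false; isDetached = false }

  // Reconstructing "what state is this actor in?" requires decoding the flags.
  def describe: String =
    if (!isWaiting) "running"
    else if (isDetached) "detached wait"
    else if (isSuspended) "blocked wait"
    else "inconsistent" // reachable whenever a flag update is missed
}
```

The `"inconsistent"` branch is the point: with independent mutable flags, illegal combinations are representable, and nothing but programmer discipline keeps the code out of them.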
<p>All those were pretty confusing, and it took me weeks to make any sense out of them - and usually the sense I made contained subtle flaws - so I decided to do something drastic. In my opinion the actors library is in desperate need of being refactored into something that encapsulates its state better and makes that state much more explicit, so that you do not need to be an advanced adept to figure out what is happening inside the library. So I decided to do that refactoring and see where it led me.</p>
<p>I distilled much of the above instance variables into the following states, which I have represented as immutable objects:</p>
<ul>
<li><code>NotStarted</code> - the initial state of the actor</li>
<li>
Active States
<ul>
<li><code>StartScheduled</code> - the actor has been started but has not yet been attached to a thread or started executing its body</li>
<li>
Running States - states where the actor is bound to a thread and is executing its body
<ul>
<li><code>Running</code> - the actor is bound to a thread and is executing its body</li>
<li><code>Looping</code> - a special case of running where the body is repeated while a specified condition holds</li>
<li><code>Killing</code> - when an actor is Running it cannot be directly stopped by the library, so it is shifted into the Killing state. When the library receives control while the actor is running, it checks whether the state has been changed to Killing; if so, it takes action to regain control from user code so that the actor can be cleanly shut down.</li>
</ul>
</li>
<li>
Wait States
<ul>
<li><code>BlockedWait</code> - the type of wait performed within a receive block, where the actor waits for a message while attached to a thread. A timeout may be specified, after which the actor will wake up.</li>
<li><code>DetachedWait</code> - the type of wait performed within a react block when no matching message is available.</li>
</ul>
</li>
<li><code>ContinuationScheduled</code> - similar to <code>StartScheduled</code>, only the processing of the message the actor was waiting for when in the <code>DetachedWait</code> state is scheduled instead of the starting of the actor.</li>
</ul>
</li>
<li>
Terminal States
<ul>
<li><code>Finished</code> - the actor terminated normally</li>
<li><code>Killed</code> - the actor was killed before it could complete execution</li>
<li><code>UnhandledThrowable</code> - an exception was thrown, usually from "user code," and not handled</li>
<li><code>InternalError</code> - an error occurred within the actor library</li>
</ul>
</li>
</ul>
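The state hierarchy above maps naturally onto a sealed trait with immutable case objects. The following is a minimal sketch along those lines - the names follow the lists above, but the details are my assumptions, not the code from the actual branch:

```scala
object ActorStates {
  sealed trait ActorState

  case object NotStarted extends ActorState

  // Active states: the actor has been started and has not yet terminated.
  sealed trait Active extends ActorState
  case object StartScheduled extends Active
  case object ContinuationScheduled extends Active

  // Running states: bound to a thread and executing the body.
  sealed trait RunningState extends Active
  case object Running extends RunningState
  case object Looping extends RunningState
  case object Killing extends RunningState

  // Wait states: waiting for a message, attached or detached.
  sealed trait WaitState extends Active
  case object BlockedWait extends WaitState
  case object DetachedWait extends WaitState

  // Terminal states: the actor will never run again.
  sealed trait Terminal extends ActorState
  case object Finished extends Terminal
  case object Killed extends Terminal
  case object UnhandledThrowable extends Terminal
  case object InternalError extends Terminal

  // Questions that previously meant decoding a dozen flags become
  // a simple check against the hierarchy.
  def isTerminal(s: ActorState): Boolean = s.isInstanceOf[Terminal]
}
```

Because the trait is sealed, a `match` over an `ActorState` that misses a case is flagged by the compiler, and because each state is a single immutable value, there is no way to land in a half-updated combination of flags.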
<p>...and now there are the following instance variables:</p>
<ol>
<li><code>state</code> - the state of the actor</li>
<li><code>mailbox</code> - queue of messages for the actor</li>
<li><code>sessions</code> - list used as a stack to keep track of the sender of messages while they are being processed</li>
<li><code>trapExit</code> - if false (the default) then the actor will exit when an actor that it is linked to exits, if it's true then it will receive an <code>Exit</code> message</li>
<li><code>links</code> - a list of linked actors. Linked actors are notified or shutdown when the actor that they are linked to shuts down.</li>
</ol>
<p>Reducing the statefulness of the actor, and more importantly making the primary state explicit instead of implicit, makes the code much easier to both debug and extend. In working with actors, I found that it's relatively common to have an actor just seem to stop. It could be because it's waiting for a message, it could be because it's stuck in a loop, it could be any number of things. Without explicit state, it's very difficult to tell exactly what's happening with an actor. With explicit state, it's right there in your debugger.</p>
<p>I have the code in a <a href="http://bitbucket.org/eengbrec/scala-enhancements/src/2502a5041010/">Bitbucket repository</a> in the "actor_state_machine" branch. Most of the changes are in the actor.scala file, and I highly encourage you to take a look at the differences. Right now the Scala test suite completes with the same results as the trunk as of a couple of months ago, and I've added a couple of tests to catch problems that it didn't catch before. Next I'm going to do a little more refactoring and general code cleanup, then I'm going to focus on expanding the test suite. The current Scala testing tool, partest, uses actors heavily so it gives them a workout, but there aren't any explicit tests for actors. Finally, I'm going to try to decouple the states from one another and extract their code from the Actor trait so that they can be tested as independent units and are more extensible. I'm not sure what I'm actually going to do with the code. I'd be happy to contribute it back to the EPFL, but given that it's such a substantial departure I'm not sure they would want it. So I'll probably end up moving it into another namespace and distributing it as a separate library.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com6tag:blogger.com,1999:blog-19725519.post-57397531902088998482008-11-05T07:08:00.003-05:002008-11-05T07:17:48.546-05:00Countdown to the U.S.S.R.Normally I don't blog on politics. I think it tends to draw attention away from technical content. But it's the day after the election, so I can't help myself.
The countdown to the birth of the U.S.S.R. has begun. That's the United States Socialist Republic. For years now Bush has eroded our personal freedoms in the name of physical safety from terrorists. Now Obama will launch down the same path in pursuit of economic safety. In the end, America stands to lose all that has made it great, not from powers from without, but wasted away due to fears from within.
Of course it's not too late. We stand to lose our greatest strengths, but they are not lost yet, and what is lost can be regained. We are still a democracy, by and large we are still free, and the voice of liberty will still be heard if we have the courage to raise it.Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com15tag:blogger.com,1999:blog-19725519.post-84143271840614086772008-08-04T22:29:00.004-04:002008-10-06T20:33:10.778-04:00Languages Readability and Tool Support<p>I received a few comments on my blog about type inference and its affect on readability saying that the problem isn't really a problem if you have proper tool support. You have API docs, IDE based assistance, and even interesting tools like the OCaml Browser. The problem is that these don't really address the problem. Programming requires a lot of concentration and is best done in a state of flow. This means that anything that causes distraction or disruption is the enemy. Flipping to another window in order to see some documentation requires a non-value added thought. So does moving the cursor so that an IDE will display a popup with the inferred type. Thoughts simply flow better if the code is readily readable, and code that requires a special tool to read is not readable.</p>
<p>There are also other benefits to having code that is readable without external assistance. While code may spend most of its life being displayed in an IDE, it certainly doesn't spend all of its life there. Books, articles, blogs, and other such media often contain code as well. Despite the ubiquity of the internet, I think having at least one book in dead-tree format is still essential for a programming language to be successful (and in some cases even to be taken seriously), and the last time I checked, dead trees don't have popup windows. Most online postings don't have intelligent help either, although I suppose it would be possible if someone really wanted to put in the effort. Regardless, the readability of a language in these formats will have a major impact on how easy the language is to learn, and ultimately how well it is accepted.</p>
<p>The bottom line is that despite all the great and useful tools there are out there, it is still critical for a language to stand on its own without major tool support.</p>Erik Engbrechthttp://www.blogger.com/profile/11174963559600768092noreply@blogger.com0