Saturday, August 21, 2010

Concurrency Benchmarking, Actors, and sbt tricks

Have you ever noticed that other people's microbenchmarks tend to be hard to run, are often impossible to duplicate, and are frequently caveated to the hilt? When it comes down to it, a benchmark is really an experiment, and ideally a scientific one. That means all of the factors relevant to the results should be clearly recorded, and the tests should be easy for others to reproduce.

Custom sbt actions for benchmarks

I'm working on a managed variant of the JSR-166y ForkJoinPool, along with supporting infrastructure for use with Scala Actors. To test and benchmark that work, I'm creating a test harness that captures a host of environmental factors about each run, and writing sbt actions to make it easy to run the benchmarks and automatically permute the variables.

It still needs a lot of work, but I had some trouble figuring out a really basic task, so I thought I'd share it. Basically, I wanted to build a Task object that consists of several tasks, based on information in the project definition and permuted parameters. It's actually pretty easy, as you can see in the snippet below from my project definition:

  /** this task executes the PingPong benchmark using each available scheduler */
  lazy val pingpongbench = pingpongTaskList
  /** produces a sequence of run tasks using all the available schedulers  */
  def pingpongTaskList = {
    val pairs = 100
    val messagesPerPair = 10000
    val tasks = for(sched <- schedulers) yield pingpongTask(sched, pairs, messagesPerPair)
    tasks.reduceLeft((a, b) => a && b)
  }

You can see the whole file here. Task has an && operator that concatenates one task with another, which lets you build up a whole chain of tasks. In the example above, I'm running the benchmark once for each scheduler configuration. Soon I'm going to permute other parameters as well, but right now my test harness isn't playing nicely with the schedulers included in the Scala distribution, so first things first.
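For reference, here's a rough sketch of what the pieces referenced above might look like. This is my illustration rather than the actual project code: the scheduler class names and the benchmark's main class are assumptions, and pingpongTask just builds an sbt 0.7 runTask that passes the scheduler and the benchmark parameters along as program arguments.

  /** Sketch only -- the real definitions live in the project file linked above.
    * Each scheduler is identified by its fully qualified class name. */
  def schedulers = List(
    "scala.actors.scheduler.ForkJoinScheduler",
    "scala.actors.scheduler.ResizableThreadPoolScheduler")

  /** Builds a run task that launches the (hypothetical) benchmark main class,
    * passing the scheduler and benchmark parameters as program arguments. */
  def pingpongTask(sched: String, pairs: Int, messagesPerPair: Int) =
    runTask(Some("benchmark.PingPong"), runClasspath,
            sched, pairs.toString, messagesPerPair.toString) dependsOn(compile)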

There's also one other little customization, which is documented, but I think it's important for benchmarking. By default, sbt runs your code inside its own JVM process. This can cause problems with multithreaded code, especially if it doesn't terminate properly, and it means the next benchmark has to contend with whatever junk the previous benchmark left around. So I configured sbt to fork a new process for each run. It just required one line:

override def fork = forkRun

Important variables

Here's what I'm capturing for each run right now, so that the results can all be dumped into a big spreadsheet for analysis. I'd like to capture more information about the host machine, such as details about the CPUs and the load while the benchmark is running, but I haven't gotten that far yet. Currently these are all captured from within the benchmark process, mostly using system properties and the Runtime object (there's a rough sketch after the list).

  1. Test Name - obviously needed so that results from multiple benchmarks can be stored in the same file
  2. Scheduler - this is my primary variable right now; I want to run each benchmark with each scheduler while holding everything else constant
  3. # of Cores/Processors - essential so that anyone looking at the results has an idea about the hardware used
  4. Java VM Name - different VMs can perform quite differently
  5. Java VM Version - performance characteristics change from version to version (usually getting better)
  6. Java Version - same reason as above, but this is probably the more publicly known version number
  7. Scala Version - this could be important in the future, as it becomes more common for different projects to be on different versions of Scala
  8. OS Name and version - again, it can affect performance
  9. Processor Architecture
  10. Approximate Concurrency (number of simultaneously alive actors) - this allows us to examine concurrency levels versus resource consumption; more concurrency does not necessarily mean that more cores or threads would be helpful
  11. Approximate Parallelism (number of simultaneously runnable actors) - this measures how many cores/threads the benchmark can really keep busy
  12. Approximate Total Messages - this estimates the amount of activity that takes place during the benchmark; generally the benchmarks I'm looking at contain very little logic, because they are intended to measure overhead introduced by the framework
  13. Total Wall Clock Time (seconds) - as measured using nanoTime within the benchmark process
  14. Initial Thread and Maximum Observed Thread Count - used to examine automatic expansion of the thread pool
  15. Initial Free Memory and Minimum Observed Free Memory - threads use a fair amount of memory, so performance impacts may show up as pressure on the GC as well as contention for the CPU
  16. Initial and Maximum Observed Total Memory - threads use a lot of memory, so it's important to track usage
  17. Verbose - whether debugging output was enabled, since debugging output pretty much invalidates any of these tests
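Most of these can be gathered from inside the benchmark process with nothing more than system properties, the Runtime object, and nanoTime. Here's a minimal sketch; the object and key names are my own invention, not the actual harness:

  /** Minimal sketch (names are assumptions, not the real harness): gathers
    * environment facts using system properties and the Runtime object. */
  object EnvCapture {
    def snapshot(testName: String, scheduler: String): Map[String, String] = {
      val rt = Runtime.getRuntime
      Map(
        "test.name"     -> testName,
        "scheduler"     -> scheduler,
        "processors"    -> rt.availableProcessors.toString,
        "jvm.name"      -> System.getProperty("java.vm.name"),
        "jvm.version"   -> System.getProperty("java.vm.version"),
        "java.version"  -> System.getProperty("java.version"),
        "scala.version" -> scala.util.Properties.versionString,
        "os"            -> (System.getProperty("os.name") + " " + System.getProperty("os.version")),
        "arch"          -> System.getProperty("os.arch"),
        "free.memory"   -> rt.freeMemory.toString,
        "total.memory"  -> rt.totalMemory.toString)
    }

    /** Wall clock time in seconds, measured with nanoTime from within the process. */
    def time[T](body: => T): (T, Double) = {
      val start = System.nanoTime
      val result = body
      (result, (System.nanoTime - start) / 1e9)
    }
  }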
