Thursday, December 06, 2007

Adventures in Widefinding: Performance

I've done some basic performance tests against:

  1. Scala Actor-based parallel Widefinder
  2. Scala serial Widefinder using a BufferedReader
  3. Tim Bray's Ruby Widefinder
The test platform was my 2.2 GHz MacBook with 4GB of RAM using a 6 million line file.  The times were as follows:
Scala Parallel:
  • real 0m14.588s
  • user 0m24.541s
  • sys 0m1.383s
Scala Serial:
  • real 0m20.095s
  • user 0m18.821s
  • sys 0m1.441s
Ruby:
  • real 0m14.301s
  • user 0m12.485s
  • sys 0m1.813s
The good news is that the parallel Scala version is noticeably faster than the serial version.  The bad news is that it is roughly the same speed as the Ruby version, and takes significantly more CPU.  The Scala versions are doing substantially more work, because they have to transcode the 8-bit ASCII contained in the input file into 16-bit Unicode strings.  This requires a full scan of the data.  I believe a fair amount of performance could be gained by combining the transcoding pass over the input with the line-splitting pass.
For those that are curious, the source code for the parallel widefinder is available here: the parallel IO code the actual widefinder code

Sphere: Related Content

No comments: