Wednesday, November 05, 2008

Countdown to the U.S.S.R.

Normally I don't blog on politics. I think it tends to distract from technical content. But it's the day after the election, so I can't help myself. The countdown to the birth of the U.S.S.R. has begun. That's the United States Socialist Republic. For years now Bush has eroded our personal freedoms in the name of physical safety from terrorists. Now Obama will head down the same path in pursuit of economic safety. In the end, America stands to lose all that has made it great, not to powers from without, but wasted away due to fears from within. Of course it's not too late. We stand to lose our greatest strengths, but they are not lost yet, and what is lost can be regained. We are still a democracy, by and large we are still free, and the voice of liberty will still be heard if we have the courage to raise it.


Monday, August 04, 2008

Language Readability and Tool Support

I received a few comments on my blog about type inference and its effect on readability, saying that the problem isn't really a problem if you have proper tool support.  You have API docs, IDE-based assistance, and even interesting tools like the OCaml Browser.  The trouble is that these don't really address the problem.  Programming requires a lot of concentration and is best done in a state of flow.  This means that anything that causes distraction or disruption is the enemy.  Flipping to another window in order to see some documentation requires a non-value-added thought.  So does moving the cursor so that an IDE will display a popup with the inferred type.  Thoughts simply flow better if the code is readily readable, and code that requires a special tool to read is not readable.

There are also benefits to having code that is readable without external assistance.  While code may spend most of its life being displayed in an IDE, it certainly doesn't spend all of its life there.  Books, articles, blogs, and other such media often contain code as well.  Despite the ubiquity of the internet, I think having at least one book in dead-tree format is still essential for a programming language to be successful (and in some cases even taken seriously), and the last time I checked dead trees don't have popup windows.  Most online postings don't have intelligent help, either, although I suppose it would be possible if someone really wanted to put in the effort.  Regardless, the readability of a language in these formats will have a major impact on how easy the language is to learn, and ultimately how well it is accepted.

The bottom line is that despite all the great and useful tools there are out there, it is still critical for a language to stand on its own without major tool support.


Sunday, August 03, 2008

The Costs of Generality

I've been pondering the results of Cedric's Code Challenge, and wondering just how much benefit is derived from optimized, purpose-specific solutions as opposed to solutions that rely on more general libraries or frameworks.  It's fairly common to see debates where one person (or group) insists that general constructs from a standard library or other such solutions represent an unacceptable overhead, and the other side claims that the overhead is meaningless compared to runtime optimizations performed by HotSpot and the cost of programmer time.  These debates can be rather painful to watch, as both sides generally have good points, yet often seem to be arguing right past one another.  Consequently, I think a little exploration of various potential optimizations and their respective impacts on performance would be beneficial.

For purposes here, I'm going to say that a solution based on a general framework is one that uses a general-purpose library to generate permutations of digits, filters out the ones with a leading zero, converts the permutations to numbers, and then collects the desired statistics.  A purpose-specific solution is one such as Crazy Bob's that is tailor-made for generating numbers based on permuted digits.

The General Solution

I'm not aware of a combinatorics library for Scala, but it is simple enough to write a generic permutation-generating function:

  def permute[E](s: Set[E], n: Int)(f: List[E] => Unit): Unit = {
    def p(s: Set[E], r: List[E]): Unit =
      if (r.length == n) f(r) else for(e <- s) p(s - e, e :: r)
    p(s, Nil)
  }

This recursively generates all of the possible permutations.  When it has generated a complete permutation, it passes it to the function specified by the caller.  If s is an ordered set, then the permutations will be generated in a predictable order.  This can then be used to generate the permutations of digits for the code challenge, as follows:

  val digits = TreeSet(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)
  def main(args: Array[String]): Unit = {
    val start = java.lang.System.nanoTime
    var last, cnt, jumpLow, jumpMagnitude = 0L
    for(d <- 1 to 10) permute(digits, d) { p =>
      val xs = p.reverse // digits are generated in the wrong order so must be reversed
      if (xs.head != 0L) {  // make sure this is a valid set of digits
        val cur = xs.foldLeft(0L)((z, a) => (z * 10L) + a)
        val dif = cur - last
        if (dif > jumpMagnitude) {
          jumpLow = last
          jumpMagnitude = dif
        }
        last = cur
        cnt = cnt + 1L
      }

    }
    val end = java.lang.System.nanoTime
    println("Count: " + cnt)
    println("Jump: " + jumpMagnitude + " (from " + jumpLow + " to " + (jumpLow + jumpMagnitude) + ")")
    println("Time: " + ((end - start) / 1000000L) + " ms")
  }

This solution takes about 13 seconds on my MacBook.

Generate Only Valid Permutations

The above permutation function can be tweaked as follows to generate only valid permutations (ones without a leading zero), saving about 10% of the execution time.

  def digitPermute(n: Int)(f: List[Long] => Unit): Unit = {
    def p(s: Set[Long], r: List[Long]): Unit =
      if (r.length == n) f(r) else for(e <- s) p(s - e, e :: r)
    for(first <- (digits - 0L)) p(digits - first, first :: Nil)
  }

The above solution executes in about 12 seconds.

Accumulating Numbers Instead of Lists

Both of the methods above construct lists of digits which are later assembled into numbers.  This wastes memory and cycles, because only the resulting numbers are required and they can be accumulated much more efficiently.  Doing so, as shown below, reduces execution time to about 7 seconds.

  val digits = TreeSet(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)
  def digitPermute(n: Int)(f: Long => Unit): Unit = {
    def p(s: Set[Long], d: Int, r: Long): Unit =
      if (d == n) f(r) else for(e <- s) p(s - e, d + 1, r * 10L + e)
    for(first <- (digits - 0L)) p(digits - first, 1, first)
  }

Long Set without Boxing

The above implementations all use TreeSet from Scala's standard library, which imposes a few performance penalties. For one, it is "generic." This means that it is subject to type erasure and must box its contents rather than use primitives. Second, if you look carefully at the definition of TreeSet, you'll notice that it doesn't require its contents to be Ordered, but rather uses a (potentially implicit) view converting the contained type into an Ordered. This adds an extra layer of indirection and therefore an extra cost. The sorted-array-backed set below is specialized for Long and avoids both costs:

  final class LongSet (val contents: Array[Long]) {
    private def indexOf(v: Long, min: Int, max: Int): Int = {
      if (min > max) -1
      else {
        val mid = (min + max) >>> 1
        val midVal = contents(mid)
        if (midVal < v) indexOf(v, mid + 1, max)
        else if (midVal > v) indexOf(v, min, mid - 1)
        else mid
      }
    }
    def foreach(f: Long => Unit) {
      var i = 0
      val max = contents.length
      while (i < max) {
        f(contents(i))
        i = i + 1
      }
    }
    def -(v: Long): LongSet = {
      val max = contents.length - 1
      if (indexOf(v, 0, max) < 0) this
      else {
        val a = new Array[Long](max)
        var i, j = 0
        while (i <= max) {
          val cur = contents(i)
          if (cur != v) {
            a(j) = contents(i)
            j = j + 1
          }
          i = i + 1
        }
        new LongSet(a)
      }
    }
  }
  val digits = new LongSet(Array(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L))
  def digitPermute(n: Int)(f: Long => Unit): Unit = {
    def p(s: LongSet, d: Int, r: Long): Unit =
      if (d == n) f(r) else for(e <- s) p(s - e, d + 1, r * 10L + e)
    for(first <- (digits - 0L)) p(digits - first, 1, first)
  }

This implementation brings the execution time down to ~1.7 seconds, representing a substantial savings over TreeSet. The comparison isn't quite fair, as TreeSet uses a red-black balanced tree and the code above uses a sorted array, but the difference is still substantial and shows that having a more targeted data structure can improve performance significantly.  At this point you might be thinking "Well no sh*t sherlock!  Of course a data structure tuned for a specific type is faster than one that is written to handle any type!"  That's true...to a point.  Not all languages implement generics using type erasure and require boxing of values within parameterized classes.  For example, C++ was designed to ensure that data structures implemented using templates imposed little or no overhead over more raw, hand-written ones.

Special Purpose Set for Permutation Generation

Another approach is to use a more special-purpose data structure in the permutation function without reducing its generality.  The linked set used in Crazy Bob's solution can be generalized to generating permutations of any type.  Unfortunately, this structure is mutable, and mutates on every invocation.  This means that while it would be possible to pass it directly to client code, it would be extremely dangerous because the client code may maintain a reference to the rapidly changing data structure.  Consequently, the structure needs to be copied into a list or similar structure before being passed to client code.  The solution built around the code below completes in ~5 seconds, which is slower than using a structure explicitly coded for dealing with longs and generating longs, but over twice as fast as generating permutations using the standard TreeSet class.

  private final class Element[E](val value: E, var next: Element[E], var previous: Element[E]) {
    /** remove an element from the set*/
    def use() {
      if (previous ne null) previous.next = next
      if (next ne null) next.previous = previous
    }
    /** put an element back in the set */
    def yieldValue() {
      if (previous ne null) previous.next = this
      if (next ne null) next.previous = this
    }
  }
  private def buildElements[E](s: Set[E]): Element[E] = {
    val iter = s.elements
    val first = new Element(iter.next, null, null)
    var cur = first
    while(iter.hasNext) {
      cur.next = new Element(iter.next, null, cur)
      cur = cur.next
    }
    first
  }
  def permute[E](s: Set[E], n: Int)(f: List[E] => Unit): Unit = {
    def p(start: Element[E], head: Element[E], r: List[E]): Unit = {
      def take(current: Element[E]): Unit = {
        if (current ne null) {
          val newR = current.value :: r
          if (newR.length == n) {
            f(newR)
            take(current.next)
          } else {
            current.use()
            val newHead = if (current eq head) head.next else head
            p(newHead, newHead, newR)
            current.yieldValue()
            take(current.next)
          }
        }
      }
      take(start)
    }
    val first = buildElements(s)
    p(first, first, Nil)
  }

Conclusion

The implementations here represent a sampling of the ways Cedric's Code Challenge can be implemented in Scala, and the effects they have on performance.  A relatively direct port of Crazy Bob's solution to Scala completes in ~0.4 seconds, making it by far the fastest solution and about 30 times faster than the solution using standard data structures with a generic permutation generator.  That's not really surprising, so what can we conclude?  The most obvious conclusion is that avoiding the construction of intermediate objects yields a substantial speedup.  This can be seen in two places.  The first is in the switch from constructing a List to represent the permutation to accumulating the Long directly.  The second is in using a special-purpose mutable data structure to generate the permutations, thereby avoiding repeated allocations of Set objects.  Finally, reducing overhead due to things like boxing and the casts associated with type erasure does make a noticeable difference in performance.  On the flip side, Scala's closure-based constructs, such as nested functions and for loops, added negligible overhead, if any at all.  Using more general constructs instead of more specific ones clearly has a substantial performance cost, but it's also worth mentioning that the cost is trivial compared to the benefit received in the transition from a brute-force solution to an optimal algorithm.


Tuesday, July 15, 2008

System.identityHashCode

A recent thread on the Scala IRC channel piqued my curiosity about exactly how System.identityHashCode works.  It looks like, as less heated remarks fleshed out the conversation, the definition turned out to be irrelevant to the debate, but regardless, I think it's interesting.  Thankfully Sun open-sourced their implementation of the Java platform, so it was pretty easy to find out exactly how it works.

It turns out the Sun JVM contains several different algorithms for calculating the identity hash, none of which is guaranteed to yield unique hash codes.  That's not surprising, because on a 64-bit architecture, doing that would be almost impossible.  I say almost because one could just generate them as sequential numbers, and then crash the JVM when it runs out of 32-bit hash codes (an OutOfHashCodesError ;-).

From share/vm/runtime/synchronizer.cpp:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ; 
  if (hashCode == 0) { 
     // This form uses an unguarded global Park-Miller RNG, 
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.  
     value = os::random() ; 
  } else
  if (hashCode == 1) { 
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.  
     intptr_t addrBits = intptr_t(obj) >> 3 ; 
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ; 
  } else 
  if (hashCode == 2) { 
     value = 1 ;            // for sensitivity testing
  } else { 
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ; 
     t ^= (t << 11) ; 
     Self->_hashStateX = Self->_hashStateY ; 
     Self->_hashStateY = Self->_hashStateZ ; 
     Self->_hashStateZ = Self->_hashStateW ; 
     unsigned v = Self->_hashStateW ; 
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ; 
     Self->_hashStateW = v ; 
     value = v ; 
  } 

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ; 
  assert (value != markOopDesc::no_hash, "invariant") ; 
  TEVENT (hashCode: GENERATE) ;
  return value;
}

What's interesting is that there's an algorithm in there (hashCode == 2) where the identity hash code turns out to always be the same for all objects.  I'm pretty sure it's there solely for testing purposes, so they can run tests against the JVM and standard library and ensure no piece of code is relying on the uniqueness of the identity hash code.
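
To make the user-visible contract concrete, here's a tiny Scala snippet of my own (not from the JVM sources) showing that identity hash codes are stable per object but carry no uniqueness guarantee:

  object IdentityHashDemo {
    def main(args: Array[String]): Unit = {
      val a = new Object
      val b = new Object
      // Stable: the same object always reports the same identity hash.
      println(System.identityHashCode(a) == System.identityHashCode(a)) // true
      // Distinct objects usually differ, but nothing guarantees it, which is
      // exactly the assumption the hashCode == 2 mode is designed to flush out.
      println(System.identityHashCode(a) == System.identityHashCode(b)) // almost always false
    }
  }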

This relates back to my previous point about transparency.  Because Sun's JVM is open source, it's possible for anyone to peek inside and see how the various bits work, and what assumptions the Sun JVM engineers are and are not making about those pieces.


Trust in Authority vs Trust in Transparency

This morning Murph posted a blog on the "censorship" of Wikipedia by over-zealous article owners, citing a posting by Lawrence Solomon about an experience with editing an article related to global warming.  Murph uses this to support one of his common conspiracy theories that Wikipedia, and social media in general, is doomed because of this kind of censorship and deliberate distortion of the facts.

What's interesting is that this censorship took place in the open.  Anyone who knows where to look can see exactly what changes were made by Solomon, and their disposition relative to the current official article or any other version of it.  Here's Solomon's version versus the current version.

The core issue here isn't censorship.  Editors will always censor.  Their job is to act as filters.  Furthermore, every editor is subject to biases, whether they be his own or those imposed on him by a third party such as his employer.  With most publications the editorial process happens behind closed doors with unseen forces.  With Wikipedia the process happens right before the eyes of the world.

The real issue here is that Murph trusts authority more than transparency.  Someone corrected an article, and the correction was subsequently removed due to political bias.  I'm sure that's happened time and time again in every encyclopedia that has ever been published.  The difference is in this case the change was transparent, with the politics open for all to judge, where with a traditional model we would have never known.

If history has taught us anything, it's that placing too much faith in authority is a bad idea.  Our authority figures are all human, and our authoritative organizations are still organizations of people.  They are both fallible and corruptible.  We can't entirely strip authority from our society because that would lead to anarchy, but we can make authority more transparent, and with transparency we can judge for ourselves.

In this particular case the editorial process of Wikipedia probably failed to yield the most accurate article possible, at least as the article stands today.  I'm not an expert on the subject matter, but I think Solomon's corrections were most likely correct.  That being said, I think the process has succeeded.  The changes Solomon made have not been entirely censored; they have merely been driven from the main page.  Discourse continues with regard to their validity.  The biases of the editors have been made public.  The last thing we want to do is reseal that process behind closed doors, simply because in this case we didn't like some of the results.


Sunday, July 13, 2008

Love, Hate, and Type Inference

Ever since I started using Scala, I've had somewhat of a love-hate relationship with type inference.  On the local level, type inference can make code much more concise, easier to read, and easier to refactor.  It makes code easier to read because in cases where the type of a variable is obvious, type annotations just add noise that distracts from the meaning of the code.  It makes code easier to refactor, because oftentimes changes to the type of something in one place in a program can ripple throughout the rest of the program through a simple recompile, assuming the new type supports the methods being used on the original type.  This leads to something akin to structural typing without the associated overhead.
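
For instance, here's a minimal made-up illustration of the local case (my own snippet, not code from any of the libraries discussed here):

  // The annotations add little; the types are obvious from the right-hand sides.
  val names = List("foo", "bar", "baz")     // inferred: List[String]
  val lengths = names.map(_.length)         // inferred: List[Int]

  // The fully annotated version says the same thing with more noise.
  val names2: List[String] = List("foo", "bar", "baz")
  val lengths2: List[Int] = names2.map((s: String) => s.length)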

However, at a more macro level I think type inference can make code much more difficult to read.  For example, consider the following code from the Scalax library (full source here):

//  Scalax - The Scala Community Library
//  Copyright (c) 2005-8 The Scalax Project. All rights reserved
abstract class InputStreamResource[+I <: InputStream] extends CloseableResource[I] {
    def buffered : InputStreamResource[BufferedInputStream] =
        new InputStreamResource[BufferedInputStream] with Wrapper {
            type Handle = BufferedInputStream
            override def wrap(is : SelfHandle) = new BufferedInputStream(is)
            
            override def buffered = this
        }

    def slurp() = for (is <- this) yield StreamHelp.slurp(is)
    
    /* Obtains a Reader using the system default charset. */
    def reader =
        new ReaderResource[Reader] with Wrapper {
            type Handle = Reader
            // XXX: should be UTF-8 by default instead of OS default
            // practically, here in Russia I never used default charset
            override def wrap(is : SelfHandle) = new InputStreamReader(is)
        }
    
    def gunzip =
        new InputStreamResource[GZIPInputStream] with Wrapper {
            type Handle = GZIPInputStream
            override def wrap(is : SelfHandle) = new GZIPInputStream(is)
        }
    
    /** Obtains a Reader using the supplied charset. */
    def reader(charset : String) = {
        // Do this lookup before opening the file, since it might fail.
        val cs = Charset.forName(charset)
        new ReaderResource[Reader] with Wrapper {
            type Handle = Reader
            override def wrap(is : SelfHandle) = new InputStreamReader(is, cs)
        }
    }
    
    def lines = reader.lines
    
    def lines(charset : String) = reader(charset).lines
    
    def readLines() = reader.readLines()
    
    def readLine() = reader.readLine()
    
    def pumpTo[O <: OutputStream](osr : OutputStreamResource[O]) {
        // Note InputStream should be opened before OutputStream
        for (is <- this; os <- osr) StreamHelp.pump(is, os)
    }
}

This is an example of where type inference makes code easier to refactor yet more difficult to read.  If one were to change the return type of the reader methods, or the subsequent lines and/or readLines methods on that type, then the return types of these methods would automatically change on recompile.  However, now try to figure out the return types of the lines and readLines methods.  In order to do that, you need to know what the return type of the reader method is, and the structure of that type.  Figuring out the return type of reader is reasonably straightforward, as it is defined in the same file.  However, the base class of the return type is not, so in order to figure it out you need to trace through several class definitions, potentially in other source files.  I doubt this is a big deal for those people who are intimately familiar with the code, but I pity the new guy who has to sort it out when he's trying to create a new subclass of this abstract class.  Of course, there's always the API docs, so users of the code, and the poor new guy, do have a place to turn.

So that's Scala, which supports a relatively limited form of type inference.  Now let's consider a snippet of OCaml, which has much fuller type inference; I stole this one from Mauricio Fernandez:

   (* Copyright (C) 2008 Mauricio Fernandez  http//eigenclass.org
      Solution to the coding challenge at
      http://beust.com/weblog/archives/000491.html *)
   open Printf
   module S = Set.Make(struct type t = int let compare = (-) end)
   let (--) set elm = S.remove elm set
  
   let permutations f zero digits len =
     let base = S.cardinal digits in
     let rec aux n digits = function
         0 -> f n
       | len -> S.iter (fun s -> aux (n * base + s) (digits -- s) (len-1)) digits
     in S.iter (fun s -> aux s (digits -- s) (len - 1)) (digits -- zero)
 
   let () =
     let max = 10_000_000_000 in
     let digits = List.fold_right S.add [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] S.empty in
  
     let count = ref 0 and prev = ref 0 and maxj = ref 0 and base = ref 0 in
     let report () = printf "Found %d numbers under %d.\n" !count max;
                     printf "Max jump: %d (%d -- %d).\n" !maxj !base (!base + !maxj)
   
     in try
       for i = 1 to 10 do
         permutations
           (fun num ->
              if num >= max then raise Exit;
              (* printf "%d\n" num; *)
              incr count;
              let jump = num - !prev in
                if jump > !maxj then (maxj := jump; base := !prev);
                prev := num)
           0 digits i
       done;
       report ()
     with Exit -> report ()

It's not entirely fair that I am picking on this OCaml code, because I don't know the language.  Also, this is a short, self-contained program that I personally would be likely to write in Python without any type annotations at all.  So in a way, for short programs like this, I think full type inference is great.  You get all of the protection of static typing with none of the hassle.  That being said, I think it would be extremely difficult to approach a large code base that is this devoid of type annotations.

All this adds up to my love-hate relationship with type inference.  I love it when I'm writing my own code, and I hate it (aside from local variables) when I am reading other people's code.  Over the past months I've decreased my use of it in Scala, preferring instead to explicitly specify types wherever they would not be entirely obvious from looking at the code, even though all my Scala code is hobby work, so no one else has to read it.  Of course, being hobby work, I often go days, weeks, or even months without looking at a chunk of code, so I often need help remembering what I wrote.  Overall I would say that excessive reliance on type inference obfuscates code, and that there are plenty of ways for programmers to obfuscate their code, so it is a style issue as opposed to a language design issue.  I also think it will be interesting to see how many dynamic language people migrate over to languages with strong type inference as these languages gain more attention.  As you can see from the OCaml code above, there is no extra "typing burden" placed on the programmer.  If the program is correctly typed, it will compile and run quickly.  If not, it won't.  In theory this should satisfy the "type annotations are too much work/add too much noise to my program" camp, but not the "I don't want the compiler telling me what I can and can't do" camp.  My guess is that more people fall into the latter than the former, even the ones that claim they don't, but only time will tell.

Update: I just had a thought. Try thinking of type inference like using pronouns in natural languages. Imagine talking to a person who rarely uses pronouns. It's slow, cumbersome, and occasionally makes you think the speaker thinks you are an idiot who can't remember what he said 30 seconds ago. Likewise, when someone speaks entirely in pronouns, especially when they use a pronoun to refer to something that isn't part of the immediate conversation, it leads to utter confusion. Context is key, and the compiler can usually keep a much more distant context in its working memory than a programmer can. Type annotations provide context. Type inference assumes context.

Update 2: Here's a link to some OCaml API docs, where, just like in Scala, you can clearly see the inferred types.


Saturday, June 28, 2008

Cedric's Code Challenge

Cedric Beust issued an interesting little challenge this morning:

Here is an interesting coding challenge: write a counter function that counts from 1 to max but only returns numbers whose digits don't repeat.

For example, part of the output would be:

  • 8, 9, 10, 12 (11 is not valid)
  • 98, 102, 103 (99, 100 and 101 are not valid)
  • 5432, 5436, 5437 (5433, 5434 and 5435 are not valid)

Also:

  • Display the biggest jump (in the sequences above, it's 4: 98 -> 102)
  • Display the total count of numbers
  • Give these two values for max=10000

He welcomed brute-force solutions, but really the challenge here is in coming up with something more efficient and elegant. There are basically three general approaches:

  1. Run through all the numbers from 0 to n, test each for no repeating digits, and track the above statistics while you do it. This is the brute force method.

  2. Permute the digits in a fashion that generates numbers sequentially and track the statistics. Alternatively you could generate them in any order, sort them, and then calculate the statistics.

  3. Derive a heuristic that proves a given sequence of numbers will all contain repeated digits and can therefore be skipped.

I think #2 is probably the ideal approach, but I didn't think of it until I was mostly done coding #3.

Finding Repeated Digits

The first step in solving this problem, no matter what the approach, is to come up with an algorithm to detect repeated digits. Commenters on Cedric's blog came up with a number of ways to do this, most of which centered around converting the integer into a string and then finding repeated characters. This is a rather frighteningly inefficient approach. There is certainly no need to convert the number into a string in order to know its digits. A much simpler approach is to allocate a ten-element array of booleans initialized to false, and generate the digits from lowest to highest by repeatedly taking the number mod ten. The first time you encounter a digit, you flip its associated array element to true. The second time, you exit because you have detected a repeat. The array is essentially serving as a thrifty man's map. Here it is in Scala:

  def containsRepeatedDigit(i: Long): Boolean = {
    val digits = new Array[Boolean](10) // elements default to false
    def f(i: Long): Boolean = {
      if (i == 0L) false // all digits have been processed
      else {
        val d = (i % 10L).asInstanceOf[Int]
        if (digits(d)) true
        else {
          digits(d) = true
          f(i / 10L)
        }
      }
    }
    if (i < 11L) false else f(i)
  }
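
With that test in hand, the brute-force method (#1 above) takes only a few more lines. This is my own sketch rather than part of the original solution, but it shows how little code the baseline requires:

  // Brute force: test every number from 1 to max and track the statistics.
  def bruteForce(max: Long): (Long, Long, Long) = { // returns (count, jumpStart, jumpSize)
    var count, last, jumpStart, jumpSize = 0L
    var i = 1L
    while (i <= max) {
      if (!containsRepeatedDigit(i)) {
        if (i - last > jumpSize) { jumpSize = i - last; jumpStart = last }
        count = count + 1L
        last = i
      }
      i = i + 1L
    }
    (count, jumpStart, jumpSize)
  }

It is, of course, far slower than the heuristic developed below for large values of max.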

The Heuristic

Consider the number 2201. It has repeating digits. The question is: What's the next number without repeating digits? It is 2301. You could calculate it using brute force, but you'd end up scanning an extra 99 numbers. Notice that the repetition is in the upper digits. This means that you cannot get a number with non-repeating digits until the second digit (counting from the left) changes. Now consider the number 2200. In this case changes need to occur in both the lower digits and the upper digits; however, addressing the upper digits allows us to skip a much larger section of the search space. Finally, consider 22200. In this case, you still want the second digit. However, you are searching from the right, so algorithms that detect the first repeat won't work. Here's Scala code to find the appropriate digit. Notice that it looks very similar to the repeated digit test above.

  def max(array: Array[Int]): Int = {
    def f(idx: Int, m: Int): Int = {
      if (idx == array.length) m
      else if (array(idx) > m) f(idx + 1, array(idx))
      else f(idx + 1, m)
    }
    f(1, array(0))
  }

  def repeatedDigit(i: Long): Int = {
    val prevIdx = new Array[Int](10)
    val recentIdx = new Array[Int](10)
    def f(i: Long, idx: Int) {
      if (i > 0) {
        val d = (i % 10L).asInstanceOf[Int]
        if (recentIdx(d) > 0) prevIdx(d) = recentIdx(d)
        recentIdx(d) = idx
        f(i / 10L, idx + 1)
      }
    }
    f(i, 1)
    Math.max(max(prevIdx), 0)
  } 

Now that we have an algorithm for finding the highest digit that needs to be changed, we need one that will take that information and turn it into the next possible number containing no repeating digits. This simply requires basic arithmetic.

  def nextPossibleNonRepeating(i: Long): Long = 
        nextPossibleNonRepeating(i, repeatedDigit(i))

  def nextPossibleNonRepeating(i: Long, idx: Int): Long = {
    if (idx == 0) i + 1L
    else {
      val p = Math.pow(10.0, (idx - 1).asInstanceOf[Double]).asInstanceOf[Long]
      val r = i % p
      val d = p - r
      i + d
    }
  }

Given this, it is easy to generate a sequence:

  def nextNonRepeating(i: Long): Long = nextNonRepeating(i, repeatedDigit(i))
  def nextNonRepeating(i: Long, idx: Int): Long = {
    val p = nextPossibleNonRepeating(i, idx)
    val d = repeatedDigit(p)
    if (d == 0) p else nextNonRepeating(p, d)
  }

Solution

Once this is all done, the solution is pretty straightforward. It takes the general form of the function used to generate the next number with non-repeating digits, only it has to keep track of a bunch of extra information.

  def printNonRepeatingReport(start: Long, stop: Long, last: Long, gapLow: Long,
                              gapHigh: Long, cnt: Long): Unit = {
    if (start > stop) {
      println("max: " + last)
      println("max gap: " + (gapHigh - gapLow) + " between " + 
              gapLow + " and " + gapHigh)
      println("count: " + cnt)
    } else {
      val d = repeatedDigit(start)
      if (d == 0L) {
        //println(start)
        val gap = start - last
        val (gl, gh) = if (gap > (gapHigh - gapLow)) (last, start) 
                       else (gapLow, gapHigh)
        printNonRepeatingReport(start + 1L, stop, start, gl, gh, cnt + 1L)
      } else {
        printNonRepeatingReport(nextPossibleNonRepeating(start, d), stop, last, 
                                gapLow, gapHigh, cnt)
      }
    }
  }

Results

I'm not going to list all the numbers here, just the statistics for 1-10,000:

  • max: 9,876
  • max gap: 105 between 1,098 and 1,203
  • count: 5,274

Of course I haven't checked it against a brute-force solution or other posted solution, so I owe a beer/coffee/tea, depending on your persuasion, to anyone who can point out a bug and provide the solution.

Just for kicks, here are the statistics up to 10,000,000,000:

  • max: 9,876,543,210
  • max gap: 104,691,357 between 1,098,765,432 and 1,203,456,789
  • count: 8,877,690

Which took 1 minute 23 seconds on my MacBook. Try that with a brute-force approach.


Monday, June 23, 2008

Google AppEngine

There's a lot of buzz out there about how Google AppEngine is a game changer. My question is: Which game? AppEngine reduces the cost of hosting a web application down to zero. Zero is a pretty profound number. You don't even need to provide a credit card number in case they might want to charge you some day. All you need is a mobile phone number capable of receiving text messages. This means that not only is there no up-front cost, but also that there is no risk that you will suddenly incur huge fees for exceeding transfer quotas when you are lucky enough to be Slashdotted. Your application will just be temporarily unavailable. Hey, if that service level is good enough for Twitter, it's good enough for you, right?

The Open Source Game

Starting an open source project has been free for years. Many project hosting communities exist, providing projects with source code management, issue tracking, wikis, mailing lists, other features, and even a little visibility within their directories. Web hosting is not exactly expensive, especially for light-weight scripting languages like PHP, Perl, and Python; but it still costs something and therefore requires a sponsor. Often for a fledgling project the sponsor is also the founder, who also happens to be the lead programmer, help desk, and promoter. There are some who do an amazing job of all this, but for the most part I think project founders choose to wait.

Note: If I am wrong about the availability of good, free hosting for open source web applications, please provide links to them in the comments section. I would love to be proven wrong on this.

The Startup Game

If Paul Graham were to comment on AppEngine, it would probably be that it eliminates one of the last remaining excuses anyone may have for not launching a web startup. If you have an idea, some time, and can hack Python – you can launch a web application with AppEngine. You don't need money, infrastructure, long-term commitment, or any of those other things that may scare a person away from a startup. With AdSense, you can even monetize your application with hardly any effort.

Of course there are limitations. For one thing, AppEngine is very limited in what it can do. Pretty much any idea with even moderate bandwidth, storage, or computation requirements that cannot be delegated to another web application (e.g. embedding YouTube functionality) is out. So, unless you plan on building a business based on mashing together existing applications, then AppEngine probably is not your free lunch. That being said, I fully expect that Google will gradually add APIs for all of its applications to AppEngine, thereby providing a very powerful base for new-but-thin ideas.

The R&D Game

Just for a moment, let's say there was a company that required all of its employees to spend 20% of their time working on a project other than their primary job. This is a company that is betting that it cannot predict which idea is going to be the next big thing, so it must try lots and lots of things and see what sticks. As this company grows, providing infrastructure for those projects would become a real challenge. Especially providing infrastructure that allows them to be tested in the wild as opposed to simply internally, and allows the projects to instantly scale if they just happen to turn out to be a success. This company would also want to ensure its employees weren't slowed down by having to deal with muck. Such a company would need an infrastructure that could scale to support many, many applications without burdening infrastructure teams. Such a company would need AppEngine.

Now extend the idea further. Let's say it doesn't really matter whether the next big thing was developed by an employee or not. What matters is that the next big idea is bound to the company, for example by using the company standard authentication mechanism or, more importantly, the company standard monetization mechanism.

Ok, so we all know what company I'm talking about. AppEngine allows Google to outsource some of their R&D at very low cost, and given that most successful AppEngine developers will probably choose AdSense to monetize their creations, Google stands to profit regardless of whether they ever pay any fees or not. In cases where creators do pay hosting fees, the great Google gets paid twice.

The Recruitment Game

Distinguishing among potential recruits is very challenging. The accomplished academic is almost certainly smart, but may fall apart when asked to work with a significant team on something important to the company rather than a research interest. The industry veteran may have great corporate experience, but political skills could be masking shallow or outdated technical skills. The bottom line is recruiting is hard because in most cases you never see a direct sampling of an individual's work. At best you can see what a team he was on produced and take an educated guess as to his contribution. Open source projects can provide more information, but for most programmers there is no real motivation to participate in such projects.

AppEngine provides more motivation to the programmer, because he can more readily show his creation to other people without incurring any cost and there is always a chance that he will make some money. There are probably a lot of talented “almost founders” out there who would start a company, but perhaps lack some of the personality traits necessary to do so or found themselves in a situation where they need a steady income stream that simply isn't available for the founder of an early-stage startup. These individuals, and others, will make great pickings for Google.

Conclusion

Long term, in order for Google to grow, it has to attract more people to spend more time on sites displaying AdSense advertisements. Over the past few years Google has come out with countless new online services, most of which are still in beta, and none of which has yielded anything close to the monetization potential of their search business. AppEngine allows them to vicariously service the long tail without continuously expanding their already highly diverse R&D efforts. As a nice added bonus it will provide the opportunity to acquire future high-value startups before they are even real startups by simply hiring the developers and maybe tossing some stock options their way. On the flip side, I don't think AppEngine is going to have much effect on mainstream web application hosting. The API is too constrained, and for a company with resources the ties to Google are most likely too tight. So I predict AppEngine will be more of a muted success for Google. The infrastructure built for it will be useful internally, it will help them get more sites up more quickly addressing eclectic needs without burdening employees, and it will provide a proving ground for future hires and possibly very early stage acquisitions. This could all add up to AppEngine being a very significant success, but it also means the success may be out of the public eye.


Wednesday, June 18, 2008

What does I/O bound really mean?

There's a lot of old folk wisdom out there that justifies the use of slow languages and runtimes on the basis that the impact on performance doesn't matter. Here's a sampling:

  • Most applications are I/O bound so the performance of the programming language doesn't matter

  • The database does all the heavy lifting, so performance of the application code doesn't matter

  • Computer time is cheap compared to programmer time

Like most folk wisdom, these all have an element of truth. They are often brought up in discussions about the merits of strongly-typed compiled languages versus dynamic languages, and natively compiled languages versus virtual machine based languages versus interpreted languages. I could write about any one of them, and many more, but today I'll address I/O because of the “work” I've been doing on the WideFinder 2 project.

WideFinder 2

The original WideFinder project was, at least on first inspection, quite clearly I/O bound (well, if you had an input file smaller than your main memory...) The WideFinder 2 is a little better because it is a little more computationally complex, but not by much. The benchmark is really simple: process a big (42 GB) web server log file and report some basic statistics. In order to perform the benchmark, the program must parse the file, store information from each line on several maps, and finally extract some statistics out of them.

Tim benchmarked sequential block input on the test system at just shy of 150 MB/s. This means that the I/O alone required to process the 42 GB file should take about 5 minutes (roughly 42,000 MB ÷ 150 MB/s ≈ 280 seconds), meaning that if the benchmark truly is I/O bound then the program shouldn't take much more than 5 minutes to run. Consequently, if we are to judge based on Tim's reference implementation of WF2 in Ruby, then it quite clearly isn't I/O bound.

I/O Takes CPU, too

If you look closely at the Bonnie benchmark results you'll notice that doing raw block I/O – meaning just reading files into a buffer – consumes almost 80% of a single core of the CPU. That means that I/O alone comes pretty close to being bound by the CPU as well as being bound by the disk. In fact, experimentation uncovered that placing the system under high load reduces maximum I/O throughput. In order to achieve maximum throughput, you actually have to ensure that you keep one core free to manage the I/O.

Application I/O is more than I/O

Another key factor is that what most application programmers think of as I/O involves a lot more than shuttling bits from disk into memory. For example, Java and other Unicode-based platforms have to decode the character encoding of the file into the native character encoding of the runtime. In the case of the JVM, this not only requires that every byte be processed, but also frequently doubles the memory consumption of each character and requires the data be copied into a separate buffer. Furthermore, application code usually deals with files line-by-line in the form of some standard (often immutable) string object, thereby requiring yet another pass over the data. So when your code is simply iterating over lines in a file, it isn't really just doing I/O. A fair amount of CPU- and memory-bound work is required to deliver those nice, neat Unicode strings.

Large datasets change the rules

Memory management isn't free, and even with an industrial strength garbage collector it isn't always simple. A common practice with immutable objects is to have them share some internal state in order to avoid excess data copying. For example, when you take a substring of a Java string, the two string objects share an underlying character array. Often this saves memory, and it changes generating a substring of a string from a linear time and space operation (with respect to string length) to a constant time and space operation. Unfortunately, if the substring is much longer lived than the original string, and especially if it is much smaller, you end up with something that feels an awful lot like a memory leak (I think it could be debated whether it is a leak or not).
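
One common defensive idiom (a sketch of my own, not code from any of the WF2 entries) is to copy the small piece you intend to keep, so that the big line's buffer can be collected:

  // On the JVMs of this era, String.substring shares the original char[], so a
  // long-lived key cut from a huge log line can pin the entire line in memory.
  def retainKey(line: String): String = {
    val shared = line.substring(0, Math.min(8, line.length)) // shares line's backing array
    new String(shared) // copies just the characters we actually keep
  }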

Even if you're not carrying around a lot of excess baggage in the form of substrings, processing gigabytes of data consumes a lot of memory. In the case of WF2, a little bit of pretty much every line needs to be stored for the calculations at the end. The vast majority of my “debugging” time for WF2 was spent figuring out what JVM parameters will actually allow the program to finish without slowing to a crawl (pre-tuning there were points where it would spend 90% of its time in GC) or consuming every bit of memory available (at least a few WF2 solutions hog tons of memory).

All this means that when dealing with large files other parts of the system need to do more work, which takes away time that could be spent on “real” processing or on I/O (which we've already seen can be a CPU hog). Furthermore, I/O routines and routines commonly used along side heavy I/O (like regular expressions) must be very careful about their memory usage.

So is WF2 really I/O bound?

It depends. Go take a good hard look at the WF2 results page. If you look at Tim Bray's results you would conclude that no, WF2 clearly is not I/O bound. However, if you look at the top you'll see some very impressive numbers that indicate that WF2 indeed is I/O bound (side note: I really need to take a hard look at OCaml). Of course, you could argue that even in the OCaml case it really is CPU bound, because making the task take about as long as the required I/O saturated 8 cores and 32 hardware threads. Scanning down the list would seem to indicate that in most cases I/O does not represent a significant bottleneck, but then it's hard to really tell. The disk may not be the bottleneck, but the I/O routines within the libraries or runtime may be. Consequently, from the application programmer perspective, WF2 is I/O bound.

Redefining “I/O bound” and future impacts

For a long time “I/O bound” primarily referred to hardware or possibly operating system limitations. That was a useful definition, but it is time for it to change. Most software being developed today has a very tall stack of abstractions sitting between it and the hardware. Operating systems schedule I/O and have to split limited resources among many competing processes. Virtual machines sit on top of operating systems, isolating the programmer from the underlying OS and hardware. Libraries and frameworks isolate the programmer from the virtual machine and even other libraries and frameworks. I/O from the programmer's perspective is, or at least should be if he is working on top of good abstractions, “everything that happens between when my program requests some data and when it receives it.” Consequently, libraries and runtimes should go to great lengths to ensure that being I/O bound is as close to its original meaning as possible. Prior to multicore systems becoming pervasive that was largely true, but today's I/O libraries fail to take advantage of them, and consequently force I/O into being a bottleneck when it should not be.

That's why in my WideFinder submissions I've worked to separate parallelized I/O into a library-like module. It quite clearly is not done, but it is relatively successful. I reused the parallel I/O code that I developed for WF1 on WF2 without changing any of the logic. It doesn't provide many features, and it could use a fair amount of optimization and simplification (the line-joining logic is an atrocity), but it works.


Wednesday, April 09, 2008

Multiprocess versus Multithreaded...

...or why Java infects Unix with the Windows mindset.

Recently Paul Murphy, the king of the Sun zealots, blogged about Java bringing the Windows mentality to Unix, all the while slamming Java. In response, John Carroll, a Microsoft employee, rose to the defense of Sun's self-declared crown jewel. Talk about weird.

The funny thing is they are both right, although Murph's arguments are pretty weak.

A little history

Unix and Windows evolved with very different definitions of what the primary unit of isolation should be. On Windows, it is (or was) the node. Each Windows user (and DOS user before him) occupied exactly one node. The worst that could happen is the user destroys his own workspace, so interactive performance reigned supreme over system integrity. You have a node. You have a user. The node does what the user wants as fast as it can. Initially this applied to running a single application at a time, then to allowing several to be open at once but with the one in the foreground receiving primary resources, and finally to allowing several applications to run simultaneously. Multithreading was king because it was lower overhead and focused on making that foreground process more responsive. Threads were optimized, while processes were neglected.

Unix evolved to be fundamentally multiuser, and its primary unit of isolation is the process. Unix systems were intended to be shared, so it was important that one user could not dominate another. Furthermore, a slew of processes (daemons) all ran under the same account while providing services to multiple users, so in order for users to share, processes had to share. Unlike on Windows, one process crashing the entire system was not acceptable, because that would destroy multiple users' data. As a result, processes were designed to represent a strong level of isolation and heavily optimized to make sure people used them. Threads were largely ignored, or simply treated as processes with a shared heap space, because several cheap processes could simply be chained together to accomplish the same thing in a simpler manner.

The Unix Way

I want you to consider good old-fashioned CGI programs for a moment. Imagine one written in C. First, you may think "Oh my God, running a web application in a non-managed environment. The resource leaks! The memory leaks! The memory consumption of all those processes! Oh the horror!" Of course, you would be wrong. Repeatedly launching and terminating a Unix process is dirt cheap. Especially a simple program written in C. The OS will cache an image of the executable in memory which can be shared among invocations. The individual process can leak all the resources it wants, because as soon as it terminates all the resources will be automatically freed by the OS, no matter how incompetent the programmer. If the process fails to terminate, your friendly neighborhood sysadmin can kill it without hurting any other process.

This method works for producing super-available applications despite incredibly crappy code. I've seen it, both in the form of CGI and in the form of much more sophisticated applications. It works. Users get upset about lost transactions, but the application as a whole almost never goes down.

Enter Java

Java took cheap Unix processes and made them expensive. To compensate, it provided primitives for multithreading. It provided a garbage collector to at least slow memory leaks. It turned all those transient application processes into one big JVM process not only serving all the transactions for a given user, but serving all the transactions for an entire application or even multiple applications. Java made it more difficult to make destructive program errors, but it also made the consequences much more severe. Your friendly neighborhood sysadmin is powerless against a runaway thread or a slow memory leak. All he can do is kill the process, bumping out all of the users, killing all of their sessions.

It's so bad, the process might as well be a node. Unix becomes Windows. The JVM is practically an operating system, but without all of the features of an operating system and a whole lot less mature.

Enter Java Frameworks

This is really what Murph was railing against, although he didn't name it and he conflated it with the core language by labeling it "Business Java." Frameworks evolved for a myriad of reasons, which are often summarized as "taking care of the plumbing so the developer can focus on the business logic." The "plumbing" is a lot of things, including managing certain resources and generally ensuring the application code executes within a well defined life cycle where it is unlikely to do damage. In other words, instead of giving the user a simple, uniform mechanism like a process to protect the world from his mistakes, he is given dozens of hooks where he can implement little snippets of focused and hopefully bug-free functionality. All this involves a lot of learning above and beyond "the Java you learned in school" (meaning the core language and libraries), putting a cognitive load on the programmer and additional runtime load on the machine.

Multiprocess versus Multithreaded

Most Unixes have evolved efficient threading, and Windows has come a long way in becoming a multiprocess, multiuser environment. Consequently, developers need to be able to intelligently decide when to use multiple processes, when to use multiple threads, and when to use a hybrid approach. For example, Apache httpd has for quite a while now used a hybrid approach. On one hand, on most operating systems threads involve less overhead than processes, so it is more efficient to use multiple threads than multiple processes. On the other hand, multiple processes ultimately will give you better reliability because they can be spawned and killed independently of one another, so making a system that can run for months without stopping doesn't require writing a program that will run for months without stopping.

So how do you choose? My rule of thumb is to look at the amount of shared data or messaging required between concurrent execution paths and balance against how long the "process" (not OS process) is expected to live. Execution paths with lots of shared data or that are chatty will benefit from the lower overhead of threading, and threading allows you to avoid the complexities of shared memory or IPC. Of course, multiprocessing allows you to avoid the complexities of threading APIs, and there are libraries to address both, so the complexity issue could be a wash depending on your previous experience.

So why is Murph so wrong? Is JC right?

I think Murph wants to divide the world along nice clean lines. System programmers program in C. They don't need the hand-holding of managed runtimes or languages that treat them like impudent children. They do need lots of flexibility and lots of rope. Application programmers, on the other hand, need high-level abstractions that are close to the business domain that they are addressing. They need to be able to rapidly build software and rapidly change it as requirements evolve. They don't need lots of flexibility and should stay away from low-level details. So, in Murph's eyes, the problem with Java is that it doesn't do either particularly well. The managed runtime and object-orientation get in the system programmer's way, while the general-purpose nature of the language and mish-mash of libraries and frameworks just confuse application developers, or rather distract them from their true purpose. System programmers need C. Application developers need 4GLs.

The fatal flaw in Murph's reasoning is that it ignores the in-between. What happens when the systems programmer or 4GL creator fails to provide the right abstraction for the application developer? He's stuck, that's what happens. Software development is as much about creating abstractions as using them. Consequently, application developers need general-purpose languages.

Sphere: Related Content

Wednesday, March 05, 2008

Does Google Reader read GMail?

Up until recently, all my "Top Recommendations" in Google Reader were related to IT, software development, math, and finance. This makes perfect sense, because that covers all the topics in my subscriptions. However, today I noticed that Google Reader suggested an RSS feed for a blog about local (kind-of) restaurants, clubs, bars, etc. This means that:

  1. Google Reader has a decent idea about where I live
  2. Somehow Google Reader figured out that I enjoy dining out

The best I can figure is that Google Reader is basing my recommendations on the contents of my email. While I don't have any feeds about such things, I do exchange emails with my wife and friends about making dinner reservations, where to go out to eat, etc. The other possibility is that it figured it out using my search history, except that the contents of my search history look pretty much the same as my subscriptions (I know, I'm a nerd). This is because usually I'm not logged into Google when I am searching. Consequently, I think it is reading my GMail.

Anyone else have any ideas?

Sphere: Related Content

Monday, March 03, 2008

Floating Points Strike CNN!

Looks like developers at cnn.com need a lesson in floating point numbers:

Either that or the Democratic primary is truly too close to call...

Sphere: Related Content

Monday, February 18, 2008

Typesafe Ranged Integers in Scala

For the past few weeks I've been working on a basic numeric library for Scala for eventual contribution to the Scalax project. My goal is to create an extensible, type-safe library that provides numeric types that are easy to use and more closely resemble their mathematical foundations than the primitive operations on which they are built. My initial focus has been on building a working rational numbers implementation with unlimited precision integers. However, today on the Scala mailing lists, someone raised the issue that Scala's use of primitive numbers doesn't belong in such a high-level language. When I read this, I thought "Hmmmm, my library helps with that."

Integer Underflow and Overflow

First, let's consider integer underflow and overflow. The standard Java/Scala integer type is a 32-bit two's-complement signed integer. That means it can represent numbers between -2147483648 and 2147483647 inclusive. Oftentimes this range is plenty, so what we really want is for exceptions to be thrown on underflow and overflow. So consider the following:

scala> import scalax.math.Int32Domain._
import scalax.math.Int32Domain._

scala> val a: Int32 = 5
a: scalax.math.Int32Domain.Int32 = 5

scala> val b: Int32 = 7
b: scalax.math.Int32Domain.Int32 = 7

scala> a + b
res0: scalax.math.Int32Domain.N = 12

scala> a * b
res1: scalax.math.Int32Domain.N = 35

scala> val i: Int32 = 2147483647
i: scalax.math.Int32Domain.Int32 = 2147483647

scala> i + 1
java.lang.ArithmeticException: result exceeds maximum value: 2147483647
at scalax.math.Int32Domain$Int32.checkRange(int32.scala:55)
at scalax.math.Int32Domain$Int32.$plus(int32.scala:60)
at .<init>(<console>:8)
at .<clinit>(<console>)
at RequestResult$.<init>(<console>:3)
at RequestResult$.<clinit>(<console>)
at RequestResult$result()
at sun.reflect.NativeMethodAccess...

Magic! Overflow has been detected. Well, not quite magic. The way it works (and I'm sure it could work much, much better; I just threw this together tonight) is that it converts the 32-bit integers into 64-bit integers prior to the calculation, performs the calculation, and then checks that the result is within range. If it is, the result is converted back into a 32-bit integer and returned. Between the use of an object instead of a primitive and all this extra work for range checking, this class is probably at least an order of magnitude slower than using primitive Ints.
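
Here's a minimal sketch of that widen-check-narrow technique (a simplified stand-in, not the actual scalax.math code):

// Widen to Long, do the arithmetic, then verify the result still fits in
// a 32-bit signed integer before narrowing back down.
object CheckedInt32 {
  private def checkRange(result: Long): Int = {
    if (result > Int.MaxValue)
      throw new ArithmeticException("result exceeds maximum value: " + Int.MaxValue)
    if (result < Int.MinValue)
      throw new ArithmeticException("result is below minimum value: " + Int.MinValue)
    result.toInt
  }

  def add(a: Int, b: Int): Int      = checkRange(a.toLong + b.toLong)
  def subtract(a: Int, b: Int): Int = checkRange(a.toLong - b.toLong)
  def multiply(a: Int, b: Int): Int = checkRange(a.toLong * b.toLong)
}

// CheckedInt32.add(2147483647, 1)       throws ArithmeticException
// CheckedInt32.multiply(100000, 100000) throws ArithmeticException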

Bounded Ranges

Now let's consider bounded ranges. In order to accomplish this, we simply create an object that extends Int32Domain and overrides its minimum and maximum values.

scala> object SmallDomain extends scalax.math.Int32Domain {
| override val max: N = 10
| override val min: N = -10
| }
defined module SmallDomain

scala> val d = SmallDomain.Integer(5)
d: SmallDomain.N = 5

Now we have a domain that is limited from -10 to 10 inclusive. Attempting to mix integers from this domain with integers from other domains will yield a type error:

scala> i + d
:9: error: type mismatch;
found : SmallDomain.N
required: scalax.math.Int32Domain.N
i + d
^

scala> d + i
:9: error: type mismatch;
found : scalax.math.Int32Domain.Int32
required: SmallDomain.N
d + i
^

Even if you had an Int32 within the range of SmallDomain, you would still receive the type error. Also, as you can see, Int32 will not allow itself to be mixed with an integer from SmallDomain. If you look at the code (which I'll publish soon, I promise) for Int32Domain and its superclasses, you will see a lot of complex stuff involving mixins and type constructors. However, the end class is easily extensible by the user and still provides good type safety.
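
To give a flavor of the general idea without all the mixins and type constructors, here is a much-simplified sketch (not the actual scalax code; Domain, Num, and DefaultDomain are made-up names) of how path-dependent inner types keep values from different domains apart:

// Each domain gets its own Num type (SmallDomain.Num vs DefaultDomain.Num),
// so mixing values from two domains is a compile-time type mismatch.
class Domain(val min: Long, val max: Long) {
  case class Num(value: Long) {
    require(value >= min && value <= max, "value out of range: " + value)
    def +(other: Num): Num = Num(value + other.value) // range re-checked on construction
    def *(other: Num): Num = Num(value * other.value)
  }
}

object DefaultDomain extends Domain(Int.MinValue, Int.MaxValue)
object SmallDomain   extends Domain(-10, 10)

val i = DefaultDomain.Num(5)
val d = SmallDomain.Num(5)
// i + d   // does not compile: found SmallDomain.Num, required DefaultDomain.Num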

Rationals

I mentioned rationals at the beginning of this blog, so I thought I would give a little sneak-peak:

scala> import scalax.math.BigRationalDomain._
import scalax.math.BigRationalDomain._

scala> val a = Integer(3)
a: scalax.math.BigRationalDomain.IS = 3

scala> val b = Integer(7)
b: scalax.math.BigRationalDomain.IS = 7

scala> a / b
res0: scalax.math.BigRationalDomain.N = 3/7

scala> a.inverse
res1: scalax.math.BigRationalDomain.N = 1/3

scala> a.inverse * a
res2: scalax.math.BigRationalDomain.N = 1

scala> (a / b)^2345
res3: scalax.math.BigRationalDomain.N = 70687450412160609527952431393691553815800419127671886519112946722799601077076407951140254943159198319665943578284183744314587093153151823879346566151398437630838022610858501398173872472014103905434680267558985498302895510754430260294086995134629615707980341285332639084519209821403179213183556756416404920769037158823806045214292550723098516538040...

scala> a * b
res4: scalax.math.BigRationalDomain.I = 21

scala> Integer(3) / Integer(6)
res5: scalax.math.BigRationalDomain.N = 1/2

Notice how an integer times an integer equals an integer, but an integer divided by an integer equals a rational. This is because integer is a subtype of rational: all integers are rationals, but not all rationals are integers.

More to come soon!

Sphere: Related Content

Friday, February 08, 2008

Linguistic Success

There's a new favorite pastime on the fringes of the Scala community. That pastime is blogging about aspects of the Scala language that will prevent it from being a "success" unless they are quickly addressed. "Success" is roughly defined as "widespread commercial use." A good metric might be: "How hard is it to find a job programming Scala?" The premise of the criticism usually revolves around one or more complicated, terse, or "foreign" (relative to the author) constructs that are common in Scala, or at least favorites among frequent posters, and how these constructs will prevent "average" programmers from understanding Scala and thereby prevent commercial adoption. A recent favorite is symbolic function names (/: and :\ for folds).
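
For readers who haven't run into them, the symbolic names are just aliases for the fold methods on ordinary collections (2.x-era Scala):

val xs = List(1, 2, 3, 4)

// Symbolic names: /: is foldLeft, :\ is foldRight
val sumLeft  = (0 /: xs)(_ + _)   // ((((0 + 1) + 2) + 3) + 4) = 10
val sumRight = (xs :\ 0)(_ + _)   // (1 + (2 + (3 + (4 + 0)))) = 10

// The spelled-out equivalent that critics tend to prefer
val sumVerbose = xs.foldLeft(0)(_ + _)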

The logic of this argument seems relatively sound. When choosing a technology for a project, it is very important to consider the availability of potential employees who know that technology. Forcing every new member to scale a potentially steep learning curve is a frightening prospect. I can imagine development managers lying awake at night fearing that their expert in some obscure technology will leave, and it will take months to replace him. It's a legitimate, although I think slightly exaggerated, fear.

That being said, I think it has little to do with the adoption of a new language. The majority of programmers simply do not spend their free time learning new languages. Many, maybe most, won't even use their free time to learn languages that they know have ready pre-existing demand. They learn when they are paid to learn, or when failing to learn will lead to immediate negative consequences for their career. I think the same can be said of most professions. Most people work because they have to, not because they want to.

Consequently, I think expanding beyond a core of enthusiasts is very difficult, if not impossible, simply by attracting people to the language. Right now some leading-edge Java people are taking a look at Scala, because they think it might be the next big thing in Java-land. These people are different than enthusiasts. Enthusiasts will learn a language for the sake of learning it. The leading-edge folks learn it as a high-risk investment. If they can get a head start on the next big thing, it will be great for their careers (and businesses). These people constantly think "can I sell using this technology?" and "if I do sell it, will it come back to haunt me?" This is a very pragmatic perspective, and it is the perspective I take when I'm at work.

Confusing, odd-ball language features make the sell a lot harder. Pitching them as features increases personal risk.

But it doesn't matter.

Why? Because the vast majority of developers are not going to learn a new language because they want to, they are going to learn it because they have to. Not to mention that there are countless languages out there, so going after the enthusiasts and leading-edgers (who are mostly lookers anyway) is just fishing in an already over-fished pond.

So how does a language become a success?

Enough people use it for real projects. Let's say a consultant rapidly prototypes an application using the technology, and that application makes it into production. Now maintenance programmers have to learn that technology. Sure, it's complicated, but unlike the guy learning it on free weekends, the maintenance programmers have all day, every day. It's hard to learn complex concepts an hour or two at a time, but these guys have all day, and the next day. It's hard to memorize something by looking at it a couple hours a week, but spend all day staring at it and it will click. Not to mention that their livelihoods depend on it. Any sort of "cowboy" development team can cause this to happen, and frankly such teams are pretty common.

So maybe one in five maintenance programmers actually likes the technology, and admires the cowboys, so when they get a chance to do new development, they use it, too.

The same thing can happen with products from startups. Let's say a startup builds a piece of enterprise software using Scala. They sell it to big, conservative companies by emphasizing the Java aspect. They sell customization services, too. And then it's back to the maintenance programmer, who has no choice.

Notice a pattern? The key to language success is making it powerful enough for a couple cowboys to do the work of an entire team in a shorter period of time. Selling fast and cheap is easy. If you have enough fast and cheap, the business people won't care if you are making it out of bubble-gum and duct-tape, because you are giving them what they want.

The key to success is making the reward justify the risk. Judging by what some people using Scala for real-world projects are saying, and my one hands-on experience, I think Scala offers it. It's just a matter of time before it sneaks its way into enterprises, just like Ruby has.

Sphere: Related Content

Monday, January 21, 2008

Programming Language Continuum

Introduction

Ever since I started getting into less-than-mainstream programming languages, I've pondered how to go about classifying them according to their attributes in the hopes that it would yield insight into what an ideal programming language would be. There is the genealogical way that traces roots and influences, but that doesn't break down attributes or make any qualitative judgements. Here I intend to mostly frame the context for debate, rather than draw significant conclusions. I believe bringing some more structure to the discussion is necessary before much more can be gleaned from it.

So what I've done is attempt to break the essence of a programming language down into two dimensions:

  1. Enforced Structure / Dynamism
  2. Engineered Foundations / Mathematical Foundations

Here's a loose attempt at classifying some languages:

Enforced Structure vs Dynamism

Recent debates have focused heavily on the difference between statically typed and dynamically typed languages. I believe that this is a specific case of the more general issue of "how much structure should the programming language force on the programmer?" Today we see Java versus Ruby, but it could just as easily be Ada versus Lisp. On one side of the debate you have people who believe that heavily structured languages are essential for scaling and for helping ensure correctness. One of my college professors once said (paraphrased), "the compiler is your friend, help it catch your errors for you." Also, a well-defined and compiler-checked structure can help scale programming projects up to large teams by ensuring that all team members are working to the same program structure. Finally, they point out the sophisticated tools that are available for many statically typed languages, particularly refactoring tools, but also code generators and static validators.

On the other side of the debate you have dynamic language advocates. They claim that statically typed languages are too constraining and require more up-front design than is practical, especially given the vague and changing requirements typical of software development projects. Furthermore, robust code can be achieved through automated testing and by significantly reducing the amount of code required to deliver the required functionality. Finally, they point out the quicker feedback cycles that dynamic languages enable by shortening the change->compile->deploy->test loop to change->test. There was a time when this loop was actually figured into software development cost and schedule estimates, but today, outside of very large projects, it is simply a cognitive disruption.

Engineered vs Mathematical Foundations

Most of the mainstream programming languages have been "engineered" or "designed" to better enable some group of programmers to better achieve some objective. For example, Ada was engineered to enable the development of very large, complicated and highly reliable systems by huge teams. SmallTalk was designed to enable the construction of modular software in a more natural or "human" manner. COBOL was designed to enable business analysts to program. The list goes on and on, but the common theme is that most of these languages were designed or engineered with very specific design goals, and those goals were largely disconnected from strong mathematical foundations.

On the other side of the spectrum you see languages that are very strongly influenced by computer science theory. Lisp started out as an executable implementation of the untyped lambda calculus and has stayed fairly true to that origin. Haskell combines the typed lambda calculus with category theory and focuses on functional purity. Prolog is based on logic, and SQL on relational theory. What these languages offer, especially strongly typed ones like Haskell, is that they enable computer programs to be much more easily reasoned about by theorem provers (and similar) and therefore can provide a much higher degree of safety. To the initiated, they also provide much more natural and elegant abstractions.

Hybrid Languages

The idea of dynamic languages with optional type annotations is often raised as a potential means to bridge the divide between the advocates of static and dynamic languages. Common Lisp provides optional compile-time type checking, but in practice it seems to be used mostly as an optimization in special cases rather than as a means of ensuring correctness. Some touted StrongTalk as an answer, especially when Sun released it as open source, but it seems to have sputtered out. There was much talk about adding optional type annotations to Python, but I believe (please correct me if I am wrong) the capability is still absent from Python 3000. So while optional static typing is a favorite debate topic, it's not popular in the languages that have it, and attempts to add it to languages that don't have it have yet to reach fruition, despite considerable effort.

A recent phenomenon, or perhaps one simply recently receiving attention, is hybrid languages that attempt to blend concepts from strongly typed functional languages such as Haskell with more mainstream object-oriented underpinnings. In my opinion, Scala is probably the best of these, as it offers innovative OO features in addition to functional ones, but others such as F# and OCaml certainly deserve mentioning, as do no doubt countless others that I am unfortunately going to neglect.

Conclusion

I think that hybrid static/dynamic languages will never really become popular - or more accurately that the capability will not be extensively used in the languages that offer it. There are a couple reasons for this. Primarily, I think that making static typing optional almost completely eliminates the benefits of static typing. Second, I think the primary advantage of dynamic languages is that they allow partially formed thoughts to be quickly expressed in "working" (meaning running, not correct) code, and that static typing is a major barrier to this. I personally find doing exploratory programming in dynamic languages to be much more pleasant and productive, but once ideas are concrete I prefer the compile-time checks and completeness of static typing.

I personally believe that languages that blend the engineered structure of OO with mathematical formalism represent the future of programming. Scala and F# are here, and both Java and C# are gradually acquiring features from strongly typed functional languages. What's going to be interesting is how it shakes out. If you peruse the Scala mailing list archives, you will notice that there is a marked tension between those from an object-oriented (engineered) perspective who enjoy the extra power that functional programming provides them, versus those from a more pure functional programming background (mathematical). Ultimately, at least from a popularity perspective, I think the more OO approach will win, as historically languages engineered to provide specific advantages have won out over languages with robust mathematical underpinnings.

Sphere: Related Content

I hate Apple and HP

A couple months ago I bought a new MacBook. For the most part I've loved it, but last night I ran into a problem.

When I bought my Mac, Apple was offering a $100 rebate on the purchase of a new printer to go with it. The sales guy pointed out that there are a number of printers that cost around $100, so the printer would be essentially free. I chose an HP C4280 All-in-One. Look at that! If it wasn't for sales tax I would have made five cents off of the purchase. Well, you get what you pay for...

As a printer it's worked fine. I didn't even need to install any drivers. I plugged it in and it just worked. Of course, that's what I expect from Apple. But last night my wife wanted to scan a document and make it into a PDF. I figured, "Ok, that should be easy." Boy was I wrong.

So I launch Image Capture on my Mac to scan the document. It tells me that I don't have any attached devices. Hmmm. I printed a few minutes ago. Why don't I have any attached devices? So maybe I'm using the wrong application. There's a "scan" button on the printer, so I press that, hoping that the right application will magically open up (see what Apple has done to me!). The printer thinks for a minute, and then tells me that it's not connected through USB. Well, of course it is, because I just printed over USB. I decide to do some Googling.

It turns out that while the printer drivers come pre-installed with Leopard, the scanner drivers do not. It's a 192 MB download full of crapware. I hate Apple for making me think that I didn't need to install drivers, and then consuming a chunk of my Sunday evening installing drivers. They set an expectation and then disappointed. It would have been much better to just make me install all the drivers up front.

But why did I say it was full of crapware? Well, let's see. So I scanned the document as a PDF with text (so it has to do OCR) using "HP Scan Pro." That worked. Kind of. I did get a decent-looking PDF document with the text correctly converted into text. I also got a completely locked up HP Scan Pro application, and I mean completely locked up. I tried to log out of my Mac, figuring that would clean up the crashed process. Nope! It sat there for a minute, then complained that an application wouldn't exit and asked if it should quit it forcefully. I of course said yes, and then it just sat there for a few minutes longer. I got the same result from trying to shut down. At least when you tell Windows that it can violently kill processes, it violently kills them. Apparently MacOSX is too polite, or at least has more patience than I do.

That's another reason to hate Apple. It was worse than Windows, and using a product purchased from the Apple Store no less.

Fortunately I'm a Unix guy and I know how to violently kill processes.

su
ps -ef | grep HP
kill -9 pid1   (pidX is the process id of an HP process)
kill -9 pid2

Until they are all dead. That worked. Of course in the process of doing this I discovered that there are a couple HP processes running as root, which disturbs me to no end.

What I'd like to ask Steve Jobs is: How many Mac users would know to drop down to a terminal, su to root, and violently kill the processes? I just can't see your average non-techie Mac user doing that. Apple should really do a better job screening the products it sells with new computers.

Sphere: Related Content

Wednesday, January 02, 2008

Slightly Less Double (Im)precision

Anonymous writes:

You may want to try evaluating the polynomial using a different form. For example, given the polynomial A*x^3 + B*x^2 + C*x + D, one is often tempted to just enter it "as is". However it can also be expressed this way: ((A*x+B)*x+C)*x+D. Generally speaking, polynomials often work better using floating point this alternate way. Some programming languages/systems know this and convert the expression before evaluating it.

...and he is absolutely right. It makes the code a heck of a lot more efficient, too, because it eliminates the calls to Math.pow. That being said, it does not completely fix the problem. The polynomial line is a lot smoother, and the fitting algorithm yields a lower-degree polynomial for the same mean error, but I still think the results are too fuzzy compared to the higher-precision floating point. Here's a graph to show the difference:

Compared to the previous result:

Further improvement could probably be obtained by taking a close look at the QR Decomposition algorithm used to do the least-squares fitting.
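
For reference, here is a rough sketch of the two evaluation strategies (evalNaive and evalHorner are hypothetical names, not the code that produced the plots above):

// Coefficients ordered from highest degree to lowest, e.g. Array(A, B, C, D)

// Term-by-term evaluation: one Math.pow call per term
def evalNaive(coeffs: Array[Double], x: Double): Double = {
  var sum = 0.0
  for (i <- 0 until coeffs.length)
    sum += coeffs(i) * Math.pow(x, coeffs.length - 1 - i)
  sum
}

// Horner form: ((A*x + B)*x + C)*x + D -- no pow calls and fewer rounding steps
def evalHorner(coeffs: Array[Double], x: Double): Double =
  coeffs.foldLeft(0.0)((acc, c) => acc * x + c)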

In my opinion, the problem here is not so much that double-precision floating points are bad. They are not. For many applications, especially with carefully crafted algorithms, they are great. They are certainly much higher performance than their higher-precision object-oriented kin, which take several orders of magnitude longer than doubles (I'll warn you: don't do high-precision matrix operations with your MacBook in your lap - it gets really hot), while the double-precision version finishes in a blink. The problem is that, as an abstraction for real numbers, doubles are extremely leaky. Of course, this criticism could be extended to any fixed-precision, floating-point representation of numbers, depending on the application.

Basically, I think in most applications doubles represent a premature optimization. Higher-precision numbers should be used by default, with the precision then reduced in order to improve performance, rather than doubles being used by default and higher-precision numbers only being considered once the programmer realizes he has a problem due to lack of precision.
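
As a trivial illustration of the leak (using Scala's BigDecimal wrapper here; an exact rational type would make the same point):

val asDouble = 0.1 + 0.2
println(asDouble)                         // 0.30000000000000004
println(asDouble == 0.3)                  // false - the real-number abstraction leaks

val asDecimal = BigDecimal("0.1") + BigDecimal("0.2")
println(asDecimal)                        // 0.3
println(asDecimal == BigDecimal("0.3"))   // true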

Of course, the problem is I don't know how precise is enough, because it depends entirely on the application. I'm tempted to say that, whenever possible, exact representations should be used. I've done a little research into it, and I'll do some more. There are tons of papers on the subject, but everything I've encountered so far seems to require intimate knowledge of floating points, the algorithm using them, and possibly the data being fed into the algorithm. That could help with library functions, such as matrix decomposition and solving, which could automatically scale up the precision of their internal operations in order to meet the expected resulting precision of the calling code, but that would still be leaky for "user-implemented" algorithms. What I really want is something that will work in the general case with reasonable performance characteristics, which can then be tuned for specific applications by reducing precision.

Sphere: Related Content

Open Source, Cost, and Enterprise Product Adoption

This is a relatively common topic, and today it was raised on TSS as a result of a blog by Geva Perry:

Are developers and architects becoming more influential in infrastructure software purchase decisions in large organizations?

...with an implied "as a result of open source software." It's an interesting question, and I think it can be generalized to: What effect do license costs have on the acquisition of enterprise software?

In considering this question, it is important to remember that:

  1. In large organizations, money often comes in many colors, such as: expense, capital, and internal labor
  2. The level of authority an individual has depends on both the amount and the color of the money involved
  3. Certain colors of money are easier to obtain than others, and sometimes it varies dependent on the amount
  4. Accounting rules, both standard and self-imposed, affect what can and cannot be purchased with a given color of money

In a nutshell, the budgeting process for large organizations can be extremely idiosyncratic. Not only do the official processes vary, but individual personalities and budget cycles can have profound effects.

So the first effect the cost of a piece of enterprise software has is to limit the various buckets of money that can be used to pay for it. However, this can be a very complex process. Let's assume any given piece of software typically has the following types of cost:

  1. One-time license cost (both for the application and support infrastructure, such as the OS and DBMS)
  2. Recurring maintenance and support cost
  3. Hardware cost (e.g. a server)
  4. Internal labor for development and deployment
  5. External labor for development and deployment

The lower the costs involved, the less approval is required. Driving down license costs pushes the initial acquisition decision closer to the users and/or developers. This is a big win for open source applications. It's probably a bigger win for application vendors. For example, most enterprise applications require a DBMS, such as Oracle. Oracle is not famous for being cheap. So let's say your potential customer can readily obtain $100k to spend on software licenses. If you are a software application company, do you want that money as revenue, or do you want 2/3 of it to go to Oracle and IBM?

I'll give you a hint. You want a department to be able to deploy your software without cutting a check to the big boys, but you also want to be able to say "Yes, we support your enterprise standards" to the big-wigs in the IT department who think that if there isn't a major conference for a piece of software, then it shouldn't be on the network. That way your product can be approved, and running it on Oracle can be deferred until "usage levels warrant it."

Hardware costs are even more interesting. At my employer, equipment that costs $5k or more is "capital," and anything less than that is expense. Capital is generally easier to obtain if (1) you know you need it a year in advance, or (2) it's the end of the year and there's still money laying around. It is impossible to obtain at the beginning of the year, when everyone thinks that they will actually spend their allocation, unless of course it was approved last year. Conversely, expense money is much more plentiful at the beginning of the year, when managers are still dreaming of sticking to their budgets, and becomes more and more difficult to obtain as reality sets in. So what's the point? Well, you want your product to require a small enough amount of hardware that a first- or second-line manager can purchase it on expense without obtaining approval, but also to have a recommended configuration that costs enough to be purchased on capital.

This is interesting because big-iron Unix guys will often lament about how their systems have a much lower TCO than x86 Wintel or Lintel systems, so all the arguments about x86 systems being "cheaper" are bunk. What they are ignoring is that it is much easier to spend $5k twenty times on small things (plus setup labor on each of those twenty items) than it is to spend $50k on one big item, because the $50k either has to be approved a year in advance or it has to wait until someone else's project falls apart so they can't spend the $50k. The "total" in TCO is completely ignored, because very few people think about the few hours that each of those servers requires to set up.

Now, about labor costs. Managers generally have at least some discretion over how their direct reports spend their time. If you actually think about it in terms of the fully burdened hourly cost of an employee, managers often have significantly more "budget" under their control through their ability to direct how time is spent than they do for purchasing licenses and services. Again, this is a big win for open source.

The bottom line is that the best way to get your foot in the door is to have the lowest marginal cost of deployment as possible. I'll offer as evidence the countless wikis that have popped up around my workplace, very few of which are even officially approved.

Of course, this makes you wonder why the marginal cost of deploying a piece of enterprise software tends to be so high. Why aren't more vendors practically giving their software away for small deployments? Well, many do, such as SQL Server Express and Oracle XE. But there are still more that don't. The problem is that it's really hard to get the total cost of initial deployment down below the point where the bureaucracy kicks in, and once it kicks in it helps to be more expensive.

Yes, that's right, I said more expensive.

You see, these days one of the great ways to make your career in IT is to be a good negotiator. The combination of off-the-shelf software and outsourcing has shifted IT expenses from being dominated by internal labor to being dominated by procurement contracts. However, you don't build an illustrious career by negotiating $10k contracts. Likewise, once you pass a relatively small threshold someone decides that the project needs a "real" project manager, instead of just an "interested" manager or a technical lead, and project managers are unfortunately measured more by the size of their projects than by the value they deliver. (Yes, that's right, screwing up a multi-million dollar ERP implementation is better for your career than successfully deploying some departmental application under budget and on schedule.)

In other words, once the signatures of additional people are required, you have to have something big enough for those people to consider it worth their time. So, if you are a software vendor, or an internal employee planning a deployment project, then you either need to go really small and viral or really big. Medium size projects are simply too hard to push through.

And, in my opinion, that's really a shame, because medium-sized projects tend to have the best value proposition. Small ones involve too little of the organization to have a significant impact, and large ones become too unwieldy to meet objectives. Projects need to be large enough to do things right in terms of technology and engage future users, but small enough to have readily apparent benefits and incremental deliveries that provide real value (not simply "look, it's a login screen!").

Maybe it's different in other organizations, but somehow I doubt it. However, I'd be really interested in knowing what others' experiences are.

Sphere: Related Content