Sunday, July 13, 2008

Love, Hate, and Type Inference

Ever since I started using Scala, I've had somewhat of a love-hate relationship with type inference.  On the local level, type inference can make code much more concise, easier to read, and easier to refactor.  It makes code easier to read because in cases where the type of variable is obvious, type annotations just add noise that distract from the meaning of the code.  It makes code easier to refactor, because often times changes to the type of something in one place in a program can ripple throughout the rest of the program through a simple recompile, assuming the new type supports the methods being used on the original type.  This leads to something akin to structural typing without the associated overhead.

However, at a more macro level I think type inference can make code much more difficult to read.  For example, consider the following code from the Scalax library (full source here):

//  Scalax - The Scala Community Library
//  Copyright (c) 2005-8 The Scalax Project. All rights reserved
abstract class InputStreamResource[+I <: InputStream] extends CloseableResource[I] {
    def buffered : InputStreamResource[BufferedInputStream] =
        new InputStreamResource[BufferedInputStream] with Wrapper {
            type Handle = BufferedInputStream
            override def wrap(is : SelfHandle) = new BufferedInputStream(is)
            
            override def buffered = this
        }

    def slurp() = for (is <- this) yield StreamHelp.slurp(is)
    
    /* Obtains a Reader using the system default charset. */
    def reader =
        new ReaderResource[Reader] with Wrapper {
            type Handle = Reader
            // XXX: should be UTF-8 by default instead of OS default
            // practically, here in Russia I never used default charset
            override def wrap(is : SelfHandle) = new InputStreamReader(is)
        }
    
    def gunzip =
        new InputStreamResource[GZIPInputStream] with Wrapper {
            type Handle = GZIPInputStream
            override def wrap(is : SelfHandle) = new GZIPInputStream(is)
        }
    
    /** Obtains a Reader using the supplied charset. */
    def reader(charset : String) = {
        // Do this lookup before opening the file, since it might fail.
        val cs = Charset.forName(charset)
        new ReaderResource[Reader] with Wrapper {
            type Handle = Reader
            override def wrap(is : SelfHandle) = new InputStreamReader(is, cs)
        }
    }
    
    def lines = reader.lines
    
    def lines(charset : String) = reader(charset).lines
    
    def readLines() = reader.readLines()
    
    def readLine() = reader.readLine()
    
    def pumpTo[O <: OutputStream](osr : OutputStreamResource[O]) {
        // Note InputStream should be opened before OutputStream
        for (is <- this; os <- osr) StreamHelp.pump(is, os)
    }
}

This is an example of where type inference makes code easier to refactor yet more difficult to read.  If one were to change the return type of the reader methods, or the subsequent lines and/or readLines methods on that type, then the return types of these methods would automatically change on recompile.  However, now try to figure out the return types of the lines and readLines methods.  In order to do that, you need to know what the return type of the reader method is, and the structure of that type. Figuring out the return type of reader is reasonably straight forward, as it is defined in the same.  However, the base class for the return type is not, so in order to figure it out you need to trace through several class definitions and potentially in other source files.  I doubt this is a big deal for those people who are intimately familiar with the code, but I pity the new guy who has to sort it out when he's trying to create a new subclass of this abstract class.  Of course, there's always the API docs, so users of the code, and the poor new guy, do have a place to turn.

So that's Scala, which supports a relatively limited form of type inference.  Now, let's consider a snippet of OCaml, which has much fuller type inference, that I stole from Mauricio Fernandez:

   (* Copyright (C) 2008 Mauricio Fernandez  http//eigenclass.org
      Solution to the coding challenge at
      http://beust.com/weblog/archives/000491.html *)
   open Printf
   module S = Set.Make(struct type t = int let compare = (-) end)
   let (--) set elm = S.remove elm set
  
   let permutations f zero digits len =
     let base = S.cardinal digits in
     let rec aux n digits = function
         0 -> f n
       | len -> S.iter (fun s -> aux (n * base + s) (digits -- s) (len-1)) digits
     in S.iter (fun s -> aux s (digits -- s) (len - 1)) (digits -- zero)
 
   let () =
     let max = 10_000_000_000 in
     let digits = List.fold_right S.add [0; 1; 2; 3; 4; 5; 6; 7; 8; 9] S.empty in
  
     let count = ref 0 and prev = ref 0 and maxj = ref 0 and base = ref 0 in
     let report () = printf "Found %d numbers under %d.\n" !count max;
                     printf "Max jump: %d (%d -- %d).\n" !maxj !base (!base + !maxj)
   
     in try
       for i = 1 to 10 do
         permutations
           (fun num ->
              if num >= max then raise Exit;
              (* printf "%d\n" num; *)
              incr count;
              let jump = num - !prev in
                if jump > !maxj then (maxj := jump; base := !prev);
                prev := num)
           0 digits i
       done;
       report ()
     with Exit -> report ()

It's not entirely fair that I am picking on this OCaml code, because I don't know the language.  Also, this is a short, self-contained program that I personally would be likely to write in Python without any type annotations at all.  So in a way, for short programs like this, I think full type inference is great.  You get all of the protection of static typing with none of the hassle.  That being said, I think it would be extremely difficult to approach a large code base that is this devoid of type annotations.

All this adds up to my love-hate relationship with type inference.  I love it when I'm writing my own code, and I hate it (aside from local variables) when I am reading other people's code.  Over the past months I've decreased my use of it in Scala, preferring instead to explicitly specify types wherever they would not be entirely obvious from looking at the code, even though all my Scala code is hobby-work so no one else has to read it.  Of course, being hobby work, I often go days, weeks, or even months without looking at a chunk of code, so I often need help remembering what I wrote.  Overall I would say that excessive reliance on type inference obfuscates code, and that there are plenty ways for programmers to obfuscate their code, so it is a style issue as opposed to a language design issue.  I also think it will be interesting to see how many dynamic language people migrate over to languages with strong type inference as these languages gain more attention.  As you can see from the OCaml code above, there is no extra "typing burden" placed on the programmer.  If the program is correctly typed, it will compile and run quickly.  If not, it won't.  In theory this should satisfy the "type annotations are too much work/add too much noise to my program" camp, but not the "I don't want the compiler telling me what I can and can't do" camp.  My guess is that more people fall into the latter than the former, even the ones that claim they don't, but only time will tell.

Update: I just had a thought. Try thinking of type inference like using pronouns in natural languages. Imagine talking to a person who rarely uses pronouns. It's slow, cumbersome, and occasionally makes you think the speaker thinks you are an idiot who can't remember what he said 30 seconds ago. Likewise, when someone speaks entirely in pronouns, especially when they use a pronoun to refer to something that isn't part of the immediate conversation, it leads to utter confusion. Context is key, and the compiler can usually keep a much more distant context in its working memory than a programmer can. Type annotations provide context. Type inference assumes context.

Update 2: Here's a link to some OCaml API docs, where, just like in Scala, you can clearly see the inferred types.

Sphere: Related Content

11 comments:

Unknown said...

I always thought it would be a good code convention to add the return type information to public methods. Even if the compiler is perfectly able to infer it, it makes the contract explicit

Unknown said...

In Groovy, which has optional typing but no type inference, I find myself adding type signatures to methods (both arguments and return values), but not much in the internals.

A method is an abstraction barrier, the user shouldn't have to look inside in order to use it. It's great that the compiler can look inside and figure out the types, but I don't think the user should have to do the same thing.

Ilmari Heikkinen said...

OCaml's type annotations are visible in the generated documentation (and with ocamlbrowser.) Library code usually has a .mli that explicitly spells out the module's interface (bit like a .h). If you're getting to grips with a random bit of code, pasting statements into the ocaml REPL (add ;; after paste) gives you the inferred type signature as well.

As module documentation is also in the .mli, you tend to have the type signatures as a machine-checked part of the docs, while the actual code has very few, if any, type annotations. If you generate the docs or use ocamlbrowser, there isn't much difference between jumping into a Java project and an OCaml project -- if there is no documentation, you still have the type signatures.

Erik Engbrecht said...

Ilmari,
I've added a link to some OCaml API docs just so that people can refer to example.

Keep in mind that my point is relatively narrow. A lack of type annotations can make the code more difficult to read. Having to flip to an API doc interrupts the thought process. That's better than the information not existing, as occurs in dynamic languages, but in my opinion less than ideal. Code should be written to be read without heavy reliance on external references.

Jedaï said...

As an Haskell user, I can understand your problem. In fact in Haskell knowing the type of your expression might be even more important in some case since the same piece of code can have different behaviour depending on its type (inferred).

It's not as scary as it may seem, but for example, the function read transform a string into a value of the type which is asked of it, this type being usually determined by the surrounding code.

So "(read a) + (read b)" will translate the String a and b to Integer, because there is a (+) (not really true since in Haskell (+) works for all numeric type classes but let's assume).

Haskell have got full type inference though it's type system is more powerful than most and may require annotation from time to time (very seldom in real programs, most often in toy code because you don't have enough information to infer the right type) but traditionally, and for reasons that are very close to those you're exposing, Haskell coders put type signatures on most top-level functions.

--
Jedaï

Anonymous said...

There is definitely a happy medium between maximal type inference(which is practically equivalent to dynamic types, with all the inherent pitfalls of that apporach) and unnecessary annotations everywhere. At key points in the architecture of a program you want to guarantee that a certain type is being used - that's when you annotate.

Plus in inferred languages, the compiler will inform you of all points that need to be updated should you decide to radically change your types, instead of seeing only "one step ahead" in the compile, or moving the problem to a runtime error.

Mark Lee Smith said...

Worth reading simply for the update. Very well thought out :).

Jorge Ortiz said...

Erik,

I agree that type inference can sometimes make code hard to read. This happens to me when hacking on unfamiliar parts of lift.

I find that better tools can help a lot in this regard. I generally prefer to use Emacs or Textmate to program in Scala, but using the Eclipse plugin to periodically check the types of expressions can be invaluable.

Jon Harrop said...

This is a solved problem: you need to use an IDE/editor that provides automatic throwback of the inferred types. This gives you the best of both worlds: no type annotations unnecessarily littering your source code but complete information about the types in your code.

The freely available introductory articles from both the OCaml Journal and the F#.NET Journal describe this feature and you are quite right that it is essential for serious development in languages with type inference.

usr said...

This is comparable to the @Override annotation in plain Java, just that the amount of visible or invisible information is much bigger. The compiler can happily live without but having the annotation there makes the actual meaning of the code a lot more visible.

@Override has spawned IDE support for automatically adding the annotation where it applies (or is mandatory, if you instruct the compiler accordingly). I think this model would also be ideal for inferrable type annotations: write your code maxing out the inference capabilities, have the IDE materialize type inference not just in a mouseover but directly in code and search for your mistake if the IDE-generated code differs from what you expected to appear. A useful IDE configuration might for example add inferred types only to public fields/methods or something like that (as an interesting side-effect, this might even give people an incentive to use private more often: the resulting code would look shorter).

PS: if Scala ever gets mainstream we will see horribly paranoid code policies mandating to type in every type annotation possible installed in all the companies where people think that the less lazy way is always the better way (are there any other companies?).

Anonymous said...

Interesting article you got here. I'd like to read a bit more concerning that topic. Thanks for giving this material.
Joan Stepsen
Technology gadgets