Erik Engbrecht's Blog: Metaprogramming

I just saw Martin's post on levels of expertise in Scala, and Tony Morris's response. The concept of levels is something that many in the community have called for repeatedly, in order to help ease people - especially team members who may be a little put off by some of Scala's more advanced features - into the language more gently. I understand the motivation, but it has always struck me as somewhat odd. The reason is that I think one needs to have very compelling reason for such a significant change to be justified, especially a change from a mature, mainstream technology such as Java into one as new an immature as Scala. To being, I'll recount my experience with adding Python as a stable in my programming toolbox.

Learning Python

I dabbled it Python a lot before really using it extensively. Much of my initial usage basically amounted to using Jython as a repl for interactively poking at Java libraries, especially ones that weren't particularly well documented. I thought Python was neat, but at the time most of my programming experience was in C++ and Java. I valued static typing. I valued having a compiler. When I was in college, I had a professor who said something along the line so of "The compiler is your friend. Write your code so that the compiler is more likely to catch your mistakes. Don't try to trick it or bypass it." This was in the context of C++ programming, and given all the ways that one can shoot himself in the foot using C++, it always seemed like sage advice. So giving up have a compiler to catch my mistakes, and an explicit structure clearly present in my code, seemed like an awful lot to ask. Sure, Python is more concise, but in my opinion conciseness is often more a matter of how closely the libraries you're using match the problem that you are trying to solve. About a year ago I had some experimental Python code that I had thrown together that did some regular expression heavy text processing then some XML processing (I was processing output of an application that produced invalid XML, so the first phase was the fix it so the XML parser wouldn't crash and the second was to actually process it). I can't remember why, but I decided to convert it to Java. I think it was because I wanted to integrate it with something that integrated well with Java, but not as well with Python. Anyway, the point is that the Python code and the Java code weren't that different in length. Sure, Java has a lot more boiler plate, but once your code starts being dominated by actual logic instead of getters and setters, it's the libraries that make the difference.

Anyway, I didn't really get into Python until I discovered and thoroughly learned metaclasses and descriptors. In my mind, they took Python from being a toy with a clean syntax to a language that opened up an entire new dimension of abstraction. A dimension of abstraction that had been missing all me life. A dimension that I had glimpsed in C++ template metaprogramming but had never quite managed to enter. All of a sudden I had this tool that enabled me to drastically and repeatedly reduce code sizes all while making the business logic cleaner and more explicit. I had a means to encode my ideas such that they enabled my coworkers to use them without fulling understanding them (an admittedly dangerous proposition). It was beautiful! They remain an important part of my toolbox to this day.

Incidentally, my first uses metaclasses and descriptors was to concisely implement immutable, lazy data structures. This was long before I knew anything about functional programming.

Learning Scala

I also dabbled in Scala a lot before really becoming enamored with it. I was caught by the conciseness over Java, especially the absence of getters and setters, the presence of operator overloading, and multiple inheritance using traits. But I had operator overloading and multiple inheritance in both C++ and Python. I could usually abstract away explicit getters and setters in Python, and by the time I started picking up Scala I had already made a significant shift towards immutability and laziness, so setters weren't as necessary. Scala was neat, and the compatibility with Java was a real benefit, but at first it didn't seem compelling. In fact, when I first started experimenting with Scala it seemed too buggy to really be worth it. Something worth watching, but not yet worth using. Also, by the time I started learning Scala I had already learned Common Lisp and functional programming with full tail-call optimization seemed awkward (and still does).

What eventually sold me on Scala was higher-kinded types, companion objects, and objects-as-modules. You see, but the time I started picking up Scala I had already developed a habit of using a lot of generic programming. This is easy in Python where there's no compiler check a type system to tell you that you are wrong. But a lot of the patterns I commonly used couldn't be translated into Java generics (in fact, in my first foray into Java generics I crashed javac writing code that was compliant with the Java specification). Scala gave me concise programs and type safety! And with the introduction of collections that actually use higher-kinded types in 2.8.0, along with the leaps and bounds made in robustness, I think it finally reached threshold for serious use.

That's not to say these are the only key features. Understanding folds was a real eureka moment for me. I'm not sure how one can do effective functional programming without them, but perhaps there is a way. Implicit definitions are also key, and they are used so extensively in Scala libraries that I think writing Scala without a decent understanding of them is nothing short of voodoo programming. In short, there's a lot, and the features tend combine together in powerful ways.

Summary

I understand the motivation for establishing levels, but I personally think if one avoids many of the features that Martin has flagged as advanced, then the benefits of Scala don't justify the risk of using such a new technology and the need to retrain people. I think without them one is better off finding a better library or framework for their current stack. Also, between the fact that advanced features are omnipresent in the libraries, and teams will always have someone who wants to use the more powerful features, avoiding them works a lot better in theory than it does in practice. You can't not use them, unless you don't use Scala's standard library and avoid almost any library written explicitly for Scala. You can only pretend that you are not using them.

This is probably going to get me flamed, but reward must justify risk. In many circumstances Scala has the features to justify the risk associated with it, but the justification quickly vanishes as the more powerful features are stripped away. So while I think a programmer may be productive at Martin's lower levels, they don't justify Scala over Java.

Sphere: Related Content

Engineering in general, and software engineering in particular, is all about abstraction. The creation and utilization of abstractions is a core part of the daily activities of every good engineer, and many engineers are on a never ending quest to increase their level of abstraction. Abstraction both increases productivity and increases the complexity of problems that can be tackled. Few would argue that increased abstraction is a bad thing.

But people do argue about abstraction, and often condemn abstractions that they view as unfit or too complex. It is common to hear a software developer praise the merits of one means of abstraction and then demean another in the same breath. Some abstractions are too general. Some aren't general enough. Some are simply too complex or, ironically, too abstract. So while engineers almost universally agree that abstraction is good, and more abstraction is better; they often disagree fiercely about what genuinely constitutes "better."

What is an abstraction?

Simply put, an abstraction is something that hides the details involved in realizing a concept, and thus frees the engineer to focus on the problem at hand rather than the details of some other concern. This can take many, many forms. The simplest and most common, so common that the term is often used to describe almost all "good" abstractions, is higher-level abstractions.

Higher-level Abstractions

Higher-level abstractions are fairly simple: they encapsulate details so that they can be used without knowledge or concern about the details. Let's consider a simple example in Scala 2.8 (you can copy and paste these examples directly into the Scala 2.8 REPL):

scala> def sum(data: Array[Double]): Double = {
     |   var i = 0
     |   var total = 0.0 
     |   while(i < data.length) {
     |     total = total + data(i)
     |     i = i + 1
     |   }
     |   total
     | }
scala> def mean(data: Array[Double]): Double = sum(data) / data.length

So we have two functions: one that computes the sum of an array of doubles, and one that computes the mean. Pretty much any programmer (professional or otherwise) would feel confident both writing and using such functions. They are written in a modern multi-paradigm programming language yet I bet if you went back in time and showed them to programmers in some of the very first high-level languages they would recognize exactly what they do. They clearly encapsulate the details of performing certain computations. But what's more interesting about them is what's missing from them:

Polymorphism
Closures or first-class functions
Usage or creation of higher-order functions
Other Scala magic such as type classes and implicit parameters

In other words, they provide a simple, layered, hierarchical abstraction in a very bounded way. If we step away from software for a minute, you can imagine a digital designer placing a symbol for an adder or a register on a schematic without worrying about the arrangement of transistors that will be required to realize them in an actual circuit, or a mechanical engineer selecting a standard screw or clamp. These are parts that can be simply built, tested, and composed into larger devices.

Higher-Order Abstraction Take 1: The Mathematics of Fruit

Imagine I have some apples. But unfortunately I'm here, off in the internet, show I can't show them to you. I need an abstraction to tell you about them. For example, I could say I have two apples. Two is a number, and numbers are abstractions. I can use the same number two to describe the number of McIntoshes in my refrigerator or the number of Apple Macintoshes I own. Now let's say I also want to talk about my strawberries and blueberries. I have 16 strawberries and 100 blueberries. How many pieces of fruit do I have? 118! How did I figure that out? I used arithmetic, which is an abstraction of higher order than numbers. How let's say I want to know how many days it will be before I will have eaten all my strawberries. I can write an equation such as: current_fruit - pieces_per_day * number_of_days = fruit_after_number_of_days. I can do even more with this by solving for different variables. This is algebra, and it is a higher-order abstraction than arithmetic. Now let's say I want to build upon that so that I can study the dynamics of the amount of fruit in my refrigerator. I purchase fruit and put it in my refrigerator. I take fruit out and eat it. I forget about fruit and it gets moldy, the I remove it and throw it away. I can capture all of these as a system of differential equations, and using calculus describe all sorts of things about my fruit at any given point in time. Calculus is a higher-order abstraction that algebra. In fact, abstractions similar to the ones built with calculus are what I mean when I say "higher-order abstraction."

At each step up the chain of mathematics both the generality and number of concepts that can be conveyed by a single abstraction increased. In the case of calculus it becomes essentially infinite, and that's the essence of higher-order abstractions: they deal with the infinite or near-infinite. Also, observe that almost everyone from relatively small children on up understand numbers and arithmetic, most adolescents and adults can stumble through applying algebra, and only a very small portion of the population knows anything about calculus, much less can effectively apply it. Also observe that the functions I defined earlier are just below algebra in terms of their order of abstraction.

Higher-Order Abstraction Take 2: Programming in Scala

Let's see if we can sprinkle some higher-order abstraction into our previously defined functions:

scala> def sum(data: Traversable[Double]) = data.foldLeft(0.0)(_ + _)

This definition of sum exposes one higher-order abstraction (polymorphism - it now can use any Traversable[Double] instead of just an Array[Double]), and it uses higher-order functions in its implementation to perform the summation. This new definition is both much shorter and much more general, but it's still relatively approachable, especially for use. Calling it with an Array[Double] works as before, and now it can be used with any number of collections, so long as the collections contain doubles. But forcing the collections to contain doubles is very limiting, so let's see if we can do better:

scala> def sum[T](data: Traversable[T])(implicit n: Numeric[T]): T = {                                  
     |   import n._
     |   data.foldLeft(zero)(_ + _)
     | }

Ahhh, that's better! Much more general. Not only will this work for any numeric type defined in the Scala standard library, but for any numeric type for which the Numeric type class has been defined! It doesn't even need to be in the same class hierarchy! In order to introduce this new level of generality, we've also introduced the following higher-order abstractions:

Classic polymorphism (Traversable instead of Array)
Parametric polymorphism (the type parameter T for various classes)
Higher-order functions and closures (foldLeft and it's argument that does addition)
Type classes (well, Scala's flavor of type classes, e.g. Numeric)

Now, this is the point where people start screaming that abstraction has gone too far. Many professional programmers would look at it and think "WTF?" They could still guess what it does, but the mechanics are quite elusive for anyone that doesn't know Scala reasonably well. That being said, the code is still far more compact than the original imperative version and is extremely general (to which someone replies "Yes, but it's slow a heck compared to the original!"). At this point I would say the order of abstraction has went from being just below algebra to being on par with integral calculus. Just like with mathematics, we see a significant drop off in the number of people who readily understand it.

Higher-Order Abstraction Take 3: Conclusions

Let's consider a short, incomplete list of higher-order abstractions, means of abstraction, and fields that rely upon higher-order abstractions:

Higher-order functions
Polymorphism and parametric polymorphism
Metaclasses such as in Python and Smalltalk
Rich macros such as in Lisp or C++ Templates
Calculus and all the mathematics built upon it (and various other forms of advanced math)
First-order Logic and its kin, and all the higher-order logics
Any physics or other sciences that are built upon calculus
Any engineering or other applications that are built upon physics

Higher-order abstractions tend to exhibit one or more of the following traits:

They deal in infinities (e.g. integration and differentiation in calculus, universal and existential quantification in first-order logic, polymorphism and type classes in programming)
They deal in transformations (e.g. integration and differentiation in calculus, metaclasses and macros in programming)
They rely upon composition rules (e.g. chain rule and integration by parts in calculus, higher-order functions in programming)
Learning to use any given form of higher-order abstraction requires significant effort and discipline, therefore they tend to be the tools of specialists
They form the foundation of our modern, technological society.

The last two can be converted into conclusions:

Higher-order abstractions are critical for most interesting forms of science and technology
You can't expect a colleague or customer from another field to understand the higher-order abstractions in your field, even if his own field is totally immersed in higher-order abstraction

Sphere: Related Content

Erik Engbrecht's Blog

Sunday, January 16, 2011

Martin Odersky's Scala Levels

Learning Python

Learning Scala

Summary

Tuesday, July 27, 2010

Higher-Level versus Higher-Order Abstraction

What is an abstraction?

Higher-level Abstractions

Higher-Order Abstraction Take 1: The Mathematics of Fruit

Higher-Order Abstraction Take 2: Programming in Scala

Higher-Order Abstraction Take 3: Conclusions

Blog Archive

Labels

About Me