Tuesday, July 27, 2010

Higher-Level versus Higher-Order Abstraction

Engineering in general, and software engineering in particular, is all about abstraction. The creation and utilization of abstractions is a core part of the daily activities of every good engineer, and many engineers are on a never ending quest to increase their level of abstraction. Abstraction both increases productivity and increases the complexity of problems that can be tackled. Few would argue that increased abstraction is a bad thing.

But people do argue about abstraction, and often condemn abstractions that they view as unfit or too complex. It is common to hear a software developer praise the merits of one means of abstraction and then demean another in the same breath. Some abstractions are too general. Some aren't general enough. Some are simply too complex or, ironically, too abstract. So while engineers almost universally agree that abstraction is good, and more abstraction is better; they often disagree fiercely about what genuinely constitutes "better."

What is an abstraction?

Simply put, an abstraction is something that hides the details involved in realizing a concept, and thus frees the engineer to focus on the problem at hand rather than the details of some other concern. This can take many, many forms. The simplest and most common, so common that the term is often used to describe almost all "good" abstractions, is higher-level abstractions.

Higher-level Abstractions

Higher-level abstractions are fairly simple: they encapsulate details so that they can be used without knowledge or concern about the details. Let's consider a simple example in Scala 2.8 (you can copy and paste these examples directly into the Scala 2.8 REPL):

scala> def sum(data: Array[Double]): Double = {
     |   var i = 0
     |   var total = 0.0 
     |   while(i < data.length) {
     |     total = total + data(i)
     |     i = i + 1
     |   }
     |   total
     | }
scala> def mean(data: Array[Double]): Double = sum(data) / data.length

So we have two functions: one that computes the sum of an array of doubles, and one that computes the mean. Pretty much any programmer (professional or otherwise) would feel confident both writing and using such functions. They are written in a modern multi-paradigm programming language yet I bet if you went back in time and showed them to programmers in some of the very first high-level languages they would recognize exactly what they do. They clearly encapsulate the details of performing certain computations. But what's more interesting about them is what's missing from them:

  1. Polymorphism
  2. Closures or first-class functions
  3. Usage or creation of higher-order functions
  4. Other Scala magic such as type classes and implicit parameters

In other words, they provide a simple, layered, hierarchical abstraction in a very bounded way. If we step away from software for a minute, you can imagine a digital designer placing a symbol for an adder or a register on a schematic without worrying about the arrangement of transistors that will be required to realize them in an actual circuit, or a mechanical engineer selecting a standard screw or clamp. These are parts that can be simply built, tested, and composed into larger devices.

Higher-Order Abstraction Take 1: The Mathematics of Fruit

Imagine I have some apples. But unfortunately I'm here, off in the internet, show I can't show them to you. I need an abstraction to tell you about them. For example, I could say I have two apples. Two is a number, and numbers are abstractions. I can use the same number two to describe the number of McIntoshes in my refrigerator or the number of Apple Macintoshes I own. Now let's say I also want to talk about my strawberries and blueberries. I have 16 strawberries and 100 blueberries. How many pieces of fruit do I have? 118! How did I figure that out? I used arithmetic, which is an abstraction of higher order than numbers. How let's say I want to know how many days it will be before I will have eaten all my strawberries. I can write an equation such as: current_fruit - pieces_per_day * number_of_days = fruit_after_number_of_days. I can do even more with this by solving for different variables. This is algebra, and it is a higher-order abstraction than arithmetic. Now let's say I want to build upon that so that I can study the dynamics of the amount of fruit in my refrigerator. I purchase fruit and put it in my refrigerator. I take fruit out and eat it. I forget about fruit and it gets moldy, the I remove it and throw it away. I can capture all of these as a system of differential equations, and using calculus describe all sorts of things about my fruit at any given point in time. Calculus is a higher-order abstraction that algebra. In fact, abstractions similar to the ones built with calculus are what I mean when I say "higher-order abstraction."

At each step up the chain of mathematics both the generality and number of concepts that can be conveyed by a single abstraction increased. In the case of calculus it becomes essentially infinite, and that's the essence of higher-order abstractions: they deal with the infinite or near-infinite. Also, observe that almost everyone from relatively small children on up understand numbers and arithmetic, most adolescents and adults can stumble through applying algebra, and only a very small portion of the population knows anything about calculus, much less can effectively apply it. Also observe that the functions I defined earlier are just below algebra in terms of their order of abstraction.

Higher-Order Abstraction Take 2: Programming in Scala

Let's see if we can sprinkle some higher-order abstraction into our previously defined functions:

scala> def sum(data: Traversable[Double]) = data.foldLeft(0.0)(_ + _)

This definition of sum exposes one higher-order abstraction (polymorphism - it now can use any Traversable[Double] instead of just an Array[Double]), and it uses higher-order functions in its implementation to perform the summation. This new definition is both much shorter and much more general, but it's still relatively approachable, especially for use. Calling it with an Array[Double] works as before, and now it can be used with any number of collections, so long as the collections contain doubles. But forcing the collections to contain doubles is very limiting, so let's see if we can do better:

scala> def sum[T](data: Traversable[T])(implicit n: Numeric[T]): T = {                                  
     |   import n._
     |   data.foldLeft(zero)(_ + _)
     | }

Ahhh, that's better! Much more general. Not only will this work for any numeric type defined in the Scala standard library, but for any numeric type for which the Numeric type class has been defined! It doesn't even need to be in the same class hierarchy! In order to introduce this new level of generality, we've also introduced the following higher-order abstractions:

  1. Classic polymorphism (Traversable instead of Array)
  2. Parametric polymorphism (the type parameter T for various classes)
  3. Higher-order functions and closures (foldLeft and it's argument that does addition)
  4. Type classes (well, Scala's flavor of type classes, e.g. Numeric)

Now, this is the point where people start screaming that abstraction has gone too far. Many professional programmers would look at it and think "WTF?" They could still guess what it does, but the mechanics are quite elusive for anyone that doesn't know Scala reasonably well. That being said, the code is still far more compact than the original imperative version and is extremely general (to which someone replies "Yes, but it's slow a heck compared to the original!"). At this point I would say the order of abstraction has went from being just below algebra to being on par with integral calculus. Just like with mathematics, we see a significant drop off in the number of people who readily understand it.

Higher-Order Abstraction Take 3: Conclusions

Let's consider a short, incomplete list of higher-order abstractions, means of abstraction, and fields that rely upon higher-order abstractions:

  1. Higher-order functions
  2. Polymorphism and parametric polymorphism
  3. Metaclasses such as in Python and Smalltalk
  4. Rich macros such as in Lisp or C++ Templates
  5. Calculus and all the mathematics built upon it (and various other forms of advanced math)
  6. First-order Logic and its kin, and all the higher-order logics
  7. Any physics or other sciences that are built upon calculus
  8. Any engineering or other applications that are built upon physics

Higher-order abstractions tend to exhibit one or more of the following traits:

  1. They deal in infinities (e.g. integration and differentiation in calculus, universal and existential quantification in first-order logic, polymorphism and type classes in programming)
  2. They deal in transformations (e.g. integration and differentiation in calculus, metaclasses and macros in programming)
  3. They rely upon composition rules (e.g. chain rule and integration by parts in calculus, higher-order functions in programming)
  4. Learning to use any given form of higher-order abstraction requires significant effort and discipline, therefore they tend to be the tools of specialists
  5. They form the foundation of our modern, technological society.

The last two can be converted into conclusions:

  1. Higher-order abstractions are critical for most interesting forms of science and technology
  2. You can't expect a colleague or customer from another field to understand the higher-order abstractions in your field, even if his own field is totally immersed in higher-order abstraction

Sphere: Related Content

3 comments:

yonkeltron said...

This post was an enjoyable and interesting read for me. One of the things I miss most about the Lisp world is also one of the hardest! Though macros are quite flexible, they are also very tough to get right for many reasons, not the least of which is the problem of abstraction.

Also, this gives me an excuse to go look at the new collections in Scala 2.8, so thanks for the nudge.

Channing Walton said...

Nice post, thanks.

What I find lamentable about many developers is that when the abstraction moves out of their grasp they denigrate it as academic and useless.

We do not see this in other engineering disciplines. Instead, engineers respect those that are able to use more powerful abstractions and admire the work they do.

Software devs needs to grow up and stop behaving like excited children.

Order Form said...

Great post! is abstraction concept used even in JAVA? I need to join this course in a month..