Tuesday, August 28, 2007

Scala Code for GPS

The link to the source code in the previous post didn't work. I put it in the wrong directory. It's fixed now. You can get the code here: scala-gps.zip


Tuesday, August 21, 2007

Scala vs Lisp for GPS: Part I: Expressivity

Introduction

Before I found Scala I was looking for a language with the following characteristics:

  1. Expressivity on par with Python or Lisp
  2. Performance on par with Java
  3. Has a REPL or at least an interpreter
  4. Compiled
  5. Object-oriented
  6. Large number of modern libraries and frameworks
  7. Cross platform, including thick-client GUIs
  8. Open Source
Now I've found Scala and I think I have my language, but unfortunately it's hard to tell much about expressivity without doing a real project, and doing a real project involves a measure of risk. Instead, I decided to port some Lisp programs from Norvig's Paradigms of Artificial Intelligence Programming to Scala and compare them to the original Lisp programs. As a second step, I'll "Scala-ize" each program to be more idiomatic Scala (mostly increasing object orientation) to see the result. Finally, I'll attempt to do some sort of performance comparison.

Background - What is GPS

GPS stands for "General Problem Solver." The algorithm here comes from Allen Newell and Herbert Simon way back in 1957. The original attempt was to create a computer program that could solve any problem. It fell quite short of its goal, but the algorithm is still interesting, and its expression in Lisp or Scala is quite concise. Those of you familiar with planning problems will recognize the basic structure and the sample problems. For more information, see Chapter 4 of PAIP.

The Algorithm

A problem is described using a set of initial conditions, a set of operators that can change those conditions, and a set of goal conditions. Each operator has a name, a list of preconditions (conditions that must be true for it to be invoked), an add list (conditions it will add to the state when invoked), and a delete list (conditions it will remove from the state when invoked). Note that these conditions are simple; they are not based on propositional or predicate logic. GPS uses a technique called means-ends analysis. A brief explanation is contained here, about halfway down the page. In a nutshell, the algorithm tries to achieve each goal one at a time, until it either fails to achieve a goal or has achieved all of them. The preconditions for each operator are achieved by recursively invoking the algorithm on them. For a detailed description of the implementation of the algorithm, see PAIP or look at the comments in my Scala code; a small sketch of the core algorithm also follows the list below.

Comparison of Implementations

There are some substantial differences between the Lisp implementation and the Scala implementation. They are:
  1. The Scala implementation is more explicitly structured. Some of the structure, such as the static typing, is forced by Scala. Others, such as using classes instead of conventions on list contents to distinguish between normal conditions and "executing conditions," are not forced by Scala but are simply easier and more natural.
  2. The resulting plan is a list of operators rather than a list of lists obeying a convention. This is straightforward because "executing conditions" are a special class of condition and are explicitly tied back to the operators they represent.
  3. The core algorithm in Scala is completely and explicitly devoid of mutable state. The Lisp solution makes only one small use of mutation in achieve-all, and that occurrence could easily be removed in the same way as it was from Scala. However, cons cells, and the lists they form, are mutable, and therefore it is much more difficult to prove that the algorithm uses no mutable state.
  4. The Scala solution required fewer utility functions. This is due to both the richer standard library and case classes.
  5. The problem-definition DSL required significantly more code in Scala, especially in order to make it interpreter-friendly.
  6. The syntax of the problem-definition DSL in Scala is statically checked. For example, you cannot use preconditions in the place of the add list. Achieving this took a fair amount of work, and I almost just went with a looser version that was equally easy to type and read but less validated.
  7. In my subjective opinion, the Scala version is much easier to read. I find the fact that Lisp control structures like "if" implicitly return nil when no "else" is provided to be quite confusing.
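To make the algorithm concrete, here is a minimal sketch of means-ends analysis in Scala. This is my own illustration, not the code in scala-gps.zip: the names (Condition, Simple, Executing, Op, GPS) are invented for the sketch. It also happens to illustrate points 1 through 3 above, since conditions are classes rather than list conventions, "executing conditions" carry a reference to their operator, and the whole thing is free of mutable state.

sealed trait Condition
case class Simple(name: String) extends Condition
case class Executing(op: Op) extends Condition // "executing" marker tied to its operator

case class Op(name: String,
              preconds: Set[Condition],
              addList: Set[Condition],
              delList: Set[Condition])

object GPS {
  // Achieve every goal in order; None means failure. The returned state
  // contains Executing markers from which the plan can be read off.
  def achieveAll(state: Set[Condition], ops: List[Op],
                 goals: List[Condition]): Option[Set[Condition]] =
    goals.foldLeft(Option(state))((s, g) => s.flatMap(achieve(_, ops, g)))
         .filter(s => goals.forall(s.contains)) // no goal may be clobbered by a sibling goal

  // A goal already in the state is achieved; otherwise try, in order,
  // each operator whose add list contains the goal.
  private def achieve(state: Set[Condition], ops: List[Op],
                      goal: Condition): Option[Set[Condition]] =
    if (state.contains(goal)) Some(state)
    else ops.filter(_.addList.contains(goal))
            .foldLeft(Option.empty[Set[Condition]]) { (acc, op) =>
              acc.orElse(applyOp(state, ops, op))
            }

  // Recursively achieve the preconditions, then apply the delete and add lists.
  private def applyOp(state: Set[Condition], ops: List[Op],
                      op: Op): Option[Set[Condition]] =
    achieveAll(state, ops, op.preconds.toList)
      .map(s => s -- op.delList ++ op.addList + Executing(op))
}

Using the "drive son to school" domain from PAIP, a one-operator problem looks like this:

val drive = Op("drive son to school",
  preconds = Set(Simple("son at home"), Simple("car works")),
  addList  = Set(Simple("son at school")),
  delList  = Set(Simple("son at home")))

GPS.achieveAll(Set(Simple("son at home"), Simple("car works")),
               List(drive), List(Simple("son at school")))
// => Some(state), where state contains Executing(drive), i.e. a one-step plan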
General observations:
  1. At least for this problem (which didn't use macros), Lisp code could be almost directly converted to Scala.
  2. Scala implicits make an effective substitute for special variables (i.e. defparameter) in Lisp; see the sketch after this list.
  3. Overall LOC count is too close to matter.
  4. Lisp development is far more interactive than Scala development.
  5. The Scala compiler catches a lot of "stupid" mistakes that the Lisp compiler does not (though the Lisp compiler catches some, many more than Python, Ruby, et al.), reducing the number of edit/run cycles by a fair margin.
  6. Case classes are really useful. They allow you to add structure to your program without doing all the work of implementing the support methods, such as equals, hashCode, toString, and unapply for pattern matching; a tiny demonstration also follows this list. Lisp is worse than Java et al. here because it has four different equality functions (eq, eql, equal, equalp), all with subtle (and not so subtle) differences. In Scala, if you are wise and don't have to deal with Java libraries (hah!), you pretend eq (reference equality) doesn't exist (indeed, pretending references don't exist, and null doesn't exist, seems like a good idea to me). In fact, I find case classes so useful that I think I'm going to write a Python metaclass so I can use them in Python, too.
  7. I retract previous negative statements about pattern matching on lists. It's useful. I get it now. But I still wouldn't convert a string to a list just for pattern matching.
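Here is the promised sketch of the implicits point. Trace and solve are made-up names for illustration; the idea is that where Lisp rebinds a special variable around a dynamic extent, Scala can thread an implicit argument instead.

object ImplicitsDemo extends App {
  case class Trace(enabled: Boolean)

  def solve(goal: String)(implicit trace: Trace): Unit = {
    if (trace.enabled) println("trying " + goal)
    // ... the real work would go here ...
  }

  implicit val quiet: Trace = Trace(enabled = false)
  solve("son at school")                        // picks up the ambient implicit value
  solve("son at school")(Trace(enabled = true)) // explicit override, like rebinding a special variable
}

The difference is that the implicit only flows through calls that declare the implicit parameter, so the "rebinding" is visible in every signature along the way; arguably that is just the statically checked version of the same idea.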
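And the promised case class demonstration, again my own trivial example: a single declaration buys structural equality, hashing, printing, and pattern matching.

case class Point(x: Int, y: Int)

val p = Point(1, 2)
p == Point(1, 2)                  // true: structural equals (and a consistent hashCode) for free
p.toString                        // "Point(1,2)"
p match { case Point(x, _) => x } // 1: unapply is generated for pattern matching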
Updated: finished point 6.

Conclusion

I'm not going to conclude anything yet. So far, Scala seems to be as expressive as Lisp, and its added structure and compiler definitely reduce the number of errors made while coding. However, this comparison was on a relatively simple piece of code, and the Scala implementation was mostly a port. This means that I can't say one way or the other whether the added structure of Scala would be a help or a hindrance in the early stages of fleshing out code. Based on my experience with Python, I think that without case classes the added structure would definitely be a hindrance. Keeping all the basic functions (comparison, hash, toString) up-to-date on rapidly changing objects is a real pain.


Friday, August 17, 2007

Syntactic Sugar for Composition/Delegation

Composition/Delegation is a common pattern in object-oriented software. There are a lot of different definitions floating around for both composition and delegation, but basically it amounts to this, as demonstrated in the following Java code:

class MyStringList implements List<String> { 
    private List<String> backingList;
    public MyStringList(List<String> backingList) {
        if (backingList == null) throw new NullPointerException();
        this.backingList = backingList;
    }
    public String get(int i) { return backingList.get(i); }
    public boolean isEmpty() { return backingList.isEmpty(); }
    // ... lots of other methods like the ones above
    public boolean add(String s) {
        if (s == null || s.length() == 0)
            throw new IllegalArgumentException();
        return backingList.add(s);
    }
    // and some more slightly modified methods to complete List
}
So MyStringList is composed of backingList, and it delegates most of its methods to it. Furthermore, in this case it is using backingList to implement the interface List. There are other ways to use composition and delegation, both together and separately, but this is the pattern with which I'm concerned. If I had fully written out the code, the first thing you would notice is that there is a painfully large amount of boilerplate code (the List interface requires 25 methods) to accomplish very little. If you know Java or a similar language, you would then probably think that it would be easier to use inheritance, as follows:
class MyStringList extends ArrayList<String> {
    public MyStringList() {}
    @Override
    public boolean add(String s) {
        if (s == null || s.length() == 0)
            throw new IllegalArgumentException();
        return super.add(s);
    }
    // do something for the other add methods as required...
}
That's a lot less typing. It's also strongly coupled to ArrayList. Depending on what you're trying to do, that may or may not be a problem. For example, let's say you are developing a web application using Hibernate, and you want Hibernate to lazily fetch the contents of the list. The only way would be to extend Hibernate's list instead, thereby breaking any code you already wrote that assumed ArrayList and strongly coupling you to a specific ORM. The other problem is that Java only supports single inheritance, so you can only save typing for one interface. If you have several, you are out of luck. Trust me, these are not good things. So, with Java at least, you are stuck between a rock and a hard place. You either get strong coupling or a lot of boilerplate code. If you don't believe me, ask Google. But the problem is very solvable. On one hand there are mixins, but these introduce their own set of restrictions, oddities, and coupling. It would be better to add something simple that preserves the pattern in an easily understandable way while eliminating the boilerplate code. Delegation should be made part of the language. So instead of writing the initial piece of code, you would write:
class MyStringList implements List<String> { 
    private delegate List<String> backingList;
    public MyStringList(List<String> backingList) {
        if (backingList == null) throw new NullPointerException();
        this.backingList = backingList;
    }
    public boolean add(String s) {
        if (s == null || s.length() == 0)
            throw new IllegalArgumentException();
        return backingList.add(s);
    }
    // and some more slightly modified methods to complete List
}
The delegate keyword tells the compiler that any methods from List that are not already implemented should be generated by delegating responsibility to the member backingList. This eliminates the boilerplate code without introducing excess coupling, reducing flexibility, or even being particularly confusing. Furthermore, this scheme could work in many languages. I'm not a compiler hacker, but concepts like this are always better with a demonstration, so I've thrown together some Python that enables this behavior. Code is here. It's far from production quality, and I'm sure people will find plenty of flaws, but it does demonstrate the concept (along with showing how easily extensible Python is).
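As a point of comparison rather than a proposal: Scala 3's export clauses provide almost exactly this sugar, with the compiler generating forwarder methods for the listed members. A minimal sketch of the idea, my own example, using Scala's mutable Buffer rather than java.util.List:

class MyStringList(private val backing: scala.collection.mutable.Buffer[String]) {
  // The compiler generates forwarder methods for each listed member of backing.
  export backing.{apply, length, isEmpty, iterator}

  def add(s: String): Unit = {
    require(s != null && s.nonEmpty, "null or empty strings are not allowed")
    backing += s
  }
}

@main def demo(): Unit = {
  val xs = new MyStringList(scala.collection.mutable.ArrayBuffer())
  xs.add("foo")
  println(xs(0) + ", " + xs.length) // prints "foo, 1" via the generated forwarders
}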


Tuesday, August 14, 2007

Business Engagement and IT Project Failure

CIO.com recently ran an article by the CIO of GE Fanuc regarding "functional engagement" (i.e. "the business" or, more commonly, "the users") and project failure. Michael Krigsman later posted a more succinct summary on his ZDNet blog. Here's an even shorter version:

  1. Make sure a single business-side leader has significant capability, responsibility, and authority for project success.
  2. Don't short-circuit important parts of the project.
  3. Make sure that person understands the "laws of IT" and defends them to his peers, subordinates, and superiors.
#1 is just plain common sense. #3 amounts to establishing a scapegoat for the inevitable failure caused by short-circuiting appropriate systems engineering or architectural activities. Notice that there is neither a definition of "failure" nor a definition of "success." I think it can be inferred that he's defining "failure" as "exceeding project or maintenance budget and/or schedule." Not once does he mention delivering real innovation to the business, or even real value - just avoiding causing massive amounts of pain. Consider the following statement:
Enterprise platforms like Siebel, Oracle and SAP are not intended to be heavily customized. When going from a niche, custom application to Siebel, you need a strong functional leader to push back on every “Yeah, but” statement. They must start at zero and make the business justify every customization. Saying “no” to customization is bitter medicine for your business partners. They will make contorted faces and whine ad nauseum. But it is for their own good.
Let's think for a moment. Your customer (internal or external) wants to spend millions of dollars implementing a CRM (or ERP, PLM, etc.) package. Earlier, it went to the trouble of building one from scratch. That probably means it considers CRM to be an extremely important part of its business, and it expects to derive a competitive advantage from having a shiny new system. The best way to be competitive is to copy what everyone else is doing, right? Also, repeatedly receiving "no" as an answer will really make your customer want to take an active role in the implementation, right? Hmmm....somehow I don't think so. Introducing a strong impedance mismatch between the organization and the software it uses is failure. Standing in the way of innovation is failure. Mr. Durbin wants you to fail.

There is actually a simple, albeit occasionally painful, solution to this problem: don't buy an off-the-shelf application that neither meets the business's requirements nor can be cost-effectively extended to meet them. First, you have to understand your business processes and requirements. I mean really understand them, not just understand the official process documentation that no one follows. All those winces and "yes buts" are requirements and process details that absolutely must be understood and addressed.

Second, you do a trade study. This is a key part of systems engineering. Replacing enterprise systems is expensive, often prohibitively expensive, so do a thorough analysis. Take some of your key users to training sessions and write down every wince and vaguely answered question, because those are all either issues that will stand in the way of delivering value or will require customization.

Finally, keep your pen in your pocket. Don't be tempted to write checks, buy licenses, and sign statements-of-work before you really understand the product. Just ignore those promises of discounts if you get a big order in before the end of the quarter. The next quarter isn't that far away, and the end of the fiscal year may even be approaching. The discounts will reappear then. Instead, make sure the vendor really understands the requirements and business process, and structure any contracts so that they must be met or the vendor will face significant penalties. It's amazing what comes out of the woodwork when you do this. All of a sudden that 3x cost overrun from the original projection is sitting right there in front of your eyes in the form of a quote, all before you've sunk any real money into the project.

The purpose of engaging "the business" in an IT project is not to create a personal shield for all the deficiencies in the system you are deploying. It is to make sure you identify those deficiencies early enough in the project cycle to avoid cost and schedule overruns.


Friday, August 03, 2007

Minimal Software Development Process

It's not uncommon for me to find myself in debates regarding what constitutes a sufficient software development process. On one side are the Agile folks, who argue for code-and-fix with user representatives determining what should be fixed. Of course, they have lots of rules to regulate the code-and-fix, some of which are quite sensible and some of which are quite dubious, to make it appeal to closet process engineers. On the other side you have old-school developers who believe in such rarities as requirements signed in blood and requiring human sacrifice to alter. OK, so perhaps I'm exaggerating a little bit, but that's how it often feels. So I'm going to propose my own process, one that is every bit as simple as what the Agile crowd pushes while being more all-encompassing than what the traditionalists demand. Here it is:

  1. Define the problem
  2. Define the solution
  3. Apply the solution
  4. Validate the result
Now you are probably thinking that this is nothing novel. I'm just using weird words for:
  1. Develop requirements
  2. Design and Code to the Requirements
  3. uhh....Test?
  4. Customer Acceptance Testing
Wrong! Even if you were more creative than me, or pulled out RUP's phases, or anything like that. But that's expected, as I rarely see steps 1 and 4 completed. Let me explain.

Define the Problem

This is where most projects get in trouble. Their problem definitions look kind of like this:
  • We need to roll out a corporate standard ERP system.
  • We need a web-based foo tracking database accessible to the entire corporation.
  • We need an automated workflow for the bar process.
I could go on and on. Sometimes these are followed by phrases like "will save X million dollars" or "will reduce cycle time by X%." Really, the problem is that someone said a competitor was saving money by applying an automated workflow to the bar process in order to track foos in the ERP system with a web-based front end; the company at hand doesn't have one, so that's a problem. Anyway, oftentimes these statements are really masking genuine problems such as:
  • My old college roommate is an ERP salesman and wants a new boat.
  • We have a ton of foo, but no one really knows where it is or whether it's being used. So we keep on buying more foo, even though we probably have enough. The problem is that when we find some foo, the people with the foo always claim they need all of it, even though oftentimes they clearly aren't using it. We have some data, but it's massive and impossible to interpret. We need a way to find unused foo and prove that it is unused, so that we can transfer it where it is needed.
  • Some cowboys in department X keep on ignoring the bar process. They think they are heroes because it saves time and money upfront, but really they just end up creating costly, time-consuming problems down the line (for me). I need a way to force them to follow the bar process, but it can't be too hard, otherwise they'll convince upper management to let them ignore it.
So why is the difference important? Don't you just end up with the first set of statements as your project charter anyway? No. A couple months ago I faced a situation similar to the second item. A consultant had (unfortunately) convinced a couple of senior managers that they wanted a big, fancy database integrated with half of our internal systems and requiring a whole bunch of data maintenance. Fortunately, the consultant also directed them to me. They had tons of data and had spent countless hours fiddling with spreadsheets trying to turn it into actionable information. Having failed, they decided they needed a huge database that would cost hundreds of thousands of dollars to develop and then require staff to keep up-to-date. They also needed something in about a couple weeks. So I poked and prodded until I finally understood what they needed to know, what data they had, and what they needed to decide based on that data. Then I wrote a few hundred lines of Python to analyze the data and make pretty graphs, along with a couple dozen lines of VBA to stick the outputs of the Python program into a PowerPoint presentation.

They were thrilled with the result. Hundreds of thousands of datapoints were transformed into actionable charts that even the most impatient executive could correctly interpret. This took me about two weeks of effort. Their original requirements would have taken a couple of man-years to implement, and the result would not have solved their problem. Traditionalists would have wasted the time to implement the requirements (which were actually fairly well developed), or at least a portion of them. Agilists would have fiddled around for a while and achieved the same result.

Now, I'll admit that on the majority of projects it's the other way around: understanding the problem makes the cost of the solution grow by an order of magnitude rather than shrink. My guess is only 1 in 4 projects can actually be simplified by understanding the problem, and 2 in 4 become significantly more complex. But solid estimates that can be tied to solid business cases are extremely important. Delivering a cheap solution that doesn't deliver value is a waste of money. In my experience, development teams assume that "the customer" or "the business" already understands its problem and is defining requirements that will solve it, when in reality the problem is usually vaguely understood at best, and the description of requirements is every bit as much a design activity as coding is.

Define the Solution

This is where most of the traditional software engineering activities occur. You have a customer (or marketing) who has given you some high-level requirements defining the general shape of the system, and then you go about gathering more detailed requirements, followed by architecture, design, code, and test. Or maybe you are agile, so you do all of those activities at once and only bother writing down code (in the form of application code and test code). Either way, knowing the problem really helps. Some people probably object to lumping all of these activities under one heading because they take so much time. I would agree, but they are rarely done entirely sequentially. Prototyping is every bit as valid a method for eliciting requirements as interviews; sometimes it is a lot more effective. Also, there are strong feedback loops among all of the elements. So really, they are all done pretty much at the same time. It just happens that requirements kick off the process and testing finishes it up.
Others would object because "the completed system is the solution." Well, no, it's not. You don't really know if you have a solution until after you've deployed and run the system long enough for the business to adjust itself.

Apply the Solution

This is just another way of saying "deploy," plus all the other things you have to do, like training. If you think of it at a really high level (too high for effective engineering), the organization is the problem, and you apply the software to the organization to see if the organization gets better.

Validate the Result

This is where you measure the impact of deploying the software on the problem to see if indeed the problem has been solved. I don't think anyone will disagree that a piece of software can meet all of its requirements and still fail to make a positive impact. So you need to measure the impact of the deployed system. In practice, success is declared as soon as a piece of software that meets "enough" of its requirements is rolled out. This puts the organization in a very bad position, because if the software subsequently fails to deliver value, then the declaration of success, and those who made it, are called into question. In most cases the organization will just end up either ignoring the system or limping along until enough time has passed to call the system a problem.
