Practical OCaml: larger example

This "Larger Example" I’m going to write about is in chapter 18, starting on page 243. The chapter is (supposed to be) about the object oriented features of OCaml (they would be the "O" part) but this example is more objectionable than objective. It was intended to be an implementation of the Levenshtein distance algorithm: it started with an incomplete description of that algorithm and a figure of a matrix with four mistakes. Then it got worse.

Smith’s plan was to create a base class containing the algorithm and then use subclasses to adapt the algorithm to different data types. But when I looked at the edit_distance base class I noticed that half of the core algorithm was missing! I found it in the string_edit_distance subclass, and also in the list_edit_distance subclass. Martin Fowler in his great "Refactoring" book says: Number one in the stink parade is duplicated code. If you see the same code structure in more than one place, you can be sure that your program will be better if you find a way to unify them. Following Fowler’s good advice, which Smith should have done, I moved the duplicated code to the base class. There was only a small bit of code that was different for each subclass: the part that compares the elements of the particular data structures (strings and lists in this example) to work out if a changing cost is required. I created a cost_to_change method to handle that.

Then I tried to simplify the code and make it more readable.

In the update_matrix method I removed three of the let bindings and passed the calculation results directly the trimin function I mentioned before.

In the gen_matrix method I replaced the pattern matching and nested iteration functions with two separate iteration functions, which hopefully makes it clear that we are only initialising the first row and first column; most of the matrix is left untouched.

In the calc method, which I’d moved from the subclass to the base class, I again replaced the pattern matching and nested iteration functions, this time with nested for loops. Pattern matching and higher-order iteration functions are very powerful, and they could be used instead of loops and conditional statements in all ocaml code; but ocaml includes for loops and if statements because they are easier to use and understand than the more powerful features. Choose the right tool for the job.

As a result of all these changes the size of a subclass has been reduced from 18 lines to 6, only 2 of which need to be changed for each data type (the other 4 are structure). You can have a look at some of Smith’s code and my modified version to see the differences for yourself.

Practical OCaml: not again

I was about to comment on the "Larger Example" in chapter 18, starting on page 243; but first I have to get rid of the trimin method it contains:

    method private trimin x y z = match x,y,z with
        m,n,o when (m > n) -> if (n < o) then n else o
      | m,n,o when (m < n) -> if (m < o) then m else o
      | m,n,o -> if (n < o) then n else o

Let me now repeat exactly what I said in my previous post: Huh?! This method will return the smallest of its three arguments, although its ugliness made me doubt that for a while. But even if it were improved, it shouldn't even be part of this class at all. So I removed that method and replaced it with a simple function:

let trimin x y z = min x (min y z);;

That done, I was then able to continue with the "larger example".

Practical OCaml: minimum review

I have been reading Practical OCaml. I didn’t buy the book; I recommend that you don’t buy it too. I was given a copy and asked if I could say something good about it, so here goes: “it could be worse”.

There are so many other negative reviews of this book I don’t want to just repeat what has already been said. Instead I’m going to criticise some parts of it in more detail in the hope that it may help. I’m going to choose targets on a whim, with not much rhyme or reason; that should mirror the structure of the book quite well.

So, to start small, my first complaint is against page 172. Smith (the book’s author) writes:

let min x y = match x with
    n when x < y -> x
  | _ -> y;;

Huh?!

That can’t possibly be a sane way to define a function that returns the smaller of two arguments. In addition to the needless complexity we have the unused named value n. What was wrong with:

let min x y = if x < y then x else y;;

That should be clear enough. But I now have another question: why even bother? The min function is built-in to OCaml; it is part of the Pervasives module that provides “<” and the other comparison operators.