On Scala

This is one of a series of posts on languages, you can read more about that here.

I've been messing around with Scala for a while. For no particular reason I let it slide but have come back to it with increased interest in the last few years. Scala's blending of programming paradigms in a single language is impressive. It has the most powerful type system of the 'mainstream' JVM languages [1]. It's fun to write code in and is a good language for leaning into the functional plus static typing paradigm. As much as I like the language, I usually qualify recommending it outright, due to some noticeable externalities and the very need to adopt new paradigms. And so, Scala is the language I find myself most conflicted over - the part of me that likes programming loves it, the part of me that goes oncall wants to see more work done on the engineering. 

What's to like? 

Herein a drive by of some parts of the language that I like and have found useful. For a broader overview a good place to start is scala-lang.org's Learning Scala page material, which also has an excellent overview of features.

She's gone rogue, captain! Have to take her out! 

Scala has a repl. Every language should have a repl -

dehora:~[macbook]$ scala
Welcome to Scala version 2.9.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_09).
Type in expressions to have them evaluated.
Type :help for more information.
scala> println("hello, world!")
hello, world!

The repl enables method discovery (hit tab after the dot operator).

Like Go (and Groovy) Scala makes semi-colons optional. Also like Go, declaration order is reversed compared to C/C++/Java. The type comes after the variable/method name allowing you to say "a of type int", "foo returning int". You can define something as a val (cannot be modified), a var (can be modified) or a def (a method/function) - val/var declarations in classes result in getters and setters being implicitly created (the private[this] variant can be used to force direct field access). Using vals is preferred for thread safety and comprehension reasons. Scala supports type inference, which removes a lot of boilerplate relative to Java. Strings in Scala are immutable and can be multilined using triple quotes and support interpolation using a $ notation (as of 2.10). Scala strings can be enhanced with StringOps, which provides a raft of extra methods compared to java.lang.String.

In Scala each value is an object. The root class is Any. Scala doesn't have predefined primitives as per C++ and Java; these are modelled as a sub-class of Any called AnyVal, everything else is a sub-class called AnyRef, which maps to Java's Object class.  Because numbers are objects their arithmetic operations are methods - 1 + 2 is the same as 1+(2) (but not 1.+(2) which will give a you a double instead of an int ;). As a result, the keywords 'true' and 'false' are objects - true.equals(false) is a legal statement. Technically because we're dealing with methods this is not operator overloading, but the term is often used. Since Scala allows symbolic methods for binary/unary operations, you are free to define something like def +(that:Thing):Thing for your class.  You are also free (with some restrictions) to define methods with unicode and ascii symbols -

def ||(dowop:ThatThing):ThatThing = ...
def ψ(dowop:ThatThing):ThatThing = ...

And so it goes - for example, list operators such as '::' (append) and ':::' (prepend) are just methods. Scala can be understood as strongly object oriented when compared to Python or Java and notably gets two things right in its approach - don't have primitives, and, preserve precedence rules. 

Classes and construction in Scala are noticeably different to Java, but also better in more or less every respect.  Classes in Scala have a different syntax to Java, thanks to the constructor signature being part of the class definition -

class Song(songName: String, songLength:Double, songArtist:String) {
  private val name = songName 
  var length = songLength 
  val artist = songArtist 
}

Constructor arguments can be assigned, and are also visible through the class. A var field can be re-assigned, vals cannot, protected fields are available to sub classes. Private members are not accessible outside the class with the exception of a companion object (more on those soon). Default and named arguments are available on constructors and methods. 

Scala also has objects, which are singleton instances rather than a template declaration; they are also where main() methods are declared. Objects with the same name as a classes and placed the same file are called companion objects. It's common enough to see such object and class pairs in Scala. Companions can access each other's private methods and fields. Companion objects enable the use of the apply() method, which pretty much eliminate the factory and constructor chaining gymnastics seen in Java. An apply call is dispatched to via construction - roughly, new Song(name) goes to Song.apply(name). There's also unapply(), which enable extractor objects that work with pattern matching - 'case Song(f)' will dispatch to Song.unapply(f), which will return whether the argument is matched. Companion objects are a clean way to package up declaration and creation concerns.

Scala's case classes remove the need to declare fields, accessors, copy contructors, equals/hasCode pairs. Case classes are about as concise as Groovy POGOs, with the advantages of not maintaining type safety and providing private scope for fields  - they're a huge win compared to JavaBeans or POJOs.

case class Song(name: String, length:Double, artist:String)
val s1 = Song("Align", 3.56, "Mowgli") 
s1.name
res72: String = Align

val s2 = s1.copy(length=3.57)
s2.length
res73: Double = 3.57

s2.equals(s1)
res75: Boolean = false

val s3 = s1.copy()
s3: Song = Song(Align,3.56,Mowgli)
s3 == s1
res77: Boolean = true

s3.hashCode
res78: Int = -882114443

Case classes integrate well with pattern matching (more below) and can be treated as functions (note that s1 above didn't require a 'new').

Scala traits allow multiple inheritance of implementations using a well defined ordering. Traits also support default implementations unlike C++ virtual voids or Java interfaces, and which are similiar in style to static methods in Java (strictly speaking Scala doesn't have statics). In conjunction with the sealed modifier you can define a 'sealed trait' and case class subtypes as a neat way to declare abstract data types such as trees and grammars, and which can be dispatched over using pattern matching. While I would be happy to see inheritance removed from the language, Scala traits are a practical way to avoid the diamond problem

At least it's an ethos

Scala's Option allows you to wrap a value as being either Some or None. An Option[T] is a container for a value of type T. If the value is not null it's a Some, otherwise it's a None. This pushes null handling into the type system making it a compile time concern - 

val a = Some("a")
a: Option[String] = Some(a)
 
val nope = None 
nope: Option[String] = None

Using Option construction delegates None/Some selection, giving you None for a null or a Some for a non-null -

 val whoKnows = Option(null)
 whoKnows: Option[String] = None

 val mightCouldBe = Option("aliens looking fer gold")
 mightCouldBe: Option[String] = Some(aliens looking fer gold)

 println(whoKnows.getOrElse("Nope"))
 Nope

Options are one of the things I like most in Scala. For many of us, null is simply reality, and it can be hard to imagine life without it, but if you can appreciate why returning an empty list is better than returning a null, or why null shouldn't ever be mapped to the domain, I think you'll come to like Options. Options are handy for expressing the three valued logic (there, not there, empty) that is common when dealing with maps and caches. Code working with data stores may gain robustiness benefits using Some/None in conjunction with pattern matching or higher functions. Options can be treated as collections, so foreach is available against them and collection functions like filter deal with them. Subjectively I prefer Scala's Option to Groovy's elvis and Clojure's nil largely because you can drive Option into API signatures. Option is also a gateway concept for monads.

Scala pattern matching allows you dispatch against a structure or the type of objects instead of reasoning about an instance. Pattern matching in Scala  is powerful and shouldn't be compared to case/switch blocks in Java/C. You can match a populated list using a 'head :: tail' pattern; match against regexes; match against or an XPath-like option if you are working with XML, and so on. Pattern matching also helps with expressing business logic, such as doing something with an Account when it has a certain age or is from a certain location. Case classes are designed with pattern matching in mind (if you are familiar with and like Linda tuplespace style matching, chances are you'll like pattern matching over case classes).  You can get compiler warnings on non-exhaustive matches in conjunction with the sealed modifier, helping to address problems with switch block fall-through and visitor pattern code (incidently, Yaron Minksy has a great talk that touches on this subject in the context of OCaml). I'm not entirely sure, but as far as I can tell pattern matching eliminates the visitor pattern and so the need for replace conditional with visitor/polymorphism workarounds. That's nice because that helps decouple structure from processing, avoiding some of the gnarlier maintenance issues with orthodox OO.

If pattern matching is a bit low level for handling input, you can try parser combinators. You define an object (or a set of classes) that is the 'parser combinator' (Scala provides some helpers you can sub-class), and each method in the object becomes a parser that returns Success, Failure or Error. You then call parseAll with the root parser and the input. Here's the stupidest example I could think of -

import scala.util.parsing.combinator.RegexParsers

object EpicNameParser extends RegexParsers {
  val name: Parser[String] = """[a-zA-Z]*""".r
 
// bind object to a handy val
val enp = EpicNameParser
enp.parseAll(enp.name, "Donny")
res26: enp.ParseResult[String] = [1.6] parsed: Donny

enp.parseAll(enp.name, "Walter")
res27: enp.ParseResult[String] = [1.7] parsed: Walter

enp.parseAll(enp.name, "The Dude")
res28: enp.ParseResult[String] = 
[1.5] failure: string matching regex `\z' expected but `D' found
The Dude
    ^

This probably won't be the fastest way to do things performance-wise, but it's concise. Pattern matching opens up possibilties for dealing with conditionals and clean code; it's also the basis for actor receive loops.

Strange enlightenments are vouchsafed to those who seek the higher places

Scala has a rich set of collections. As well as Lists, Vectors, Sets, Maps and the generic Sequence (Seq), Scala has a Tuple that can take mixed types. The classes Pair and Triple are aliases for arity 2 and arity 3 tuples. Scala has both mutable and immutable collections that are collection analog to var and val fields.  The community idiom seems to be to prefer immutable collections until you need to drop into mutability or j.u.c for performance reasons. Arrays in Scala map to Java arrays, but can be generic, and can also be treated as a Sequence.

As you might expect from a functional language, working with lists, maps and collections is nice. Scala's for loops (for comprehensions) work like Python's list comprehensions but have a syntax that allows more expressiveness. I like forcomps, but it's possible you'll move towards map/flatmap/foreach/filter methods that are available on collections as idiomatic for simpler loops (for is implemented using these methods). The foreach method accepts closures (similar to each/do in Ruby), and also functions.  You can apply functions over members using map(), eg 'Seq(1, 2).map({ n=> "num:"+n })'. The flatMap method is similar but instead, accepts a function that returns a sequence for each element it finds, and then flattens all those sequences' members into a single list. This interacts nicely with Options, since an Option is a sequence -

def stoi(s: String): Option[Int] = {
  try {
    Some(Integer.parseInt(s.trim))
  } catch {
    case _  => None
  }
}

Once you have a function you can apply it to collection using map(), using  the syntax '{ element => function(n)}' -

Seq("1", "2", "foo").map({n => stoi(n)})
res34: Seq[Option[Int]] = List(Some(1), Some(2), None) 

As we can see, map() produces 'List(Some(1), Some(2), None)', ie, a list of lists. The flatMap() method on the other hand collapses the inner lists and removes the members that are None (since None's length is 0) leaving just the Some members (since Some's length is 1) -

Seq("1", "2", "foo").flatMap({n => stoi(n)})
res37: Seq[Int] = List(1, 2)

Collections also provide a filter() method that chains nicely with map() - easy composition of functions is a feature of Scala. For gathering up results you get fold and reduce (foldLeft and foldRight, reduceLeft, reduceRight). For example you can use foldLeft to trap the largest int in a list or sum a list -

List(5, 0, 9, 6).foldLeft(-1)({ (i, j) => i.max(j) })
res24: Int = 9
List(5, 0, 9, 6).foldLeft(0)({ (i, j) => i + j })
res48: Int = 20

Although reduceLeft might be a more direct way to go about both -

List(5, 0, 9, 6).reduceLeft({ (i, j) => i.max(j) })
res38: Int = 9
List(5, 0, 9, 6).reduceLeft({ (i, j) => i + j })
res49: Int = 20

Reduces are something of a special case of folds. Folds can gather up elements and may return a different type whereas reduces return a single value of the list type. For example, you can't say this - 

List(5, 0, 9, 6).reduceLeft({ (i, j) => "" +  (i + j) })

but you can say this - 

List(5, 0, 9, 6).foldLeft("")({ (i, j) => "" +  (i + j) })
longList.foldLeft("")({ (i, j) => "" +  (i + j) })
longList.foldRight("")({ (i, j) => "" +  (i + j) })
res51: java.lang.String = 5096

Methods like fold/reduce generally handle nulls without blowing up - 

val x:String = null;
List("a", x).reduceLeft({ (i, j) => "" +  (i + j) })
res60: String = anull

Monads are coming

Functional programming is unavoidable in Scala. Scala is a functional language in the sense that you can program with functions and not just class methods, and functions are treated like values and so can be assigned and passed around as arguments. This is where Scala stops being a better Java, in the sense Groovy is, and becomes its own language.  Functions in Scala can be declared using val - 

scala> val timestwo = (i:Int) => i * 2

and can take variables from scope -

scala> val ten = 10
ten: Int = 10
scala> val mul = (i:Int) => i * ten
scala> mul(3)
res2: Int = 30

You can also create functions that accept other functions and/or return functions (aka higher order functions) -

def theSquareMullet(i:Int) = i * i
def theHoffSquareMullet({ mullet: => Int }) = mullet * mullet

The expression 'mullet: => Int' represents any function (or value) that returns an Int. All of these are valid -

scala> theSquareMullet(10)
res7: Int = 100

scala> theSquareMullet(theHoffSquareMullet(10))
res8: Int = 100

scala> theHoffSquareMullet(theHoffSquareMullet(10))
res9: Int = 10000

Functions can be anonymous, you can see some examples in the section on collections above.  Functions can be nested and multiple values can be returned from a function/method -  

def theSquareAndTheMullet(i:Int) = (i * i, i)
val (f, f1) = theSquareAndTheMullet(2)
f: Int = 4
f1: Int = 2

val bolth = theSquareAndTheMullet(2)
bolth: (Int, Int) = (4,2)

scala> bolth._1
res9: Int = 4

bolth._2
res10: Int = 2

The slight downside being the syntax for unpacking a discrete val isn't that pleasing (and are default limited to size 22 if you're going to return the world). On the upside accessing a tuple out of bounds is a compile time error and a tuple preserve each member's type.

Endofunctor's Game

The type system is powerful. Like functional progrmming support, the type system marks Scala its own language instead of a better Java. Scala has generics, like Java, and supports type variance, +T (covariant), -T (contravariant), :> (sub type, or Java 'extends'), <: (super type, or Java 'super'). Classes can be parameterised like Java, but unlike Java so can types via what's called a type alias. These allow you genericise members independently of the class signature -

abstract class Holder {
  type T // alias
  val held: T


def IntHolder(i: Int): Holder = {
    new Holder {
         type T = Int
         val held = i
       }
}

One nice feature is the compound type, which allows you to declare for example, that a method argument is composed of multiple types - 

def allTheThings(thing:ThatThing with ThatOtherThing):ThatThing = {...}

I can't do justice to Scala's type system in a single post. I doubt I know enough to appreciate what it can do - diving into it is very much a red pill thing. What I will say is the type system is Scala is deeply impressive in what can be achieved. The type system allows things that are either impractical or impossible in Java  and gives good ways to scrap your boilerplate.  

What's not to like

Onto some things that bother me about Scala.

Here are eighteen ways to say, roughly, add one to each member of a list creating a new list, remove any members in the new list that aren't greater than one creating another new list, and show that final list as a string -

 List(0, 1) map { _ + 1 } filter { _ > 1 } toString
 List(0, 1) map { _ + 1 } filter { _ > 1 } toString()   
 List(0, 1) map ( _ + 1) filter (_ > 1) toString()
 List(0, 1) map ( _ + 1) filter (_ > 1) toString 
 List(0, 1) map ({ _ + 1 }) filter ({ _ > 1 }) toString 
 List(0, 1) map ({ _ + 1 }) filter ({ _ > 1 }) toString()  
 List(0, 1) map ({ n => n + 1 }) filter ({n => n > 1 }) toString() 
 List(0, 1) map ({ n => n + 1 }) filter ({n => n > 1 }) toString 
 List(0, 1) map (n => n + 1) filter (n => n > 1) toString()
 List(0, 1) map (n => n + 1) filter (n => n > 1) toString 
 List(0, 1).map( _ + 1 ).filter( _ > 1 ).toString() 
 List(0, 1).map( _ + 1 ).filter( _ > 1 ).toString 
 List(0, 1).map({ _ + 1 }).filter({ _ > 1 }).toString()   
 List(0, 1).map({ _ + 1 }).filter({ _ > 1 }).toString 
 List(0, 1).map({ n => n + 1 }).filter({n => n > 1 }).toString()
 List(0, 1).map({ n => n + 1 }).filter({n => n > 1 }).toString  
 List(0, 1).map(n => n + 1).filter(n => n > 1).toString()
 List(0, 1).map(n => n + 1).filter(n => n > 1).toString  

Scala provides syntactic flexibility. This flexibility is not limited to methods. Two ways to express for-comprehensions; three collections libraries (four if you count Java's); methods foo and foo() are not the same; XML assignment demands whitespace around '='; careful placement of suffix invocations such that they are not treated as infix.  Closure captures can be tricky. In Scala '_' means everything and anything - there's no real way to avoid it in practice (imports, existential types, exceptions and closures).  In particular, implicit conversions strike me as tricky to reason about. Scala generics have other bounds, 'view' and 'context' that allow you work with types beyond traditional hierarchies, where implicit conversions are made available. 

Symbolic methods on top of the features above provide the base capability for DSLs. DSL support in a language is not an obvious win to my mind.  Even a relatively restrained library like Gatling offers enough variation to catch you out and a symbol heavy project like SBT can be hard work. Scala's not unique in this - I have a similar beef with Ruby and Groovy.

Maintainable, comprehensible Scala requires restraint, in the form of disciplined practice and style guides. That you don't have to use all the features isn't sufficient to wave the problem away. If you are maintaining code, you don't get to ignore all the variation teams have added over time (I do not see the concept of Scala levels helping much). Dependency on open source amplifies the matter. Scala has a simple core with orthogonal concepts (the language spec is pretty readable as these things go). However that says nothing about the interaction of features, which is where complexity and comprehension resides. I'm not trying to conflate understanding of functional and type-centric constructs with complexity - simple things can be hard, but that doesn't make them complex.  

As a systems grunt, I spend a lot of time with running code, sifting through stacktraces, heapdumps, bytecode, metrics, logs and other runtime detritus. You learn from bitter experience that correctness capabilities provided by a type system do not always trump having comprehensible runtime behaviour (the same can be said for the outcomes enabled by dynamic languages). Scala generates plenty of classes; as a result Scala stacktraces can be a bit of a bear, although better than Groovy or Clojure's. An artifact of Scala's integration with Java is that you can't catch checked exceptions, you catch all and extract the type with a case statement. Compilation is slow and becomes dead slow with the optimise flag. The expressivity of the language results in an abstraction gap when it comes to reasoning about runtime cost. Option, a great feature, incurs the cost of wrapper classes. Scala introduces a fair bit of boxing overhead. For comprehensions have overhead such that you may need to swap with a while/do loop. Closures naturally have overhead (at least until the JVM can optimise them). To ensure unboxed class fields, you need to qualify them with [this]. Immutable collections may need to be replaced with mutable or j.u.c, to deal with memory overhead, or performance (sometimes there are wins, like Scala's HashSet). The foldRight method will happily blow up when foldLeft wouldn't. Some serious runtime bugs get kicked down the road.  That said, you can build high performance systems in Scala. 

My least favourite aspect of Scala is probably backwards compatability. Scala has broken compatability three, maybe four times, in five years. This does not seem to be considered a problem by the community. I understand how this frees things up for the future, but the effort needed to bring all the dependencies forward yourself or wait until the maintainers rebuild is an area where the language does not strike me as remotely scalable. Cross-compilation support is available in in SBT, but is something of a workaround.

Conclusion

If a grand unified theory of programming language existed, its implementation would be called Scala. But enough of the modh coinníollach. Scala is neither a better Java/C# nor a worse Haskell/OCaml; it is its own language, and the most advanced on the JVM today. How much you are willing to invest in and work with functional and type-centric constructs will have a bearing on how much value you can get out of the language. And I find I need to weigh up three externalities - runtime comprehension, maintaining a clear reader-comes-first codebase, and dealing with language/binary incompatibility.

I feel it's inevitable that functional programming plus Hindley/Milner/Damas flavor static typing will become a dominant paradigm. If that's right, what are today considered exotic constructs like closures, function passing, higher order functions, pattern matching and sophisticated type systems are going to be mainstream. It also means the more mathematical abstractions, such as monads and functors will be common understanding, certainly by the 2020s (they are closer than you think!). As such spending time with Scala (and Rust/Haskell/OCaml) is a worthwhile investment. There will still be places for dynamically typed and close to the metal languages, but orthodox OO as practised today, is a dead end. Scala is this paradigm's breakout language and the one mostly likely to drag us half-way to Haskell.

Complexity is a criticism that is frequently levelled at Scala, one which gets pushback from the community. It's a sore spot, as there is no shortage of FUD. My take is that Scala's complexity is typically justified for the broad set of applications and paradigms it can encompass.  

Reading 

Scala has a relatively small number of books. Unfortunately since the language is moving quickly a number of them are out of date, as good as they may be. 

  • Scala for the Impatient, Cay Horstmann. This is probably the closest thing to an up to date text; as of mid-2013 it's the go to introductory book on the language.  Written by a professional writer who's covered many languages in previously, and it shows.
  • Scala in Depth, Josh Sureth. Targetted at people familiar with Scala but looking for idiom. It covers Scala 2.9 and by definition is out of date, however a lot of available open source is still targetting 2.9, so it remains relevant at least up to 2014. 
  • Programming In Scala 2nd Edition, Martin Odersky, Lex Spoon, Bill Venners. This has been the standard reference for the last number of years and a very good book. It's out of date by virtue of covering Scala 2.8, but worth picking up to understand the language's core, especially if you see it discounted.  
  • Programming Scala, Alex Payne, Dean Wampler. Part textbook, part evangelism, I credit this book with getting me interested in Scala again. Sadly, it's too out of date to recommend by way of targetting 2.7 and giving mention of 2.8 features, which hadn't been released at the time.
  • Akka Concurrency, Derek Wyatt. Not strictly a Scala book, but a fun read on programming with Akka, actors, and for concurrency in general with Scala. It targets Akka/Scala 2.10.
  • Functional Programming in Scala, Paul Chiusano, Rúnar Bjarnason. Not out untill Sept 2013. Teaches functional programming using Scala, as opposed to being a book about Scala. I have high hopes for this one, there's a gap in the market for a pragmatic book on functional programming.

Scala's flexibility is such that idiomatic Scala isn't always obvious. I've found these two guides helpful -

 


 

[1] Java, Groovy, Jython, JRuby, Clojure. 

On Go

This is one of a series of posts on languages, you can read more about that here.

Herein a grab bag of observations on the Go programming language, positive and negative. First, a large caveat - I have no production experience with Go. That said I'm impressed. It's in extremely good shape for a version 1. To my mind it bucks some orthodoxy on what a good effective language is supposed to look like; for that alone, it's interesting. It makes sensible engineering decisions and is squarely in the category of languages I consider viable for server-side systems that you have to live with over time.  

You can find more detail and proper introductory material at http://golang.org/doc. And this is my first observation - the language has great documentation.

What's to like?  

I’m gonna go get hammered with Papyrus

Syntax matters. In all languages I've used and with almost all teams, there's been a need to agree syntax and formatting norms. And so, somebody, invariably, prudently, fruitfully, gets tasked with writing up the, to be specific, company style guide. Go eschews this by supplying gofmt, which defines mechanically how Go code should be formatted. I've found little to no value in formatting variability with other languages - even Python which constrains layout more than most doesn't goes far enough to lock down formatting. It's tiring moving from codebase to codebase and adjusting to each project's idiom - and open source now means much of the code you're working with isn't written to your guidelines. Another benefit of gofmt is that it makes syntax transformations easier - for example the gofix tool is predicated on gofmt. This isn't as powerful as a type driven refactoring but is nonetheless useful. So while I gather it's somewhat controversial, I like the decision to put constraints on formatting. 

yak

As is the way with recent idiom in languages, semi-colons are gone, no suprise there. Interestingly, there's no while or do, just for, something I like so far. If statements don't have parens, which took me a while to get used to, but have come to make visual sense. If statements on the other hand, don't have optional bracing, which I really like. A piece of me dies every time someone removes the carefully crafted if braces I put around one-liners in Java. Go doesn't have assertions - having seen too much sloppy code around assertions, specifically handling the failure itself I think this is a good decision. The import syntax using strings (like Ruby's require statement), something I'm not crazy about; but what's interesting is that these are essentially paths, not logical packages.

Declaration order reverses the usual C/C++/Java way of saying things.  This is similar to Scala and is something I like because it's easier when speaking the code out - saying 'a of type Int' is less clumsy than 'the Int named a' - the latter sounds like a recording artist wriggling out of a contract. Go has simple type derivation when using the ':=' declaration and assignment operator, albeit somewhat less powerful than again say, Scala inference.

Syntactically the language reminds me of JavaScript and Groovy, but feels somewhat different to either. The end result is a comprehensible language. Go has minimal variation and no direct support in the language for DSL-like expression. The hardest part I've found to be reasoning with pointers, but even those get manageable. What you see is more or less what you get - if you value readability over self-expression (and not everyone does, but I do), this is a big win.

You had just one job

There are no exceptions in Go and no control structures at all for error handling. I find this a good decision. I've believed for a long time that checked exceptions are flawed idea. Thankfully this isn't controversial anymore and all's right with the world. Runtime exceptions are not much better. Maybe try/finally is ok, but as we'll see, Go has a more concise way to express code that should execute regardless. 

So you handle errors yourself. Go has an inbuilt interface called error - 

type error interface {
    Error() string
}

and allows a function to return it along with the intended return type, which has a surface similarity with multiple return values in Groovy/Python. Now, I can see this kind of in place error handling driving people insane -

if f, err := os.Open(filename); err != nil {
 return err
}

but I prefer it to try/catch/finally, believing that a) failures and errors are commonplace, b) what to do about them is often contextual such that distributing occurence and action isn't the right default. Pending some improved alternate approach or a real re-think on exceptions, it's better not to have the feature. 

Because there are no exceptions there's no finally construct - consequently you're subject to bugs around resource handling and cleanup. Instead there is the 'defer' keyword, that ensures an expression is called after the function exits, providing an option to release resources with some clear evaluation rules. 

Go has both functions and methods and they can be assigned to variables to provide closures. Functions are just like top level def methods in Python or Scala and can used directly without an enclosing class structure via an import. Methods are defined in terms of a method receiver. A receiver is instance of a named type or a struct (among others). For example the sum function below can be used against integers whenever it is imported -

package main
import "fmt"
func sum(i int, j int) (int, error) {
  return i + j, nil
}
 
func main() {
    s, err := sum(5,10)
    if err != nil {
     fmt.Println("fail: " + err.Error())
    } 
    fmt.Println(s)
}

Alternatively, to make a method called 'plus' the receiving type, is stated directly after 'func' keyword -  

package main
import "fmt"
type EmojiInt int
func (i EmojiInt) plus(j EmojiInt) (EmojiInt, error) {
  return i + j, nil
}
 
func main() {
    var k EmojiInt = 5
    s, err := k.plus(10)
    if err != nil {
     fmt.Println("fail: " + err.Error())
    } 
    fmt.Println(s)
}

This composition drive approach leads us into one the most interesting areas of the language, that's worth its own paragraph.

Go does not have inheritance or classes. 

If it helps at all, here's a surprised cat picture -

While OO isn't clearly defined, languages like Java and C# are to a large extent predicated on classes and inheriteace. Go simply disposes of them. I'd need more time working with the language to be sure, but right now, this looks like a great decision. Controversial, but great. You can still define an interface -

type ActorWithoutACause interface {
    receive(msg Message) error
}

and via a structural typing construct, anything that implements the signatures is considered to implement the interface. The primary value of this isn't so much pleasing the ducktyping crowd (Python/Ruby developers should be reasonably ok with this) and support of composition, but avoiding the premature object hierarchies typical in C++ and Java. In my experience changing an object hierarchy is heavyweight, and it requires effort to avoid creating one early. These days I'm reluctant to even define Abstract/Base (implementation inheritance) types - I'll use a DI library to pass in a something that provides the behaviour. I'd go as far as saying I'd prefer duplicated code in early phase development to establishing a hierarchy (like I said it requires effort). Go lets me dodge this problem by providing functions that can be imported but no way to build up a class hierarachy.

This ain't chemistry, this is art.

 

Dependency seems to be a strong focus of the language. Depedendecies are compiled in, no dynamic linking. Go does not allow cyclic depedencies. Consequently build times in Go are fast. You can't compile against an unused dependencie - eg this won't compile -

package main
import "fmt"
import "time"
 
func main() {
    fmt.Println("imma let u finish.")
}
prog.go:3: imported and not used: "time 

which may seems pedantic but scales up well when reading existing code - all imports have purpose.  I mentioned that Go builds are fast. Actually they're lightning fast, fast enough to be used for short scripts. You can use the dependency model to fetch remote repositories, eg via git. I have more to say on that when it comes to externalities.

Package visibility is performed via capitalization of the function name. Thus Foo is public, foo is private.  I'll take it over private/public/protected boilerplate. I  would have gone for Python's _foo idiom myself, but that's ok, it's obvious what's what when reading code. 

Go doesn't have an implicit 'self/this' concept, which is great for avoiding scoping headaches a la Python, as well as silly interview questions. When names are imported, they are prefixed - 

package main
import "fmt"
import "time"
 
func main() {
    time.Sleep(30)
    fmt.Println("imma let u finish.")
}

such that imported names are qualified, all unbound names are in the package scope. Note how I still have to qualify Sleep and Println with their time and fmt packages. I love this - it's one of my favorite hygenic properties in the language. If you dislike static imports in Java as much as I do and the consequent clicking through in the IDE to see where the hell a name came from, you may also like what Go does here.

Go allows pointers. For the variable 'v', '&v' holds the address of v rather than its value.

 

package main
import "fmt"
func main() {
 v := 1; 
 vptr := &v
 fmt.Println(v)
 fmt.Println(vptr)
 fmt.Println(*vptr)
 fmt.Println(*&v)
}
1
0xc010000000
1
1

 

 

Overall this enables by reference and by value approaches; for some usecases it's useful to avoid a data copy (technically, passing a pointer creates a copy of the pointer, so ultimately its all pass by value). Thankfully there's no pointer math, so JVM types like myself don't need to freak out on seeing the '*' and '&' symbols (and in case you're wondering arrays are bounds checked). Go has two construction keywords for types - new and make. The new keyword allocates memory but doesn't initialise and returns a pointer. The make keyword performs allocation and initialization and returns the type directly.

There aren't any numeric coercions, so you can't do dodgy math over different scalar types.  This isn't really a feature because languages that allow easy coercions like this are arguably broken. Still, given Go's design roots in C and C++, I'm happy to see that particular brokeness wasn't brought forward. It still won't stop anyone using int32 to describe money however.  

You have just saved yourself from a fate worse than the frying pan

The Go concurrency model prefers CSP to shared memory. There are synchronization controls available in the sync and atomic packages (eg CAS, Mutex) but they don't seem to be the focus of the language. In terms of mechanism, Go uses channels and goroutines, rather than Erlang-style actors. With actor models the process receiver gets a name (in Erlang for example, via a pid/bif), whereas with channels the channel is instead the thing named. Channels are are sort of typed queue, buffered or unbuffered, assigned to a variable. You can create a channel and use goroutines to produce/consume over it. To get a sense of how that looks, here's a noddy example - 

package main
import "fmt"
import "time"
 
func emit(c chan int) {
    for i := 0; i<100; i++ {
        c <- i
        fmt.Println("emit: ", i)
    }
}
 
func rcv(c chan int) {
   for {
        s := <-c 
        fmt.Println("rcvd: ", s)
    }
}
 
func main() {
    c := make(chan int, 100)
    go emit(c)
    go rcv(c)
    time.Sleep(30)
    fmt.Println("imma out")
}

Channels are pretty cool. They provide a foundation for building things like server dispatch code and event loops. I could even imagine building a UI with them.  

As you can see above it's easy enough to use goroutines - call a function with 'go' and you're done. Once they're created goroutines communicate via channels, which is how the CSP style is achieved. That said you can use shared state such as mutexes but the flavour of the language is to work via channels. Goroutines are 'lightweight' in the Erlang sense of lightweight rather than Java's native threads. Being 'kinda green' they are multiplexed over OS threads. Go doesn't parallelize over multiple cores by default as far as I can tell; like Erlang it has to be configured to do so.  It is possible to allot more native threads to leverage the cores via the runtime.gomaxprocs global, which says 'this call will go away when the scheduler improves'; it will interesting to see what happens in future releases. Go's default is closer to Node.js with the caveat it can be made multicore whereas Node can only run single threaded. Otherwise, the approach to scaling out seems to be to use rpc to dispatch across multiple go processes for now.  As best as I can tell a blocking goroutine won't block others as the other goroutines can be shunted onto another native thread, and it seems that blocking syscalls are performed by spawning an additional thread so the same number of threads are left to run goroutines, but my knowledge of the implementation is insubstantial, so I might have the details wrong. 

Externalities

Onto some things that bother me about Go. 

Channels are not going to be as powerful as Actors used in conjunction with pattern matching and/or a stronger type system. Mixing channel access with switch blocks seems possible if you want to emulate an actor style receive model, but it'll be lacking in comparison Erlang and Scala/Akka that underpin their various actor models. That said, channels seem more than competitive in terms of the all import concurrency/sanity tradeoff when compared to thread based synchronization. I can't imagine wanting to drop into threads after using channels.

The type system in Go is antiquated. If you're bought into modern, statically typed languages such as Haskell, Scala OCaml and Rust and value what they give you in terms of program correctness, expresiveness, and boilerplate elimination, Go is going to seem like a step backwards. It is probably not for you. I'm sympathetic to this viewpoint, especially where efforts are made to match static typing with coherent runtime behaviour and diagnostics, not just correctness qualities. On the other hand if you live in the world of oncall, distributed partial failures, traces, gc pauses, machine detail, and large codebases that come with their very own Game of Code social dynamics, modern static typing and its correctness assurances don't help much. Perhaps the worst aspect with Go is that you are still subject to null pointers via nil; trapping errors helps but not as much as a system that did its best to design null out, such as Rust

Non-declaration of interface implementation feels in a kind of chewy yet insubstantial way, like a feature that isn't going to scale up. I imagine this will be worked around with IDE tooling providing implements up/down arrows or hierachy browsers. Easier composability via structural typing is possibly a counter-argument to this if the types in the system interact with less complexity than sub-type heavy codebases, something that's achievable to a point with Python/Scala but impractical in Java. So I'm ready to be wrong about this one. 

Go concurrency isn't safe, for example, you can deadlock. Go is garbage collected, using a mark/sweep collector. While it's probably impossible for most of us to program concurrent software and hand manage memory, my experience with Java is a lot of time dealing with GC, especially trying to manage latency at higher percentiles and overflowing the young generation. Go structs might allow better memory allocations, but I don't have the flight time to say GC hell will or won't exist in Go. It would be very interesting to see how Go holds up under workloads witnessed by datastores such as Hadoop/HBase, Cassandra, Riak and Redis, or modern middlewares like Zookeeper, Storm, Kafka and Rabbitmq.

The 'go get' importing mechanism is broken, at least until you can specify a revision number. I'd hazard a guess and say this comes from Google's large open codebase, but I've no idea what the thinking is. Having worked in a codebase like that I can see how it makes sense along with a stable master policy. But I can also see stability will suffer from an effect similar to SLA inversion, in that the probability of instability is the product of your externally sourced dependencies being unstable. It's important to think hard about your dependencies, but in practice if you have make an emergency patch and you can't because you can't build because of upstream changes you are SOL. A blameless post-mortem that identified inability to build leading to a sustained outage is going to result in a look of disapproval, at best. I don't see how to protect from this except by copying all dependencies into the local tree and sticking with path based imports, or using a library based workaround. The former complicates bug propagation in the other direction resulting in rot. Put another way using sneaky pincer reasoning - if Go fundamentally believed this was sane, the language and the standard libraries wouldn't be versioned. Thankfully it's a mechanism rather than a core language design element and should be something that can get fixed in the future. 

Conclusions

Go seems to hit a sweetspot between C/C++, Python JavaScript, and Java, possibly reflecting its Google heritage, where those languages are I gather, sanctioned. It seems to be trying to be a more effective language rather then a better language, especially for in-production use. 

Should you learn it and use it? Yes, with two caveats. How much you like static type systems, and how much you value surrounding tooling.

If you really value modern, powerful type systems as seen in Haskell, Scala and Rust, I worry you'll find Go pointless. It offers practically nothing in that area and is arguably backwards looking. Yes there is structural typing, and (thankfully) closures, but no sum types, no generics/existentials, no monads, no higher kinds, etc - I don't think anyone's going to be doing lenses and HLists in Go while staying sane. 

An issue is whether significant investment into the Go runtime and diagnostic tooling will happen outside Google. Tools like MAT, Yourkit, Valgrind, gcviz etc are indescribably useful when it comes to running server workloads. The ecosystem on the JVM for example, in the form of the runtimes, libraries, diagnostic tools, and frameworks, is like gravity - if anything was going to kill Java, it was the rise of Rails/Python/PHP in the last decade - that didn't happen. I know plenty of shops are staying on the JVM, or have even moved back to Java, mostly because of its engineering ecosystem. This regardless of the fact the language has ossified. JVMs have been worked on for nearly two decades, by comparison Go's garbage collector is immature, and so on.  

A final thought. Much code today is written in a language that can be considered to have a similar surface to Go - C, C++, C#, Python JavaScript, and Java. If you buy into the hypothesis that programming language adoption at the industry level is slow and highly incremental, then Go's design center is easy to justify and it very broad adoption possible, given either a killer application (a la Ruby on Rails for Ruby and browsers for JavaScript) or a set of strong corporate backers who bet on the language (a la Java and C#). Aside from generics and possibly annotations, there's a reasonable argument to be made that Go is sufficiently more advanced than Java and JavaScript without being conceptually alien, and good enough compared to C#, Python and C++. Plenty of shops don't make decisions based on market adoption, but for larger engineering groups it's inevitably a concern.  


 

2013/06/02: updated the Node/Erlang concurrency and clarified pointer observations with feedback from @jessemcnelis. Somehow in this wall of text, I forget to mention readability, thankfully was reminded by @davecheney

On Languages

 

 

In my first degree I had to learn a language called Pascal, with a big old blue Walter Savitch book  and a lecturer who said you can make a computer do anything. He was wrong, or simply didn't understand how little he, or I, knew about computers at the time. This was an industrial design degree and programming was part of the technology/science block given in first year.

Fast forward 8 years and I'm doing an AI degree, with the data structures/101 course being taught in Modula-2, a fix-upped Pascal and I'm thinking not much has changed. After that we had to learn C and Prolog in different courses at the same time. The course lecturers I gathered did this deliberately to keep us flexible. This clearly could work - C and Prolog might as well be the Zerg and Protoss of languages. On reflection I think this is one of the better things that happened to me, education-wise.  Since then I have tried to learn the basics of new languages in pairs or more, and if possible digest languages that are reasonably different in approach. 

I have been playing around with Erlang, Scala and Go more and more, to the point where I'm able to draw conclusions about them as production languages.  I don't consider myself expert in any of them, but like most people in my line of work I have to be able to make judgments in advance of deep understanding. I've also spent time to a lesser extent with Rust, not enough to decide anything much. I've been foostering with Erlang and Scala on and off for years, Go and Rust are more recent.

I wanted to write down what I've found to be good about the languages so far. For context, the production languages I work with most these days are Java, Python and Groovy, indirectly Clojure and Scala (the latter due to Storm and Kafka, platforms I operate with my current employer). 

I also want to talk about what's 'wrong' with these languages, and by what's wrong I mean two things. First, absence of features; an example might be the limited static typing features in Go. Second, what I'm going to call 'negative externalities' - aspects that distract from or even outweigh the direct benefits of the languages. Externality concerns are usually more important to me than missing features because of the way they impact engineering quality and cost. In Java an example might be GC pauses or boilerplate. In Groovy or Clojure it might be comprehending stacktraces or runtime type failures that other languages would catch earlier. In Scala it might be compatability or readability. The longer I work in this industry the more I find myself concerned with ergonomic/lifecycle factors and I believe understanding the more indirect aspects of a language helps make better decisions when it comes to selection.

I'll add links here as I write each one up -

 

Switching from TextDrive to Rackspace and Linode

 Some notes and observations on moving hosting providers.

Why?

Well, Joyent said August last year that people on lifetime deals from TextDrive hosting, of whom I'm one,  would come to be no longer be supported. Then they changed their minds somewhat, and their deadlines. That had me looking around. Not because Joyent don't want to support years old BSD servers rusting in some cage somewhere, I have a realistic definition of lifetime when it comes to commercial offerings. Over six years I've been using TextDrive it has been a great deal, all things considered, so no complaints there.  

Then TextDrive was restarted and said they would honor the accounts. That's beyond admirable, but communication has been less than ideal - simply figuring out what to do to get migrated and what I'd be migrating to has been too difficult (for me).  The new TextDrive situation should work itself out in time - people are working hard and trying to do right thing. But I wanted something with more certainty and clarity. 

Aside from incidentals like old mercurial/subversion repositories and files, the two main services to move were this weblog and email. 

Email

Email was hosted on a server called chilco, and had been creaking for some time. IMAP access had slowed down noticeably in the last 12 months with increasing instances of not being to connect to the server for a period of time. That the server certs haven't been updated is a constant irritation. I decided I'm past the point of running my own mail server and it was time to pay for a service, although Citadel was tempting. After whittling it down to Google and Rackspace, I went with Rackspace Email, for four reasons -  pricing, solid IMAP capability, just email and not an office bundle, and their support. It probably helped that I don't use GMail much.

Setting up mail with Rackspace was quick. DNS Made Easy lost me when I couldn't see my settings after my account expired, so I moved DNS management over to Rackspace as well and routed the MX records to their domain.  Rackspace have an option to auto-migrate email from existing servers. This didn't work, I strongly suspect the certificates being out of date on the old server were the cause. The answer was to copy over mail from one account to another via a mail client. That wasn't much fun, but it worked. 

Rackspace webmail and settings are proving very usable. IMAP access is fast. The sync option via an Exchange server is good. There's plenty of quota. Rackspace support has been excellent so far, they clearly do this for a living. I'm much happier with this setup.

Cost: $2 per email account per month, $1 per sync account per month.

Linode

I wanted to avoid a shared host/jailed setup. There's a lot to be said for running on bare metal, but  virtualization for hosting at personal scale makes more sense to me.  I considered AWS and Rackspace but the pricing didn't suit, mainly due to the same elastic pay as go model that works for commercial setups. Linode offered a predictable model for personal use. Other VPS/Cloud providers were either more expensive and/or lacked surrounding documentation. Plus I got a nod from someone I trust about Linode which settled it. 

In hindsight I should have done this years ago. Getting an Ubuntu 12.04 LTS set up was easy. Payment was easy as was understanding the utilization model. Using the online admin is easy - the green and gray look really ties the site together.  Linode instances have been responsive from the shell, more so than the previous BSD hosted option I was using. Having full access to the server instance is great as it avoids workarounds that come with shared hosting (I had an 'etc' folder in my home directory for example).

Linode has good documentation. I mean really good. All the instructions to get started just worked. 

Cost: $20.99 per 512M instance per month.

Weblog

Moving to Linode meant moving this weblog, a custom engine I wrote in 2006/7 based on Django and MySQL,. The last major work here was a pre 1.0.3 Django update in late 2008 that had trivial patches to get TinyMCE to work in the admin. It did occur to me to move to a non-custom engine like Wordpress, but that would involve a data migration. And I figured having originally wrote it to learn something, fixing it up four years on might teach also teach me something. 

I upgraded Django to 1.4.3, breaking one of my cardinal rule of migrations - migrate don't modify. In my defense I had started the upgrade process on the old server, but due to the uncertainty with Joyent/TextDrive and my discovering there was a December 31st 'deadline' for shutting off the old TextDrive servers, I decided to get migration done quick (that the deadline was with no explanation pushed back to January 31st after I raised it on the TxD forum, is an example of lack of clarity I mentioned). 

It took a while but was straightforward. Django 1.4 has a more modular layout for its apps - the main app can live side by side with other apps instead of being nested. You can see the difference here between the two layouts - 

This meant changes to import paths that were worthwhile, as the new way of laying out projects is much better. Some settings data structures have changed, notably for the database and for logging (which now actually has settings). The Feed API left the impression it was the one that was most changed from earlier versions, and while it's significantly improved, I couldn't see a simple way to emit blog content. I like full text feeds as it avoids people having to click through to read a post. I had to use the Feed API's internal framework for extending data -

class AtomFeedGenerator(Atom1Feed):
    def add_item_elements(self, handler, item):
        super(AtomFeedGenerator, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content', item['content'], {u'type':u'html'})

class LatestEntries(Feed):
    def item_extra_kwargs(self, item):
        return {'content': item.content}

 

Comments are still disabled until I figure out how to deal with spam which is a severe problem here, although the comments framework looks much improved. That's probably ok because the comments form UX on the blog sucked. Also, comments seem to be going away in general in favour of links get routed to social networks and the discussion happens there. I might experiment with Discqus or routing discussions to Google Plus. 

Now that django-tinymce exists, I was able to remove my own integration hack with the admin app. I used Django's cache api, which is much improved from what I remember, in conjunction with memcached. 

The old blog deployment used fast CGI with lighty. This was never a very stable deployment for me and I had wanted to get off it for some time. Years ago, mod_python was the production default for Django, mod_python in the past has made me want to stab my eyes out with a fork, hence fcgi. I switched over to mod_wsgi, which is the 1.4.3 default option for deployment, and we'll see how that goes. Certainly it requires less init boilerplate

The only Linode documentation that didn't work first time was getting Django and mod_wsgi setup, but it was from 2010 and easy to fix up. I ended up with this

WSGIPythonPath /home/dehora/public/dehora.net/application/journal/journal2
<VirtualHost dehora.net:80>
	ServerName dehora.net
	ServerAlias www.dehora.net
	ServerAdmin bill@dehora.net
	DirectoryIndex index.html
	DocumentRoot /home/dehora/public/dehora.net/public
	WSGIScriptAlias / /home/dehora/public/dehora.net/application/journal/journal2/journal2/wsgi.py
	<Directory /home/dehora/public/dehora.net/application/journal/journal2/journal2>
	  <Files wsgi.py>
	  Order allow,deny
	  Allow from all
	  </Files>
	</Directory>
        Alias /grnacres.mid 	/home/dehora/public/dehora.net/public/grnacres.mid
	Alias /pony.html 		/home/dehora/public/dehora.net/public/pony.html
	Alias /robots.txt 		/home/dehora/public/dehora.net/public/robots.txt
	Alias /favicon.ico 		/home/dehora/public/dehora.net/public/favicon.ico
	Alias /images 		/home/dehora/public/dehora.net/public/images
	#
	# todo: https://docs.djangoproject.com/en/1.4/howto/deployment/wsgi/modwsgi/#serving-the-admin-files 
	# for the recommended way to serve the admin files  
	Alias /static/admin 		/usr/local/lib/python2.7/dist-packages/Django-1.4.3-py2.7.egg/django/contrib/admin/static/admin
	Alias /static 		/home/dehora/public/dehora.net/public/static
	Alias /doc 			/home/dehora/public/dehora.net/public/doc
	Alias /journal1 		/home/dehora/public/dehora.net/public/journal
    <Directory /home/dehora/public/dehora.net/public/static>
    Order deny,allow
    Allow from all
    </Directory>
    <Directory /home/dehora/public/dehora.net/public/media>
    Order deny,allow
    Allow from all
    </Directory>
    <Directory /home/dehora/public/dehora.net/public/doc>
    Order deny,allow
    Allow from all
    </Directory>
	LogLevel info
	ErrorLog /home/dehora/public/dehora.net/log/error.log
	CustomLog /home/dehora/public/dehora.net/log/access.log combined
</VirtualHost>

I was very happy a data migration wasn't needed. Not a single thing broke with respect to the database in this upgrade.I'm with Linus when it comes to compatability/regressions and in fact think breaking data is even worse. If you do it, provide migrations, or go home. Too many people seem to think that stored data is something other or secondary to the code, which is wrong-headed - data over time is often more important than code. 

It's been four years give or take since I used Django, it was fun to use it again. Django has grown in that time - apis like feed, auth and comments look much better, modularization is improved - while retaining almost everything I like about the framework. Django has a large ecosystem with scores of extensions. Django was and still is the gold standard by which I measure other web site frameworks. Grails and Play are the closest contenders in the JVM ecosystem, with Dropwizard being what I recommend for REST based network services there. One downside to Django (and by extension, Python) worth mentioning is that the stack traces are less than helpful - more than once I had to crack open the shell and import some modules to find my mistakes.

Observations

The main lessons learned in the move were as much reminders as lessons. 

First if it ain't broke don't fix it, is a bad idea that doesn't work for web services. You should as much as possible be tipping around with code and keeping your hand in so that you don't forget it. Most of the time I spent migrating this weblog was due to unfamilarity with the code and Django/Python rustiness (but see the fifth point as well). Never mind the implications of running on ancient versions, I simply wish I had checked in on this project a couple of times a year.  Not doing so was stupid.

Second, when a framework comes to support a feature you added by hand, get rid of your code/workaround - if it was any good you'd have contributed it back anyway. 

Third, divest of undifferentiated heavy lifting; in my case this was email. If you don't enjoy doing it, learn nothing from it, gain no advantage by it, let it go.

Fourth, excess abstraction can be a future tax. In my case the weblog code was intended to support multiple weblogs off a single runtime. Turns out after five years I don't need that, and it resulted in excess complexity in, the database model, view injections, url routing, and path layouts for files/templates. All these got in my way - for example dealing with complex url routing cost me hours with zero benefit. I'll have to rip all this nonsense out to ungum the code. I mention this also because I've been critical recently of handwaving around excess abstraction - claims of YAGNI and other simplicity dogma are ignorable without examples.

Fifth, and this is somewhat related to the first point. Online service infrastructure has come a long long way in the last number of years. I know this because it's part of my career to know this, many people reading this will know this, but when you experience moving a more or less seven year old setup to a modern infrastructure, it's truly an eye opener. My 2006 hosting setup was so much closer to 1999 than 2013.

Finally - hobby blog projects are for life.  I'm looking forward to expanding Linode usage and experimenting with the code more. When you have complete control of instances you also get more leverage in your infrastructure choices - for example I now have the option to use Cassandra and/or Redis to back the site. I don't see myself going back to a shared hosting model.

 

Level Set

Joe Stump: "It's a brave new world where N+1 scaling with 7,000 writes per second per node is considered suboptimal performance"

Spotted on the cassandra users list. This was on EC2 large instances, which should put the wind up anyone who thinks you need specialised machinery for anything but extreme performance/scale  tradeoff requirements. For many cases now, it seems you don't need to just pick one.

And check out Joe's startup, SimpleGeo: a scale out geo spatial/location platform running on (I think still) AWS and using Cassandra as the backing database, which a bunch of geospatial software built on top. It is a new world - a startup couldnt have done this even a few years ago, not without a boatload of VC funding for the hardware.

Ubuntu Lucid 10.04 on Thinkpad W500


After over 3 years at 10 hours a day it was time to retire the thinkpad T60. I replaced it with a thinkpad W500, not the most recent model but well thought of and importantly works with Ubuntu Linux. The fan was nearly worn out, the disk slow (5100rpm) and the right arrow key was dead (using emacs or programming without the right arrow key is no fun at all).


The W500 a nice machine. The reason I use Thinkpads over other machines are build quality, keyboard and that they generally just work with Linux. Not many laptops will take the consistent (ab)use my T60 has seen. The W500 keyboard is very good and the build seems fine. The single major criticism I had of the T60 was its dull screen - the W500 is much better here. Its sound quality is much better than the T60. It has a 7200rpm HDD which is noticeably better than the T60's 5100rpm and 4Gb RAM will do for now.

This was my first use of Ubuntu 10.04 Lucid. I had held off, knowing a new machine was coming.So far I'm very impressed. Installation went very well, after a clearing 270Gb of the 320Gb drive, it installed in a few minutes. This is the most straightforward installation of any Linux I've used (I go back to Redhat 4), and it matches a Windows install for simplicity. The days of deciding how big to make /home and stepping through X11 and networking stuff are long gone. Also, the steps toupdate the operating system and packages with Synaptic are very simple - my Windows 7 update on the other disk partition hung a few times, so it seems Microsoft haven't quite sorted this out (I've had a few severe problems with Vista updates).

All went well until I clicked on the proprietary drivers option for the ATI Mobility Radeon HD3650. I had read in the past that there were problems with fglrx and the ATI card, but not that on startup the screen would go dead with no possibility to switch run levels or move into recovery. After trying to manually fix up Xorg via a LiveCD, I reinstalled. Because I had already copied over a lot of data I shrunk the original partition and reinstalled to reduce copying time. Again installation was a breeze. However the steps to remove the old partition and grow the new one via gparted resulted in the /boot partition getting messed up somehow. I was't able to get grub to work so I had to blitz the partition and install a 3rd time. It was unfortunate to mess things up twice, but that's not  a criticism of the distribution. (or gparted, this is the first time I've had anything untoward happen with it)

Ubuntu 10.04 Lucid is an excellent distribution and it is very close to the goal of  a Linux for human beings. I have two criticisms of the UI. First the window controls. I don't care they're now on the left, I do care that it breaks symmetry with the previous window layout by placing the minimize button in the center instead of the right. This is bad design as the most common operation is collapse and the relative sizings not have disjoint order that makes you think. Second, I believe the mouse grab for window sizing is too fine grained - I find myself having to place the mouse very carefully to grab the box.

It's always interesting to look at the extra software you need to install use the computer. Here's the list -

ant
build-essential
cisco vpnclient
django
emacs
eclipse+pydev
gimp
git
gparted
ipython
idea9
maven
mercurial
meld
mysql server
mypasswordsafe
pidgin (can't use empathy at work)
protocol buffers
p4/p4v
scala
skype
ssh
subversion
sun-java6-jre
thrift
thunderbird
vmware (visio, word, powerpoint)
wireshark

As far as I can tell this is less software that I used to depend on, which is a good thing. The main installed software is Firefox, Bash and Open Office. I still need to use MS Ofice via VMWare for Powerpoint and Word, as Presentation and Word Processor aren't quite good enough, whereas Spreadsheet is excellent.

The biggest changes in the last couple of years have been Git, Scala, and Emacs Orgmode. I still find Mercurial more usable than Git but so much OSS is in Git now, it's neccessary to have a working knowledge. Or more accurately, so much OSS is in github - github seems to be becoming the Facebook of DVCS. Scala has become my most liked JVM language in the last few years, although most day to day work is Java. Scala makes excellent tradeoffs between type safety, performance and expressiveness but the repl is not fast enough nor that language syntactically simple enough to replace Python for off JVM work. Orgmode is the biggest change and has become vital to me. Orgmode is the only notetaking/organisation tool I use now and the only GTD style app that has not failed me when I really needed it - it's indescribedly excellent and a true productivity enhancing tool - I can't recommend it enough.

Extensions v Envelopes

Here's a sample activity from the Open Social REST protocol (v0_9):

<entry xmlns="http://www.w3.org/2005/Atom">
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title>some activity</title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
        href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <content type="application/xml">
       <activity xmlns="http://ns.opensocial.org/2008/opensocial">
           <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
           <title type="html"><a href=\"foo\">some activity</a></title>
           <updated>2008-02-20T23:35:37.266Z</updated>
           <body>Some details for some activity</body>
           <bodyId>383777272</bodyId>
           <url>http://api.example.org/activity/feeds/.../af3778</url>
           <userId>example.org:34KJDCSKJN2HHF0DW20394</userId>
       </activity>

    </content>
</entry>


It's 1.1 kilobytes. I'll call that style "enveloping". Here's an alternative that doesn't embed the activity in the content and instead use the Atom Entry directly, which I'll call "extending":

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:os="http://ns.opensocial.org/2008/opensocial>
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title
type="html"><a href=\"foo\">some activity</a></title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
       href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <os:bodyId>383777272</os:bodyId>
   <content>Some details for some activity</content>
</entry>


It's 686 bytes (the activity XML by itself is 460 bytes). As far as I can tell there's no loss of meaning between the two. 545 bytes might not seem worth worrying about, but all that data adds up (very roughly 5.5Kb for every 10 activities, or 1/2 a Meg for every 1000), especially for mobile systems, and especially for activity data. I have a long standing belief that social activity traffic will dwarf what we've seen with blogging and eventually, email. If you're a real performance nut the latter should be faster to parse as well since the tree is flatter. The latter approach is akin to the way microformats or RDF inline into HTML, whereas the former is akin to how people use SOAP.

Ok, so that's bytes, and you might not care about the overhead. The bigger problem with using Atom as an envelope is that information gets repeated. Atom has its own required elements and is not a pure envelope format like SOAP. OpenSocial's "os:title", "os:updated", "os:id", "os:url", "os:body", "os:userId" all have corresponding Atom elements (atom:title, atom:id, atom:link, atom:content, atom:url). Actually what's really interesting is that only one new element was needed using the extension style, the "os:bodyId" (we can have an argument about os:userId, I mapped it to atom:url because the example does as well by making it a urn).  This repetition is an easy source of bugs and dissonance. The cognitive dissonance comes from having to know which "id" or "updated" to look at, but duplicated data also means fragility. What if the updated timestamps are different? Which id/updated pair should I use for sync? Which title? I'm not picking on Open Social here by the way, it's a general problem with leveraging Atom.

I suspect one reason extensions get designed like this is because the format designers have their own XML (or JSON) vocabs, and their own models, and want to preserve them. Designs are more cohesive that way. As far as I can tell, you can pluck the os:activity element right out of atom:content and discard the Atom entry with no information loss, but this begs the question - why bother using Atom at all? There are a couple of reasons. One is that Atom has in the last 4 years become a platform technology as well as a format. Syndication markup now has massive global deployment, probably second only to HTML. Trying to get your pet XML format distributed today without piggybacking on syndication is nigh on impossible. OpenSocial, OpenSearch, Activity Streams, PSHB, Atom Threading, Feed History, Salmon Protocol, OCCI, OData, GData, all use Atom as a platform as much as a format. So Atom provides reach. Another is that Atom syndicates and aggregates data. "Well, duh it's a syndication format!", you say. But if you take all the custom XML formats and mash them up all you get is syntactic meltdown. By giving up on domain specificity, aggregation gives a better approach to data distribution. This I think is why Activity Streams, OpenSearch and Open Social beat custom social netwoking formats, none of which have become a de-facto standard the way say, S3 has for storage - neither Twitter's or Facebook's API is de-facto  (although StatusNet does emulate Twitter). RDF by being syntax neutral is even better for data aggregation but that's another topic and a bit further out into the future.

So. Would it be better to extend the Atom Entry directly? We've had a few years to watch and learn from social platforms and formats being built out on Atom, and I think that direct extension, not enveloping, is the way to go. Which is to say, I'll take a DRY specification over a cohesive domain model and syntax. It does means having to explain the mapping rules and buying into Atom's (loose) domain model, but this only has to be done once in the extension specification, and it avoids all these "hosting" rules and armies of developers pulling the same data from different fields, which is begging for interop and semantic problems down the line.

I think in hindsight, some of Atom's required elements act against people mapping into Atom, namely atom:author and atom:title. Those two really show the blogging heritage of Atom rather than the design goal of a well-formed log entry. Even though author is a "Person" construct in Atom, author is a fairly specific role that might not work semantically for people (what does it mean to "author" an activity?). As for atom:title, increasingly important data like tweets, sms, events, notifications and activities just don't have titles, which means padding the atom:title with some text. The other required elements - atom:id, atom:updated are generic constructs that I see as unqualified goodness being adopted in custom formats (which is great). The atom:link too is generically useful, with one snag, it can only carry one value in the rel attribute (unlike HTML). So these are problems, but not enough to make me want to use an enveloping pattern.

Just a little work

 

Tim Bray : "I'm pretty sure anybody who's been to the mat with the Android APIs shares my unconcern. First of all, a high proportion of most apps is just lists of things to read and poke at; another high proportion of Android apps are decorated Google maps and camera views. I bet most of those will Just Work on pretty well any device out there. If you’re using elaborately graphical screens you could do that in such a way as to be broken by a different screen shape, but it seems to me that with just a little work you can keep that from happening."

Tim might want to live through a few real handset projects to understand portability costs. All that little work adds up and is sufficient to hurt the bottom line of a company or an individual, perhaps enough to keep them with Apple. Even if you could develop a portable .apk through disciplined coding, the verification testing alone will hurt, especially as the Android ecosystem of hardware and versions grows.

"Oh, and the executable file format is Dalvik bytecodes; independent of the underlying hardware."

I've heard the same said about J2ME bytecode. 

Activity Streams extension for Abdera

The next time I see someone saying XML is inevitably hard to program to, I'll have a link to some code to show them:

public static void main(String[] args) {
  Abdera abdera = new Abdera();
  abdera.getFactory().registerExtension(new ActivityExtensionFactory());
  abdera.getFactory().registerExtension(new AtomMediaExtensionFactory());

  Feed feed = abdera.newFeed();               
  ActivityEntry entry = new ActivityEntry(feed.addEntry());
  entry.setId("tag:site.org,2009-01-01:/some/unique/id");
  entry.setTitle("pt took a Picture!");
  entry.setVerb(Verb.POST, false);
  entry.setPublished(new Date());
  Photo photo = entry.addTypedObject(ObjectType.PHOTO);
  photo.addThumbnail(
    "https://example.org/pt/1/thumbnail",
    "image/jpeg", 16, 32);
  photo.addLargerImage(
    "http://example.org/ot/1/larger",
    "image/jpeg", 1024, 768);
  photo.setTitle("My backyard!");
  photo.setDescription("this is an excellent shot.");
  photo.setPageLink("http://example.org/pt/1");
}

That generates Activity Streams (AS), an extension to Atom - you can read about it here - http://activitystrea.ms. I think Activity Streams are going to be an important data platform for social networking.

The scalability of programming languages

Ned Batchelder: "Tabblo is written on the Django framework, and therefore, in Python. Ever since we were acquired by Hewlett-Packard two and a half years ago, there's been a debate about whether we should start working in Java, a far more common implementation language within HP. These debates come and go, with varying degrees of seriousness."

For anyone coming from Python and looking at the type system side of things, and not socio-technical factors such as what particular language a programming shop prefers to work in, I would recommend Scala over Java. It has a good type system, allows for brevity, and some constructs will feel very natural (Sequence Comprehensions, Map/Filter, Nested Functions, Tuples, Unified Types, Higher-Order Functions). Yes, I know you can run Django in the JVM via Jython, I know there's Clojure, and Groovy too. This is just about the theme of Ned's post, which is the type system. And Scala has a better one than Java.

James Bennett: "The other is that more power in the type system ultimately runs into a diminishing-returns problem, where each advance in the type system catches a smaller group of errors at the cost of a larger amount of programmer effort"

Sure, maybe at the higher order end of the language scale. But in the industry middle, there's less programmer effort around Scala than Java, modulo the IDE support but that changes year by year.

The Boy Hercules strangling a snake

Anyway, the real problem with Python isn't the type system - it's the GIL ;)

 

Bug 8220

"The TAG requests that the microdata feature be removed from the specification."

RDFa is preferred by the W3C TAG over the Microdata spec made up in HTML5.

How this one plays out will be interesting.  Pass the popcorn!

 

Java Software Foundation

Joe Gregorio: "Does the ASF realize that subversion isn't written in Java?"

Better not tell them about Buildr, Thrift, CouchDB, Etch, TrafficServer et al.

 

Copier Heads

Robert Scoble: "Every month longer that this deal takes is tens of millions in Google’s pockets. Why? Well, the real race today isn’t for search. Isn’t for email. Isn’t for IM. It’s for ownership of your mobile phone."

That was back at beginning of 2008, at the height of the Microhoo excitement. It's interesting to revisit these things.

Scoble said that this because he "met the guy who runs China’s telecom last week in Davos. He’s seeing six million new people get a cell phone in China every month."

That was 138 per minute, about twice the growth rate of internet/web adoption. In terms of world wide adoption of phones, Scoble was probably off by an order of magnitude. It's not the "next big game" as one commenter put it (Tim O'Reilly). It is the big game.

Steve Jobs: "Basically they were copier heads that just had no clue about a computer or what it could do. And so they just grabbed, eh, grabbed defeat from the greatest victory in the computer industry. Xerox could have owned the entire computer industry today. Could have been you know a company ten times its size. Could have been IBM - could have been the IBM of the nineties. Could have been the Microsoft of the nineties."

I read 99 comments back then. About half a dozen picked up on the mobile point. Everyone was talking about property rights on social graphs and inferred information, or web2.0 ad models, or search, or the importance of email. Google still seems to the webco that understands best the importance of mobile.

Concision

David Pollack: "We can argue about whether Scala is syntactically simpler than Java. I find that my code - I did two years of Ruby On Rails work before I did Scala and I've done Java work since 1996 - I find that Scala is as concise as Ruby. I also find that Scala's type system doesn't often get in my way. Where I have to declare types in Scala is where I should document types in Ruby, and I can also use the type system as a way of defining tests. Scala's design lead to Lift's design. Lift's design lead to abstracting away HTTP in a way that we can do real time, or what appears to be real time, push, in a way that's syntactically and semantically pleasing for the end developer."

My observation here is that the last language I found as interesting as Scala was Python.  They reel you in.

 

Process and individuality are not exclusive

update 2009/08/22: Jason Yip also recommended "Lean Product and Process Development" by Allen C. Ward.

Gadi Amit: "Against that "unreliable" branded-personality design management, multidisciplinary agencies push the notion of large teams and a rigid process. The message of the process crowd is simplistic, "have a few more disciplines in place and we can create the winning product with the right design." Here comes the ethnographer and the strategist and the focus-group studies and the 500-page dissertations, and so on. I have yet to see any hard proof that these large processes yield higher rates of success in design. I have met more than a few large organizations that will not take this any longer. The process method managed to stifle creativity and nourish argumentative myopics while exhausting corporate budgets and personnel. The case of Doug Bowman, Google's just-resigned lead designer and the 41 shades of Blue sounds painfully familiar. As you churn out more creative work, more data-points and more "scientific" validation, your design never gets better. The process method justified large design budgets yet never reliably delivered. It catered to the corporate ladder that is now gone. It required time and the ability to commit resources that we've probably lost for the next decade. "

Steven Keith responds: " However, I cannot see how what you advocate can work, in reality. 

I believe one of those incessant problems with design management is that everyone who cares or seeks methods to address their own challenges is too unique or idiosyncratic to borrow workable insights, processes or anecdotes from all the great thinkers and practitioners out there. I wish I had a dollar for every time I had a client proclaim they're going to do the Sapper or Ives thing. Whatever that is. Is this the consiglieri you speak of?

In the final analysis, I see a gorgeously articulated avalanche of design mgmt ideas, methodologies and articles anchored to edge cases like Apple, Google, IDEO and the like. They're easy to swan over and even fun. But, these edge cases are so disconnected from what will work for most. In fact, I see them doing potentially more harm than good. How many presentations at conferences do you see about really average companies that believe in the promises of design thinking but are struggling because they cannot bridge their "today reality" with the mythological fully integrated design business of tomorrow? There are good reasons why."


I see an ugly fact getting in the way of both these positions and it's called the Toyota Product Development System (TPDS), which marries a strong process and measurement culture with Amit's concept of a consiglieri role. In Toyota that position called the Chief Engineer. To argue either end of the process v creativity spectrum in product design, you have to be able to explain away the Chief Engineer role in Toyota and Toyota's phenomenal market success, which has made them one of the largest companies in the world (10th in the Global Fortune 500). In my sector, when people talk about design, they tend to obsess on Apple. They should also look closely at Toyota, who have a design system than transcends both individual brilliance and process.

I think most software people know about Toyota's methods through Lean Software Development and the Toyota Production System (TPS). TPDS is just as important to understand the product development angles. To get a grasp on the TPDS this book, "The Toyota Product Development System: Integrating People, Process And Technology" is highly recommended as is "Product Development for the Lean Enterprise"