" /> Bill de hÓra: February 2005 Archives


February 24, 2005

HTTPLR: reliable message delivery over HTTP

HTTPLR is an application protocol for reliable messaging using HTTP. The protocol specifies reliable delivery for upload and download of messages.

It's a REST style protocol - client-server, no URI or content inspection is required, and each item of interest gets its own URL. The reliability is achieved by adding behavioural constraints to HTTP aware software. Probably the most interesting technical aspects are a) that you can push messages around without requiring two servers, and b) that there is a distinction made between the exchange and the message (each gets their own URL).
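To make the shape of an upload concrete, here's a rough client-side sketch in Python. The URLs, headers and steps are mine and purely illustrative - they are not the wire details of draft-httplr-00, so read the spec rather than this if you want to implement:

    # A hypothetical HTTPLR-style upload, client side. URLs and steps
    # are invented for illustration, not taken from draft-httplr-00.
    import httplib

    conn = httplib.HTTPConnection("example.org")

    # 1. Ask the server to create an exchange resource for this delivery.
    conn.request("POST", "/httplr/exchanges", "")
    resp = conn.getresponse()
    resp.read()
    exchange_url = resp.getheader("Location")  # e.g. "/httplr/exchanges/42"

    # 2. PUT the message to the exchange URL. PUT is idempotent, so the
    #    client can retry on timeouts without risking duplicate delivery.
    conn.request("PUT", exchange_url, "<order>...</order>",
                 {"Content-Type": "application/xml"})
    conn.getresponse().read()

    # 3. Reconcile the exchange. Once the server acknowledges this, both
    #    sides agree the message has been delivered exactly once.
    conn.request("POST", exchange_url, "")
    print conn.getresponse().status

The exchange/message distinction shows up in step 1: the thing you PUT to is the exchange, which has its own URL and lifecycle apart from the message it carries.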

In Propylon we've had an earlier revision of the message upload protocol running for over a year now and we're happy with how it's working out. Download is something we're working on currently. Formally speaking it's probably an adequate protocol, and practically speaking, it works. The current version is draft-httplr-00. I think it'll go through a few revisions over the next few months, so if you have any comments or would like to implement the spec, feedback is welcome!

Content trumps Architecture

David Megginson is asking about content in terms of the REST style:

RESTafarians can argue that the lack of content standardization is a good thing, because it leaves the architecture flexible enough to deal with any kind of resource, from an XML file to an image to a video to an HTML page - moving the last two using XML-RPC or SOAP can be less than pleasant. On the other hand, the lack of any kind of standard content format makes it hard actually to do anything useful with RESTful resources once you've retrieved them.

Open content has been a social problem, not a technical one. I think the REST style did the right thing under the circumstances in shying away from over-specifying what it calls representations, since there was no chance of obtaining agreement 10 years ago; or 5 years ago; or even last year. I'd speculate that specifying content to the level David is interested in would have hurt adoption, as it would be an easy excuse not to use the architecture in question (all aside from the fundamental protocol principle that data and control are orthogonal).

Today, that punt seems smart. Sjoerd Visscher's comment is an example of this.

You shouldn't use elements from foreign vocabularies like the Dublin Core. There are often subtle semantic differences between the defined meaning of the elements and the way they seem to be applicable in other vocabularies. Those differences tend to reveal themselves only in practice (when the semantics are actually used, still quite rare on the web). It's better to only use your own vocabulary, and then provide a translation (like f.e. XSLT) to the other vocabularies. When the differences show up, you only have to change the translation, not your format.

Now, Sjoerd's a good guy, and there is some truth in this. And while I firmly agree that content transformation is the optimal approach to data integration, it's not hard to see how his observation, that semantic drift can occur through content reuse, could be leveraged as an excuse to maintain more choices than you strictly need. People historically have not agreed on content at the level David is arguing for. In that sense it's not a REST specific problem. To paraphrase Jim Highsmith, content trumps architecture, people trump content, politics trumps people.
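Sjoerd's translation layer is easy to sketch. Everything below is invented for illustration - the point is only that when drift shows up, the mapping is the single thing you change:

    # A toy translation layer: publish in your own vocabulary, map to
    # Dublin Core only at the boundary. Element names are invented.
    OWN_TO_DC = {
        "headline": "dc:title",
        "author":   "dc:creator",
        "summary":  "dc:description",  # if this drifts, change this line only
    }

    def translate(own_element, value):
        dc_element = OWN_TO_DC.get(own_element)
        if dc_element is None:
            return None  # no claimed DC equivalent; don't force one
        return "<%s>%s</%s>" % (dc_element, value, dc_element)

    print translate("headline", "Content trumps Architecture")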

This won't always be the case. The plumbing and infrastructure is at the level now where the obvious bottlenecks will seem to be content related. Those who are not IT mavens will surely someday run out of patience with RSS v SOAP v XMLRPC v Atom, or with SUO v OWL v Cyc v WSDL. This is what happened to plumbing and infrastructure; it will happen to some extent with content. Don't hold your breath for a perfect language, but do expect content formats to rationalize as some light is shed on the matter.

Another reason to standardize content is to commoditize the very services David mentions - the likes of Google, Amazon, Flickr. Almost all 'Web2.0' services are predicated on data franchises, not on API or platform franchises, as was the case in the 1980s and 1990s. Wanting to make the data liquid and drive value out of Web2.0 businesses to somewhere else will effect content standardization and interoperation.

David mentions some formats and those are worth highlighting:

  • Dublin Core: DC is way useful, but it has the problem of being underspecified and being associated with RDF when too many people thought RDF sucked.
  • xsi:type: I find WXS types overrated for interchange and I tend to agree with the piece David links to by Norm Walsh. In general much of the value of XSD is tied up in being able to bind onto programming language type systems; but they mismatch sufficiently badly that the programming languages will probably have to have their type systems coerced to make things effective [1].
  • xlink, xml:id: xlink seems to have languished. I think the W3C was overambitious - instead of focusing on cleaning up href and incrementally improving it, there was an overhaul which resulted in ideas like link bases and n-way links, which killed it for practical use. It's too early to say on xml:id; some people seem buzzed about having it in XML, but in the near term I think RSS and Atom linking constructs are much more likely to gain traction.

While RDF got a mention, rdf:type did not. I think rdf:type annotation is more flexible and less brittle over the Internet as it's closer in spirit to Postel's law than xsi:type (not to say xsi:type isn't useful - xsi:type has a very 'middleware' feel to it). An important aspect of RDF is that typing is optional, a hint to processors (and if those are RDF processors they will tend to be robust in that regard).
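The 'typing as a hint' behaviour is worth a few lines of code. This sketch cheats by carrying the type as a plain attribute, which is not proper RDF/XML, and the vocabulary URI is invented; the point is only the dispatch-with-graceful-fallback:

    # Typing as an optional hint: use the annotation when recognized,
    # degrade gracefully when not, per Postel's law.
    from xml.dom import minidom

    def handle_invoice(node):
        print "handling %s as an Invoice" % node.tagName

    def handle_generic(node):
        print "handling %s as untyped data" % node.tagName

    KNOWN_TYPES = {"http://example.org/vocab#Invoice": handle_invoice}

    doc = minidom.parseString(
        '<item xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'rdf:type="http://example.org/vocab#Invoice"/>')
    item = doc.documentElement
    hint = item.getAttribute("rdf:type")  # "" when absent - also fine
    KNOWN_TYPES.get(hint, handle_generic)(item)

An xsi:type processor built on schema validation would typically reject a document with an unrecognized type instead of falling back.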

Overall the content model most likely to become the gold standard in the next few years is Atom, plus a few things that tunnel some extra semantics through, such as InterWiki URLs, a cut-down RDF, or WHAT Web Forms [2].



[1] More likely is that a smaller de-facto set of types that interoperate will be found due to the efforts of the likes of SOAP Builders.

[2] Mark Baker might have something to say about that - RDF Forms

February 23, 2005

Jython new style classes

Good news. Brian Zimmer has announced on jython-dev that he and Samuele are merging the new style classes branch into the trunk:

Samuele and I collaborated today on getting his changes for new-style classes checked into the tip and I just finished committing everything.

The changes are extensive and quite exciting. While not all core types are converted, a number have been, and over the next couple of days I'll work on migrating the rest. It's likely this code will change a bit, especially where the generated code will eventually live, but please give it a try and let us know how you fare.

February 20, 2005

The first testing tool

"I do believe that the structure we have creates an environment where the engineers are very motivated, very happy. They overwork themselves but they do good quality work, they enjoy what they're doing and the end of it we get great code out of that. You should take this back as the first testing tool that we have, or the first quality enhancing tool that we have - make the engineers feel empowered and enabled to build quality software"- Sriram Sankar, Google

February 19, 2005

Two classic hardbacks

I received two hardback books for my birthday a while back.

The first is the fantastic Structure And Interpretation of Computer Programs. I have a softback copy of this, but after 5 years of abuse, it's in tatters. Structure And Interpretation of Computer Programs is the best book on programming ever written, hands down.

sicp-hb.jpg "The evaluator, which determines the meaning of expressions in a programming language, is just another program"
SICP p 360

I have always wanted the hardback, but have been put off before by the cost. Then again this is the kind of book you feel you should have a hardback copy of. Just like you should have a copy of Knuth's The Art of Computer Programming (even though I don't) or the Beatles' White Album (nope). So when given a chance to have it as a present, of course I said yes.

For every programmer that adores this book, there's probably five who don't and ten who've never come across it, and if they did, would think it irrelevant to their working lives. I can sympathize with those fifteen programmers, having been one of them. All the code in the book is in Scheme. That's a barrier right there because it means most people will have to learn a non-Algol language to understand the book. But given the range and depth this book succeeds in covering, it's definitely the case that Scheme is a good choice. Given the first edition was published in 1985 (I think), it's hard to imagine it being done in any other language. Smalltalk was another powerful language available at the time, but then again, Objects don't appear for 200 pages (in section three).

If you're of the school that a programmer's first language should be assembler or C and their first book should be K&R, you might have problems with SICP - programming with register machines (that's CPUs to you and me), the root idiom of the Algol family, doesn't appear for nearly 500 pages. Some people will understandably think that's impractical, but it's not. The nature of the computing profession is that you get your technical world view turned on its head every five years or so, and your investment in toolsets tends to depreciate at a faster rate again. This can be very difficult to endure and is probably a major cause of burnout in the industry. For example if you're trying to make a conceptual leap, say from compiled languages to interpreters or from objects to aspects or from XML to relational data, this book would have helped you be ready. Incidentally SICP also has a wonderful section on how time and state affect the nature of programs - that alone makes it relevant to just about anyone programming with servers, middleware or databases.

Today you might look to write it using Python or Ruby - which would make it more popular but perhaps a tad clumsier. Scheme is a really good teaching language that allows you to not be distracted by details. Then again, Peter Norvig of Google, who recommends SICP highly, has mentioned that he would consider Python for his AI classes over pseudo-code and Lisp.

I honestly don't think I'll ever finish the Structure And Interpretation of Computer Programs. I'll read it who knows how many times, but I doubt I'll ever really be done with it. Personally I find SICP tough going sometimes, but it's time well spent.

How To Write Parallel Programs is by David Gelernter and Nicholas Carriero. I picked this up a few years back after getting heavily into Javaspaces and finding out that this book is the keystone for anything to do with Tuplespaces and Linda. I think it's the best book written on parallel programming and is arguably the best book written on distributed computing. But there are two things which make the book remarkable.

sicp-hb.jpg "We can envision parallelism in terms of a program's result, a program's agenda of activities or of an ensemble of specialists that collectively constitute the program"
HTWPP p 14

The first is how well it's written. This book is probably the best written computer book I own. Whenever I think of good technical writing, I think of How To Write Parallel Programs. Some people will claim that this is because the tuplespace is such a beautifully elegant programming model. Perhaps, but I've read tuplespace material that was far from beautiful. What the authors do in HTWPP is this: they never stray into jargon, everything is explained. Concepts are introduced only when necessary, and there's hardly any of the forward paging ("as we'll see in the next chapter") that plagues technical book writing. There are no wasted words, and no words are left unsaid.

I really do believe this book is close to perfection as a piece of technical writing. I imagine in 1990 when it was first published it would be easy to write something terrifically difficult about parallel programming. Heck, it would be easy to do that today - it is easy to do today. Parallel and distributed computing are not simple fodder.

The second is that it's out of print. I find that incredible. Having How To Write Parallel Programs out of print is a bit like having the Ten Books on Architecture out of print [1]. It's a reflection of some sort of malaise in our industry that this book is out of print while you can build your own personal tower of Babel from "in varnum days" books. This is why I decided recently to hold two copies. One I want to be able to give to people to read, and the second, inscribed from my family, is mine, mine, mine.

So if you're interested in either parallel or distributed computing, what I would say to you is that this is the best place to start to get a crisp overview before getting bogged down in formalisms and baroque technical English. Even before the Coulouris and Tanenbaum books (which are great). I wish it had been the first book I'd read on the matter, because I wouldn't have had to go back and re-read a bunch of others, including the Javaspaces book that got me started. Second hand copies come up on Amazon; my advice is to snap one up.


[1] O'Reilly should run a "Classics" series like Penguin do for novels and plays, but for out of print or half-forgotten computer literature. I think they'd have a long-lived franchise.

Monoblog

I had an odd experience recently. I saw someone's weblog outside my aggregator. I don't think I'd ever looked at this person's website in a browser before. And it wasn't at all what I expected it to look like.

Which made me realize that I don't look at my weblog or anyone else's through anything other than an aggregator these days. Apart from search engines and work related sites, where I use a browser, an aggregator is now my primary web UI - I had barely noticed the move away from a browser until the above site made me realize the shift in habit.

I fired up a browser and went to my own weblog. It did not look good. Overlapping div boxes (I have a lot of mail going back months asking me to fix that - sorry), sickly pale blue and white, maroon quotes. Blech.

So I've changed it to a monotone (I've been watching film noir lately). I still need to tweak the font size. But I think it looks ok - simple and clean. I think a few drops of color here and there in an entry could be used with interesting effect when everything else is monotone.

February 18, 2005

The Fog of Service

The 'classic' monitoring technologies (such as snmp and syslog) are mostly concerned with machine and server level events, and don't support the monitoring of things that are going on at the higher messaging and application levels. Importantly, the correlation of system level events with business level issues is mostly non-existent in IT. On a project last year we used a combination of XMPP, RDF and Atom to provide a monitoring system that informed on multiple layers of the stack. It's proving useful since the system can report on events at arbitrary levels and from arbitrary nodes. It's also semantically rudimentary; for example it doesn't interpret collections of low level events as a potential business issue - the implications of such events are left to the people operating the system, which requires domain knowledge.

It may be that NASA, Fedex, Formula One crews, and the odd supply chain have figured out how to make sense of telemetry noise, but it's an open issue for Web and Service oriented systems. It would be so much better if we had protocols and formats for this. There do not seem to be WS or Semweb specs targeted at this area (if I missed any, please let me know).

My current thinking is that the most useful feature direction for such a monitoring system would be to allow matchers to register a pattern or set of patterns that they should be notified about. Which sounds a lot like a blackboard architecture or a content-based non-router.
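Here's a minimal sketch of that matcher idea - not the XMPP/RDF/Atom system described above, and the patterns and handlers are invented:

    # Matchers register the event patterns they care about, at whatever
    # level of the stack; the monitor notifies every matcher that hits.
    import re

    class Monitor:
        def __init__(self):
            self.matchers = []

        def register(self, pattern, callback):
            self.matchers.append((re.compile(pattern), callback))

        def publish(self, event):
            for pattern, callback in self.matchers:
                if pattern.search(event):
                    callback(event)

    def ops_alert(event):
        print "ops: " + event

    def business_alert(event):
        print "business: " + event

    m = Monitor()
    m.register(r"9[0-9]% full", ops_alert)            # machine level
    m.register(r"order queue depth", business_alert)  # application level
    m.publish("disk /var on node7 is 92% full")

Interpreting collections of low level events as a business issue would mean a matcher whose pattern spans the event history rather than a single event - which is where the blackboard flavour comes in.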

February 17, 2005

Agile Testing

Via Sean: Agile Testing. Subscribed.

The integrator's dilemma

Mike Champion has a crack at delineating between REST and WS. The question he's responding to is whether Microsoft are ignoring demand for REST tools.

"The REST approach (as far as I can tell after years of discussion) assumes that all these features are provided at the application level rather than being provided by the infrastructure."

My first reaction was - I didn't know there was a demand for REST tools. My impression of those who are interested in using REST is that they want to be close to the wire and to the data and, if anything, want to reduce the number of tools in use. REST is a minimalist approach to systems building - when you're using it you're making a decision to do some of the heavy lifting on your own. Or perhaps you realized that you never needed to lift half of what you thought you did. If anything, I would think that selling tools into REST users would be tough work.

"I am convinced that developers with demanding requirements for security, reliability, transactions, etc. generally DO want what WS-* and Indigo promise."

Perhaps they do. Tho' there's not much question that WS-* is in need of rationalizing. Entirely natural in the tech sector, competitive pressures have resulted in duplicated effort and excess choice for technology decision makers, which lessens the value of having standards to begin with. The string 'WS-*' itself indicates a problem, not a solution, for end-users [1]. End users don't benefit by having multiple similar specifications to choose from (implementations are another matter).

Mike goes on to contrast the WS potential with the current state of REST:

"Perhaps there is some way to offer these services within a RESTful toolkit so that application developers don't have to wrestle with them. The obvious answer, however, is that few people seem to have much of a clue how to do this. Those who do, such as the developers at Amazon, have invested fortunes in making it happen."

The position advocated for WS in contrast to REST sounds like this - yes, that's nice for simple things, but WS are for serious work. They exist to service those who need systems where all the 'ilities are turned up to 10.

But it's worth considering that the sheer cost of solutions for bodies such as Amazon may simply reflect the massive scale of their operations (Werner Vogels, Amazon's CTO, has remarked that Amazon's scale provides unique challenges for technology and infrastructure). There's no compelling evidence that WS based technology would generally be cheaper or that REST solutions are generally not cost-effective.

Nonetheless, that WS can be positioned primarily for those companies whose requirements are effectively off the bell curve is interesting. The foundation technology of WS is of course SOAP. SOAP is no longer an acronym, but the S used to stand for Simple. The implication is that as of 2005, the simple stuff can be served by inferior technology and WS has progressed in complexity to serve a more rarified market. The rest of this entry runs with that idea.

Points of disruption

We can see the market looking at 3 things to provide value in software solutions:

  1. Simple protocols and formats
  2. Open source infrastructure
  3. Agile methodologies

All of these are disruptive.

The first is what we're talking about here - formats and protocols that are sufficient for most users. The other two amplify the first and are worthy of a brief mention.

Open source as disruptive technology to traditional per seat, per CPU, licensing models has been beaten to death by Tim O'Reilly among many others - today everyone selling products into the integration and middleware space has to factor OSS into their plans.

The reason agile methods are disruptive is because software methodologies are closely aligned to business and procurement models. How you go about delivery has a first order effect on how you get paid, and how much. Agile approaches also favour what Scott Ambler calls 'generalizing specialists', rather than actual specialists, something that runs counter to much of the industry's organisational and training practices. Notably, the Standish group is still telling us that 65% of projects are failing in one way or another. In that context, provide a process that gets the figure down to even 40% and, say, halves the post-live defect count, and you have a disruptive business model in your hands.

A services dilemma

Using the Christensen playbook, we should consider whether WS specs and tools are being commoditized by disruptive technologies. If so, the natural progression for them and the software based on them is to move up the value chain. When the mass market for enterprise technology is over-served, or the price is too high, that leaves an opportunity for disruptive technology to take hold at the bottom end of the market.

I worry this may put out some people who feel that WS are a disruptive technology, but when you see RSS usage figures doubling twice a year, you stop and think about what's really disrupting what. But REST as it appears in the HTTP Web seems to have a broader potential set of uses than was originally imagined by WS proponents, especially in conjunction with simple formats like RSS or Atom.

In the case of WS, it's possible that the broader market is over-served. Statistically speaking, relatively few organizations need all that serious stuff, all those extreme 'ilities. What most organizations need is cost-effective, stable, flexible software with reasonable TCO. So if the number of people who really need 'serious' high-end systems is relatively small compared to those who can't afford or don't need all the technology, what we can expect is that the market will be served by inferior technology at margins the providers of serious tech will typically be unwilling to meet. The suggestion is that those providing the disruptive technology will enter a growth market.

Naturally, some people will object to classing the Web, REST, RSS et al as inferior technology. The point is that this stuff is good enough for a growth market. And following that, we should expect good enough tech to improve and those supplying it to move up the value chain, while the technology itself probably gets cheaper.

What's driving WS?

Should we accept that people are either being supplied excessive technology requirements and architectures through WS, or simply can't/won't afford to procure in the first place? I think we should, but there's nothing sinister here; rather it's a quite simple outcome of enough 'what if' questions. What if the database server goes down? What if the transaction fails? What if the network goes down? What if we have 1M users online? What if someone tampers with the data? What if we get slashdotted? It's all about listening to your customers and taking care of them. It should make sense to do this. This is a good thought process to go through.

The problem is that the concerns are often answered without taking cost into consideration, without anything that looks like risk assessment, without a thorough examination of the requirements.

The burn rate for such things fits the classic J curve - a modicum of improvement has a disproportionate cost, the rate of which increases as improvements are added. Software people don't always consider the J-curve in answering what if questions - and where they do, answering them by climbing to the summit of the curve is not ideal, and is something that currently separates software from engineering. Engineers will establish ratios and determine fitness for purpose. Engineers understand the effect, necessity and relative cost of part tolerances. They know the properties of their materials and structures under a variety of environmental conditions. They have a good understanding of solution cost. They know they might get sued. Engineers even have a physics. Yes, engineers do get it wrong, but generally, the software industry isn't strong on this.

Good enough opportunities

"Finally, REST advocates like to paint an appealing picture of a Web governed by open standards that mesh well with one another and over which XML documents conforming to industry-standard schemas with well-documented semantics are transferred. Few who actually work down in the trenches suffer from this delusion for long. Even a dirt-simple idea such as the XML format RSS becomes an interoperability mess in practice."

This is easy to agree with. What is less easy to understand is how fifty plus specifications from the WS-* canon will improve the situation.

How are complicated WS specs going to help with interoperability in a way the dirt simple stuff does not? There doesn't seem to be anything special about REST that would make it prone to interop problems over WS specifications. The interop story on WS has not been spectacular, but is getting better [2]. I'm neutral on the relative merits of their interoperability and much more interested in how much interoperability is actually required case by case. What seems to be special about REST from this point of view is that it's fit for purpose and cheap by comparison, once you establish that you aren't served best with the complexity and capabilities implied by WS.

"Once again, it appears that REST is a fine idea for simple scenarios where bad stuff can happen with impunity and corner cases can be disposed of arbitrarily. Building secure and reliable applications for situations in which serious money or human lives are at stake is another matter."

I don't think anyone should read that and draw the conclusion that REST is an irresponsible technology and WS is not.

update: Mike Champion had a comment about this that's worth highlighting:

"...My point, which I think you agreed with at the beginning, is that the WS-* stuff is trying provide standardized interfaces to encryption, signatures, authentication, authorization, etc. services in the infrastructure. AFAIK "REST" puts this responsibility on the application developer. In reality, security has to be a consideration at every level, and I certainly didn't want to imply that WS-* provides any kind of magical protection or that REST is intrinsically insecure."

Which is fair - the quote I took from Mike could be misunderstood out of context.


What's really worth bearing in mind is that serious money is relative, reliability is relative, security is relative. In one sense, the polarization (you're secure or you're not, it's reliable or it's not, it's serious or it's not) and the drive toward higher levels of performance and complexity are entirely rational. As Christensen has pointed out, it's what happens when you listen to your best customers and try to maximise existing profit centres. This attentiveness inevitably forces WS vendors up the value chain and out of areas where they believe it's not worthwhile selling into, leaving a gap for simple technology like REST to step into. It means, when you take into account the J-curve, there is a risk of abandoning a growth market by thinking in terms of absolutes and overlooking the notion of fit for purpose.


Conclusion

If we accept that WS based technology is being disrupted at lower and mid ends of the market then it becomes clear that there is no absolute technology winner between REST and WS [3]. What to choose becomes a question of establishing your relative requirements. However we can expect that vendors servicing the low end will move up the value chain over the coming years and as a result claw away at WS based revenue - Christensen's model does not predict a steady state of affairs.

If dirt simple equates to good growth and better profits, then the missed opportunity arises when simple and simplistic are conflated. When the WS contingent are looking at the REST and syndication crowd and saying more or less, 'here's a nickel kid', they may want to stop and take a second look at what the kid is doing with that nickel.



[1] I should qualify myself here on the state of WS. There's no doubt that some WS technology will prove useful - for example a standard RM protocol will be immensely valuable whenever the vendors decide to coalesce. SOAP will clearly remain useful for some cases.

[2] The SOAP interop story is getting better, but it's not all there yet. The people I look to for level-headed guidance in this space are Steve Loughran and Sam Ruby - when they say things are good it's an opportunity to re-evaluate. Otherwise I see no reason why you couldn't evolve towards the best WS has to offer as and when the features are needed.

[3] This does not mean the public debate is reduced to a tourniquet fest, only that we need to take a focused look at what the customer actually needs.

February 13, 2005

Generating feed identifiers can be tricky

There is a bug in Roller whereby the guid of the RSS entry changes if the date changes. It seems to be affecting Javablogs at the moment.

Debates on the means and structure of feed guids have taken up a lot of time on the Atom WG (we call them ids over there); debate has occasionally been heated.

Roller's generator is a good example of what not to do, if the goal is to create a stable identifier. The issue here is that Roller is synthesizing ids from feed data (in this case, a date), and the data is mutable. Unless the generator is one-shot, the id will change each time the data changes, which is undesirable. Even if the generator is one-shot, you will be left with an id that is dissonant with the data, unless the id is obfuscated by something like a hash. The temptation not to use a hash is understandable since the Roller guid is in the form of a URL. However, the key problem is that the Roller id is not so much an id as it is a signature or digest.

Someone who's been tasked with generating globally stable identifiers might frown on the Roller code, but mixing up identifiers with signatures is an easy mistake to make in a web context - there are seemingly contradictory aspects to consider. I also think the specific case of using a date in stable URL identifiers and then recomputing the id is an easy mistake to make - URL fragments of the form YYYY/MM are a popular Cool URI technique for everything from versioned namespaces, to W3C specifications, to blogs. Using them as source material for ids is understandable.

This also highlights a usability issue with using URLs as identifiers. Cool URIs, according to W3C doctrine, don't change, but Cool URLs are also meant to be comprehensible to human beings. Roller guids meet the latter criterion but not the former. URLs double up as locators (addresses) as well as identifiers. In RSS2.0 this is achieved on the guid element by the 'isPermaLink' attribute, which tells you the guid can be used as an address (making the guid the moral equivalent of a URL).

So, what's the answer? In Roller's case, the first thing to do is decouple id generation from mutable data like dates so as to produce a time-stable identifier. The downside is that this is probably not going to look like a 'Cool URI'.
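As a sketch - this is not Roller's actual code, and the names are mine - the generator seeds from values fixed at creation time, and the result is stored with the entry rather than ever being recomputed:

    # One-shot id generation: seed from creation-time values only,
    # persist the result, never recompute it. Names are illustrative.
    import md5, random, time

    def mint_entry_id(domain):
        seed = "%s:%f:%f" % (domain, time.time(), random.random())
        # Stable and globally unique, but not a 'Cool URI'.
        return "tag:%s,2005:%s" % (domain, md5.new(seed).hexdigest())

    entry_id = mint_entry_id("example.org")
    print entry_id
    # Store entry_id with the entry; later edits to the title, date or
    # body must never cause mint_entry_id to run again for that entry.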

By the way the current Atom spec (draft-05) text on identity constructs has this to say about stability:

When an Atom document is relocated, migrated, syndicated, republished, exported or imported, the content of its Identity construct MUST NOT change. Put another way, an Identity construct pertains to all instantiations of a particular Atom entry or feed; revisions retain the same content in their Identity constructs.

We hope that's enough to guide developers away from the pitfalls. The problem with being more specific, i.e. saying "the content of its Identity construct MUST NOT be computed or sourced from mutable data items", is that spec writers tend not to want to base their specs on what are considered 'implementation details' rather than 'architectural constraints' - altho' implementation details matter a lot in this particular case.

February 12, 2005

Programmers' block

Here's what I don't always like about high level languages: there is nothing to do except solve the problem. You can't fool about with braces, create abstract classes, factories and interfaces, move some methods about, write redundant code, write yet another plugin manager, yet another fast file reader, yet another configuration language, or goof around with the build file for the nth time this week. There is nothing for your conscious self to do except solve the damn problem.

Sometimes it's great to have nothing but a problem to deal with. And it's clearly a good thing to be focused on the task at hand. In fact, the aspects mentioned above are exactly what many programmers come to loathe about some programming languages. They are the very things that drive them to find more elegant and concise forms of expression. Why then would anyone want to keep a hold of that stuff?


I think part of the answer can be found in programmers' block. Programmers' block is like writers' block, but instead of staring at a blank page you're staring at a blank screen with a blinking cursor:


                 >>> |
                      Programmers' block in Python

Now, I'm not saying using a high-level language results in programmers' block, just that a high-level language will make it evident you don't know what to do next. Sure, it happens to everyone, but having nothing but you, the problem and your inability to solve it now, can be uncomfortable. Even stressful - many programmers' self-esteem is tied up with their problem solving ability and intelligence. Languages with inbuilt distractions give you somewhere momentary to hide from the problem.


                public class Main
                {
                  public static void main(String[] args)
                  {
                  }
                } 
                      Programmers' block in Java

About the most you can do with a high level language is change the syntax coloring, which is much too obvious a wastefulness to be beneficial. Continued blatant wasteful activity, such as recolouring code, should result in guilt and low self-esteem, emotions I suspect will damage one's ability to solve problems, whatever the language.


One view on this is to think a bit of inefficiency, some distraction and slight verbosity could be a good thing in a programming language. Being allowed to fidget and zone in and out of a program might be valuable because it affords a way for your conscious self to spin its wheels without feeling bad, while your unconscious self gets busy in the background doing whatever it is unconscious selves do to provide us with insight. Perhaps having some shiny toys around for the conscious self to play with is important for working on software problems. Some slack in a language could be healthy.

Another reason to want to avoid programmers' block is that it may not be acceptable to not be writing code, as you must be seen to be productive by your peers and your managers. There are after all no accepted objective measures for programmer productivity, so we often resort to instinctive (but unjustified) measures such as counting lines of code or function points, or simply watching overtime [1]. In some companies extended periods of non-production of software may be acceptable; this is notably so in highly technical environments, where it's accepted that the programming is going to be difficult, or where the founders came from an academic background and punctuated silences are not unusual. In more business and service-oriented companies, and even startups, cranking out code whatever the form might be an implicit requirement. This is not so much a matter of right and wrong as it is a matter of prevailing cultures. But if someone is working in a culture that does not understand or accept "dry periods", it may be perceived as analysis-paralysis, ineffectiveness, or worse, laziness and incompetence.

The difficulty comes when there is so much inefficiency, endless distraction and fatuous verbosity in a programming language that you'll never finish in a reasonable amount of time. This starts to rear its head when building larger systems (collections of problems). Shiny toys become attractive nuisances, and to fool about with the braces, create abstract classes, factories and interfaces, move some methods about, write redundant code, write yet another plugin manager, yet another fast file reader, yet another configuration language, or goof around with the build file for the nth time this week, becomes an end in itself, not a means.


If programmers' block is real, and clinging onto some problematic aspects of a programming language is a means to cope with it, then it's arguably a high risk means. Are there better ways to keep our conscious selves occupied so we can use an alternative high level language without fear? Or better ways to use a less productive language when we get stuck?

I think there are. Refactoring is one way. Refactoring is a technique to manipulate the structure of source code without changing its behaviour. It's best considered as a means of managing technical debt in software. Ironically refactoring started with the HLL Smalltalk, but has truly blossomed in languages like C#, and especially Java.

While refactoring is important for codebases that are expected to provide a product dynasty rather than a legacy mess, it's often devalued as mere tinkering by those removed from source code, as it doesn't provide new functionality - it can be very difficult to explain to a non-expert why not adding, or even delaying, features now will help with the delivery of features later. The idea of tinkering should be a valuable clue. Rather than add cruft and needless indirection to code, refactoring can be actively used to refine the structure while providing a useful distraction. If we're going to need to zone out from time to time, we may as well do something useful.

Another way is to write tests as you develop, in particular to use the act of passing tests to drive development of code. It's often easier to stay productive when you have something concrete to aim at, which is exactly what a test provides.
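In its most minimal form (word_count here is an invented example), the test states the target before the code exists, and you write just enough code to make it pass:

    # Test-first in miniature: the test is the concrete thing to aim at.
    import unittest

    def word_count(text):
        return len(text.split())

    class WordCountTest(unittest.TestCase):
        def test_counts_words(self):
            self.assertEquals(word_count("to be or not to be"), 6)

    if __name__ == "__main__":
        unittest.main()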

The presence of tests also changes the nature of problem solving. I think one of the reasons we get blocked is that we tend to think of problems in terms of solutions which are right and wrong. This attitude is deeply ingrained in computing, and probably comes from mathematics and logic, where solutions do tend to be right or wrong, rather than having relative levels of expediency and usefulness. It's so powerful that companies like Microsoft and Google famously use problems with no right solution to test candidates during interviews, as much for their ability to deal with the stress and hazing that results from "no right answer" as for their creativity. What a test often does is let us think of programming as a series of moves along the lines of a game. Moves in a game aren't right or wrong, they're better or worse [2]. That can be enough to snap out of a blocked consciousness.




[1] Martin Fowler has more than once suggested that we may never be able to measure programmer productivity.

[2] This is paraphrasing Alistair Cockburn, the agile software guru. Beyond testing, Cockburn has been establishing a compelling theory to explain why software projects succeed or fail based on co-operative games.

ACM Queue adverts - how not to do it

I've commented on Gregory Wilson's article on extensible programming before - it's a great read.

So I find out that it's up on ACM Queue, and stopped reading after page 2. Why? Well, I got distracted enough by the advert inserted into the middle of the page that I gave up. ACM Queue is about the only arm of the ACM I find relevant any more; the content is usually good, but this isn't helping any. The ads are even pasted into the print-friendly version.

This is not how to do online advertising. The danger of the approach here is people will do one or more of the following:

  • Give up - it's too much work to read around the ads
  • Sense that ad revenue matters more than my comfort
  • Sense that ad revenue matters more than content
  • Worst case - unconsciously associate the ad vendor and Gregory Wilson with being annoyed

I'm sure the folks at ACM Queue don't intend for any of the above; they care enough to create quality content after all, but the psychology of association is quite powerful.

Other publications manage to get around this - for example ITWorld and O'Reilly allow large ads to jut into articles, but the text is always allowed to flow around the ad - intrusive, but much better, because my eye is given a path to follow by the page flow. In Queue the text is amputated by the ad.

If there's an analogy to the level of intrusion and annoyance on television, it's the kind of adverts you see painted onto or super-imposed onto the pitch for Rugby Internationals. What's bad about those? Isn't it clever the way they can get the ads to cue visually, stand off the pitch dimension and appear flat on the screen? Well, the problem is that the illusion of the ad being put into the dimension of the screen rather than the game is dissonant and potentially irritating. It leaves more work for the brain to do to swap back and forth between the ad and the pitch. It's not unlike trying to watch a game through a Necker cube. The ad may well be impossible to ignore, but at the risk of negative association.

Although I know there are some bloggers who worry about this a lot, ads interspersed with content are fine by me - done right. We buy a lot of stuff, and I suspect that - done right - ad/content mixins could be as valuable as Amazon reviews. The thing to do is make sure the content can be absorbed with minimum inconvenience. Putting an ad out to the side of an article where page flow is maintained is fine, putting a brand on a rugby jersey is fine - they're probably all the more effective because I'm not going to unconsciously associate them with anything negative or irritating.

[update: some folks were kind enough in comments to point out that Adblock stops the ad in question appearing.]

February 11, 2005

Jython and Perl benchmark

Benchmarks are always controversial, but Nuno Leitao (nice website) finds Jython to be about 20% slower than Perl.

February 07, 2005

JOnAS receives J2EE 1.4 certification

This is great news: JOnAS Completes J2EE 1.4 Certification. JOnAS is a good container with a very good JMS provider in Joram.

Perhaps now it'll start getting the mainstream attention it deserves.

Jython gets a Wiki

Jython has a wiki: JythonWiki. Brian Zimmer, Sean McGrath and I have gotten together to set up MoinMoin in the jython.org webspace and seed it with a few pages.

JythonWiki, and community bootstrapping

  • There's an RSS feed of RecentChanges you can subscribe to. I've found a feed invaluable for keeping up to speed on the Atom wiki.
  • It's been seeded with content, including a RoadMap, DeveloperGuide and a copy of Brian's MovingJythonForward.
  • For those of you running your own wiki, the InterWiki name is JythonWiki.

I think there's enough info to get a sense of where Jython is going this year. The main thing of course is to get the word out that Jython is going somewhere. Wikis, done right, are a great tool for developing community, much more so than mailing lists, which tend to be a task/argument based medium.

You'll have to log in and send a mail requesting write access. If you know how Wikis work, you'll know that sucks horribly, but it's what we've done to deal with link spam, which is a big problem for a lot of wikis today.

Jython.org, and a new logo

Jython.org will be getting a facelift as soon as possible. If you have anything you want to see on the site, add it here: JythonOrgRedesign.

And if you have a talent for graphics, Jython is looking for a new logo. Brian will be announcing details on how that's going to work, but it will probably be run as a submit-and-vote by the community. When that starts, I'll blog it here.

Last of all, if enough people wanted it, I'll set up a PlanetJython.

The code

So, websites and logos and shiny doodads are all well and good, but what about the code?

There is a branch merge coming that will upgrade the Jython trunk to new style classes. That will afford the ability to reach parity with Python 2.3 and 2.4. It's a significant, non-trivial change to Jython and most of us are happy to stay out of the way while Samuele Pedroni gets it finished (Samuele is the co-author with Noel Rappin of Jython Essentials and the closest thing Jython currently has to a BDFL ;)

After that, I think you can expect to see the tree opened up, on a merit basis. Brian Zimmer is pretty focused on building a strong developer community around Jython.

In the meantime, one of the main jobs is to go through the buglist and patch manager on sourceforge, start prioritizing bugs and creating test cases.

And for those of you that don't like fixing bugs (really?), Brian has created a list of AbsentModules in Jython. If anyone wants to start working on those, that would be tres cool. Contact the jython-dev list.

Further down the line there's been some talk of reorganizing the source tree, which is best described as 'pre-Ant'. Doing that now is only going to get in the way of the new style classes work, but there's a clear desire to make the tree more idiomatic post-merge (and keep the CVS history). In any case, those makefiles are going to go (which means if you build Jython with Make and this is going to put you out, get over to jython-dev and let us know!).

Finally, there's been talk recently on the jython-dev list about moving to Subversion. Were that to happen, it would mean moving off Sourceforge, a move that has pros and cons, and whichever way you look at it would be disruptive. The active committers haven't indicated what they want to do (there are other things to be focusing on right now), but it's sure to come up again.