" /> Bill de hÓra: January 2004 Archives


January 31, 2004

Objects v Services: move along, nothing to see here

This polarization (object vs services) seems to be getting a lot of amplification in blogland. What is it with computer people and mutual exclusion ;)

Service-orientation doesn't replace object-orientation - I don't see the industry (or Microsoft) abandoning objects as the primary metaphor for building individual programs. I do see the industry (and Microsoft) moving away from objects as the primary metaphor for integrating and coordinating multiple programs that are developed, deployed and versioned independently, especially across host boundaries. Don Box

Don Box sets it straight, but shouldn't have to. Where I work, at Propylon, it's understood that object-oriented and service-oriented styles are appropriate for different problems, and the idea that distributed services are not to be fashioned after distributed objects, but that objects are a good way to implement services, is non-controversial. I suspect that a lot of folks in the industry hold this view, despite all the kerfuffle.

The best way I've found for implementing constructors...

...with lots of Strings in them is to normalize stuff like this:
public class CreateUserUseCase {
    public void createUser(String username, String firstName, String lastName,
                           boolean canCallerEditEmailAddress, String emailAddress,
                           String addressLine1, String addressLine2, String addressLine3,
                           String addressLine4, String addressLine5, String phoneNumber,
                           String mobileNumber, ...
into:
public class CreateUserUseCase {
    public void createUser(Name name, List emailAddresses, Address address, List phoneNumbers, ...

The problem with these String-driven constructors or create calls is that they grow and grow and grow. They're never done; each addition requires a change to the published interface. You wind up with half a dozen of them and ultimately the interface becomes a mess (it's less common in Java, but for some reason a lot of production VB code I've come across looks this way). Methods like this are often a sign you're moving from the world of transaction scripts to a domain model. Less often, it's a sign that the relational data model you were provided with contained assumptions (and especially one-to-one relationships) that no longer hold.

If for some reason creating domain objects/beans isn't an option, consider using a Map and keying each String value to a constant (or better, a URI). Although you'll be circumventing the inbuilt type checking of a language like Java, Maps offer better decoupling while scaling and evolving far better at the API level than parametric polymorphism.
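By way of illustration, here's a minimal sketch of the Map approach - the class, the URI-style keys and the field names are all hypothetical:

import java.util.Map;

public class CreateUserUseCase {
    // the keys are shared constants; URIs make collisions unlikely
    public static final String USERNAME = "http://example.org/user#username";
    public static final String EMAIL = "http://example.org/user#email";

    public void createUser(Map fields) {
        String username = (String) fields.get(USERNAME);
        String emailAddress = (String) fields.get(EMAIL);
        // ... adding a field means adding a key, not changing the published signature
    }
}

The casts are the price of circumventing the type checker, but the signature never has to change again.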

REST WARS

Episode IV: A New Hope.

Some of my colleagues and I have also been concerned about the tight coupling between the transport protocol (HTTP) and the MVC-type framework that many implementations such as Struts exhibit today. Ganesh Prasad on TSS

HTTP is not a transport protocol, it's an application protocol. This isn't just a nitpick - such protocols clearly mandate behaviour for client and server applications (hence the name) in a way that transport protocols do not. Let's stop this damaging meme.

When you do it right, there's nothing overly wrong with having your web app tied to HTTP. By design HTTP provides what you need for workflow, actions, representation and state. We can think of HTTP as defining a protocol for applications based on a state machine. You move from one state to the next by following a link. The client's job is to request state transitions; the server's job is to react to those state transitions and return representations of the next state. Layered over this is a fixed set of actions for state jumps (get, post, put, delete, etc) and an extensible header metadata format (content-length, pragma, etc) for understanding the state representations.
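As a (hypothetical) illustration of the client's half of that state machine, using nothing beyond java.net - the URL is invented:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StateTransition {
    public static void main(String[] args) throws IOException {
        // each state has a name: a URL
        URL next = new URL("http://example.org/orders/42");
        HttpURLConnection conn = (HttpURLConnection) next.openConnection();
        conn.setRequestMethod("GET"); // one of the fixed set of actions
        System.out.println("Status: " + conn.getResponseCode());
        System.out.println("Type: " + conn.getContentType()); // header metadata
        BufferedReader in = new BufferedReader(
            new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = in.readLine()) != null)
            System.out.println(line); // the next state's representation, with links onward
        in.close();
    }
}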

The problem is that we don't, generally speaking, do it right - we keep trying to treat web apps like desktop apps and keep trying to pretend the network is not there. We don't explicitly name each state with a URL; we bury that information in session cookies and behind front controller dispatching mechanisms. We don't declare our actions properly; we casually invent new ones and tunnel everything through form posts. If we insist on abstracting HTTP away for web apps, we end up reinventing HTTP in our applications, which (imo) invariably leads to new frameworks that abstract out common functionality. This abstraction through reinvention seems to be the case with WebWork2 and Java Server Faces; it is already a significant issue with webservices, where a primary purpose of treating HTTP as a transport is to get arbitrary and possibly dangerous invocations through firewalls (aka protocol tunnelling).

But the key insight behind frameworks such as WARS and Mission Control (note: my employer's product) is right - MVC as interpreted by most webapp frameworks is the problem, not the solution. The problem is that as the framework moves further away from the underlying application protocol it ends up reinventing what's already available in the protocol, except now the reinvention is private to the implementation framework. That's guaranteed to occur when people start treating HTTP or SMTP as being conceptually and architecturally the same as TCP or UDP (which is why I homed in on Ganesh's comment).

HTTP, being a REST-oriented protocol, has most of the architectural properties that N. Alex Rupp wants, so it's good to see him mention the REST thesis as an influence. Where HTTP does fall down (or more accurately, where browsers fall down) is session management. I think the answer to this is to place the session state under a URL space distinct from the web application space the client is accessing. That way both the server and client (or any authorized third party) can refer to it in a way that doesn't induce the security problems posed by cookies.
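To make that concrete, here's a minimal sketch with the Servlet API - the /sessions URL space, the class name and the in-memory storage are assumptions for illustration, not a design:

import java.io.BufferedReader;
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// mapped to /sessions/*; each session is an addressable resource, not an opaque cookie
public class SessionResourceServlet extends HttpServlet {
    private final Map sessions = Collections.synchronizedMap(new HashMap());

    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        String state = (String) sessions.get(req.getPathInfo()); // e.g. "/abc123"
        if (state == null) {
            res.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        res.setContentType("application/xml");
        res.getWriter().write(state); // server, client or a third party can GET it
    }

    protected void doPut(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        StringBuffer body = new StringBuffer();
        BufferedReader in = req.getReader();
        String line;
        while ((line = in.readLine()) != null)
            body.append(line);
        sessions.put(req.getPathInfo(), body.toString()); // PUT replaces the state
        res.setStatus(HttpServletResponse.SC_NO_CONTENT);
    }
}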

[Update: Stefan Tilkov followed up with some related links]

January 28, 2004

Faults of Omission (aka the Frame Problem)

I don't know if this is a true story, but it's truly a story I've heard. A new jet fighter was being tested. The test pilot strapped in, turned the key in the ignition (or the equivalent), and flipped the switch to raise the landing gear. The plane wasn't moving, but the avionics software dutifully raised the landing gear. The plane fell down and broke. Brian Marick

That is a Fault of Omission.

Once upon a time there was a robot, named R1 by its creators. Its only task was to fend for itself. One day its designers arranged for it to learn that its spare battery, its precious energy supply, was locked in a room with a time bomb set to go off soon. R1 located the room, and the key to the door, and formulated a plan to rescue its battery. There was a wagon in the room, and the battery was on the wagon, and R1 hypothesized that a certain action which it called PULLOUT(WAGON,ROOM) would result in the battery being removed from the room. Straightaway it acted, and did succeed in getting the battery out of the room before the bomb went off. Unfortunately, however, the bomb was also on the wagon. R1 knew that the bomb was on the wagon in the room, but didn't realize that pulling the wagon would bring the bomb out along with the battery. Poor R1 had missed that obvious implication of its planned act. Daniel Dennett

That is the Frame Problem.

I remember one of the first AI programs I wrote. It was a simple blocks world solver written in Prolog. Blocks world, if you don't know, is a classic AI problem domain. It usually consists of a number of objects like tables, blocks, ground and so on. You instruct the program to configure the blocks in a certain way and it uses its rules and domain knowledge to achieve the goal. The first cut of the program seemed to be going very well, the solver was doing a good job rearranging things - until it tried to put the ground on the table. Bill de hÓra

That is Somewhere Between The Two.

By the way, if you're into testing, Brian Marick is a must read.

January 27, 2004

Open source leads to outsourcing? Hardly


Open source leads to outsourcing?

It's doubtful that open source is leading to outsourcing, and suggesting so seems like flawed reasoning. Post hoc ergo propter hoc comes to mind. In this case it seems SourceForge simply proved to be the best tool for the job (anyway, isn't enabling this kind of distributed communication Collab.net's business model?).

January 26, 2004

TDD: why half your time spent writing tests is a good thing

Jon Udell highlights a new tool that supports Test Driven Development (TDD):

...but when up to half of the output of a full-blown TDD-style project can be test code, we're going to want to find ways to automate and streamline the effort. The art and science of software testing

Very true, and that figure should not be considered negative or problematic - simply a point of leverage (or in Agitar's case, an opportunity). The future of software development is often said to be in code generation and meta-modelling. Yet those technologies have a ways to go to match the productivity gains made by the smart application of automated tests.

One aspect of TDD is that it captures knowledge about the system, executable knowledge, which was previously transient, unavailable or simply thrown away. Without TDD or test-first, this 50% output/effort was probably never captured. It was spent interpreting print lines and on hours pored over the debugger - neither of which results in reusable knowledge. Time spent in a debugger is not recyclable, not repeatable, not reusable by others - a debugger is the tool of last resort.

Without TDD probably less than 20% of your time is spent writing code. The rest goes on eyeballing print lines and debugger output, reading an ill-formed spec, sitting in more and more meetings because things are falling behind, doing everything except adding functionality. Sure, the first month you cranked it and it felt good - we've all been there. But without the continual investment in tests, the ability to keep going and sustain pace falls off - dramatically. After 3 months, the first month doesn't matter - things just averaged out. Chances are that's the best it will get; the pace will continue to fall off thereafter.

I remember someone saying to me once - "The problem with JUnit is you spend half your time writing tests." At the time I didn't have a snappy comeback. But half your time spent writing tests - that's not a problem, it's a feature. There's a good chance 30-40% of the rest of your time is spent passing those tests - that is, adding functionality. Depending on how you look at it (and how cavalier we're being with statistics), that's a 50-100% productivity improvement. One compounded by the fact that anyone else can run and read the tests to understand how the system actually behaves, at any time in the future, and by the fact that an evolving test suite along with refactoring continually defers the day when you cannot easily add new features and capabilities because you fear breaking what's already there. That's design by inertia. We've all been there too.
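To make "executable knowledge" concrete, here's a minimal JUnit-style sketch - the Money class and its behaviour are invented for illustration:

import junit.framework.TestCase;

// the test records, runnably, how the (hypothetical) Money class is meant to behave
public class MoneyTest extends TestCase {
    public void testAddingPreservesCurrency() {
        Money five = new Money(5, "EUR");
        Money ten = five.add(new Money(5, "EUR"));
        assertEquals(10, ten.amount());
        assertEquals("EUR", ten.currency());
    }
}

Anyone can run this next month or next year and learn what the code is supposed to do - which is exactly the knowledge a debugging session throws away.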

January 25, 2004

OnlyForward

The schedule for the project is never based on the amount of work to be done. In my experience, it's based on some kind of external factor. We have a competitive need. We have a customer need. We have a corporate announcement. We're going public. There's some sort of market-driven need for whatever this solution is. - Linda Hayes

January 18, 2004

Living in Dryden

I've just been reading Living in Dryden. Fascinating in its detail - it reminded me of Pepys Diary. I hope Simon keeps this up.

Just when you thought CORBA was hard

A list of WebServicesSpecifications recorded by the ASF - it hasn't been updated for over two months (so there's probably half a dozen or so missing).

"How did it come to this?"

[warning sign - coldplay]

Extending REST

Mod-pubsub and Mark Baker are pointing to Rohit Khare's dissertation.

Best bits so far - scene setting with analogies to the power and money grids... leases (JXTA and Javaspaces folks take note)... a stunning simile from Jim Gray on memory access... non-bewildering explanation of the hard constraints imposed on processes by latency... fair mutexes for replicated resources (a formalism/extension of WebDAV LOCK)... use of global clocks. So, it's excellent stuff and I'm only up to chapter 7 (consensus, where I suspect it gets really good).

The biggest problem (aside from getting buy-in) in building practical REST solutions on today's web seems to be state management. Yes we know that cookies are bad and CGIs suck, but given the current crop of web clients and the (misguided?) commercial imperatives of site-owners, sometimes it seems inevitable that session state will centralize onto the server. State management doesn't seem to be addressed directly by Khare's thesis - its focus is on decentralization and two-way comms. I haven't digested it properly enough to figure out whether the architecture's properties would induce an easier life in the trenches, but I can hope ;)

[coldplay - clocks]

January 17, 2004

JunkLetterQueue: when XML envelopes go wrong

Last year we created an XML messaging hub for eGovernment here at Propylon. The hub connects a number of government agencies interested in life-events such as births, deaths and marriages, by bridging various transport and application protocols (which we dub Channels). The original purpose of the system was as a proof of concept for a larger inter-agency hub, but also as an interim solution to provide sufficient connectivity until the main hub was built out. As is the way with these things, the hub has evolved through a couple of iterations, services have been added as needed, and it is ticking away nicely.

There is a standard XML envelope that all parties agree to; we didn't use SOAP for this - it was simpler to define an envelope in plain XML (in much the same way RSS is simpler by not being SOAP). This envelope is independent of the details of any particular life-event. One aspect of the envelope is that it provides an identity for each message sent through the system along with the ability to associate messages as being part of a conversation. As long as the envelope identity set is carried through or referenced into backend systems and business processes, there's a fighting chance that the message can be tracked across arbitrary network and organizational boundaries (auditing, reconciliation and tracking is a tough nut even when you have the luxury of a homogeneous network and a single network owner).

One of the problems in the protocol bridging scenario is what to do when a malformed XML envelope arrives at your front door. Bridging Channels is at heart asynchronous; internally there may be a number of hops across processes, any of which can result in corrupted markup. You can bet that when you get junk XML there will not be a chain of processes blocked, happily waiting for you to return control - managing referential integrity and call stacks across networks is too hard (this is why many people won't recommend RPC and distributed OO outside a cluster). You might know the Channel it came in from, but you won't always be able to query the Channel in question, and even if you could, what can you ask it? If you can't parse the message to find out who it's from there's no easy way to pull out the minimal information to make the query. And importantly (in our case) there's no way to pull out the identity set to log the audit.

While there has been plenty of handwringing about whether XML is a good carrier format (compared to say, multipart MIME or BEEP frames) I haven't seen much discussion about what to do or how to fail on bad XML. It does happen that markup gets corrupted over a network hop or between two processes, but it does not happen very often and you have to weigh the risk of it happening against the engineering cost of handling it when it does. You also have to take the SLA involved into account - some messages must get there no matter what, some can fall by the wayside.

Given all that, our approach for the hub and messaging endpoints was simple - follow the XML spec and give up. This means avoiding heuristics, avoiding regexen, avoiding excess engineering, avoiding distributed transactions, avoiding cleverness. If the message doesn't parse (or something goes wrong) we:

  • trap the exception
  • log that there's a problem
  • dump the message received to disk. This is the alluded-to JunkLetterQueue. We chose disk over a database because it makes minimal assumptions about what's running on the server (one less process to worry about).
  • email someone with the message in the body
  • log that you're sending an email to someone
  • exit

For example, in Java you might write something like this:

  public void execute( String in ) throws JunkEnvelopeException {
    boolean possibleLogFailure = false;
    Document doc = null;
    try  {
      doc = DocumentConverter.readString(in);
      // ... normal envelope processing
    }
    catch(Exception e)   {
      // can't parse: flag it, wrap it, and let the finally block clean up
      possibleLogFailure = true;
      throw new JunkEnvelopeException(e);
    }
    finally  {
      if(possibleLogFailure)  {
        String location = "";
        try  {
          Log.EnvValidationLog(LOGNAME + " processing of incoming envelope failed");
          // the JunkLetterQueue: dump the raw message to disk
          location = writeMessageToDisk(in);
          Log.EnvValidationLog(LOGNAME + " writing envelope to [" + location + "]");
        }
        catch(Exception e)  {
          throw new JunkEnvelopeException(e);
        }
        finally   {
          // email someone, even if the disk write failed
          sendWarningEmail(in, location);
          Log.EnvValidationLog(LOGNAME + " sending warning email to [" + getEmailTo() + "]");
        }
      }
    }
  }

(as an aside: the above is an example of a rare case when catching and acting on an exception is a useful or even optimal option)

If the machines can't parse the XML, there's a chance that a person can fire up a text editor and derive enough information to inform the sender that there was a problem with message XXX. Perhaps they can fix up the message and push it through again, perhaps it will be resent, but in this case (citizen data) you don't ask software to judge what's best. I feel this fail-fast approach also applies to SOAP messages travelling across heterogeneous networks and of course to application protocols that use self-describing messages as mandated by REST (such as a combination of HTTP+XML/RDF sans cookies). It will also apply in the future when RSS/Atom feeds start to be used beyond their target domains of blog and news feeds for enterprise-critical data (well-formedness and appropriate aggregator behaviour for malformed feeds is an ongoing argument in the Atom community).

I thought about using "DeadLetterQueue" for this entry, but that's commonly used for messages that can't be delivered and have expired, which indicates a connectivity, protocol or addressing problem rather than a data integrity problem, hence the moniker JunkLetterQueue. The EIP site makes a similar distinction and calls it InvalidMessageChannel but doesn't discuss it - however in the XML/Webservices world an invalid message is quite different from a malformed or junk one. Most important is remembering this constraint: there is no sound way to process or act on a malformed XML envelope. If you can't parse, don't process.

January 13, 2004

TSS.NET

TheServerSide.NET. I hope there'll be someone else apart from Ted posting. Just kidding ;)

How to make a Martini

Garnish: always prepare the garnish first. A green olive (in brine) is for vodka. A lemon twist is for gin. Don't get this mixed up! For the lemon, use a small sharp knife - you're only after lemon oil, so avoid taking away the white stuff underneath (it's very bitter). The ideal size is closer to a steri-strip than a band-aid. Twist the lemon rind over the gin making sure it sprays across the surface - the trick here is to pull, not push, the lemon gently using your thumb and forefinger and then twist. Olives are easy - simply drop into the vodka on a cocktail stick. Stuffed olives are fine, but in any case, never rinse the olive. Small silverskinned onions (aka a Gibson) will go with vodka or gin. And instead of lemon, the dandy in you might enjoy a bruised rose petal (yellow or pink) with gin.

Naked: one or two drops of Noilly Prat (not Cinzano!) into a frozen martini glass. An eye-dropper is just great for this. Swirl quickly and flick the glass before the drops freeze. Add 75mls of frozen vodka or gin. Drink it quickly, within 10 minutes.

Stirred: fill the glass of a cocktail shaker 2/3 with ice. Add one bar teaspoon of Noilly Prat (did I say not Cinzano!). Stir quickly (5-10 seconds) and drain. Add 50mls of vodka or gin, stir less quickly. Drain into a cold martini glass. This doesn't need to be drunk quite as quickly, thanks to the dilution.

Spirit: a Martini is not a mixed drink, so use a decent alcohol. I like K1, Smirnoff Black, or Stolichnaya. Absolut is too harsh even when frozen and is much better in a fruit cocktail. Tanqueray is simply the best gin in existence; they also make an excellent vodka (Sterling), but if you can't get it try Gordons (green bottle), Beefeater, or Bombay Sapphire. Noilly Prat, beloved of chefs, is the best vermouth for making these drinks.

Tools: these days, we should all have a silver barspoon, a glass/steel cocktail shaker and some spirit measures. At home, an eye-dropper kept in the fridge is ideal for holding the vermouth (atomizers are too fussy).

Drinking: the idea behind many classic drinks is to enjoy the drink while it's cold - and few things are worse than a warm Martini. Martinis are wonderful early in the evening or before dinner. They are very much for gulping, not sipping. Only drink a few - Martinis are best enjoyed when sober and don't make for an especially interesting drunkenness. If in doubt keep the Martini small and make more of them - feel free to drop the spirit volumes mentioned to 35ml and 50ml.

Cheers!

[Next: the Gimlet.]

January 11, 2004

Introduction to Java code generation

Jack Herrington has a good introductory article on Java code generation on TSS.

On tools for code generation:
There are some tools you should consider when you are architecting the generator:
  • XSLT [...]
  • XML [...]
  • Jython [...]
  • JSP [...]
  • JavaDoc [...]
On the downsides:
The primary problem is lack of maintenance, and this is the one most people have experienced. You often see the remnants of once-generated code in the source code control system which are now tweaked by hand. The solution for this is to integrate the generator as part of the build process and to never check in the generated code. This keeps the generator in the development lifecycle and ensures its maintenance.
This is important advice, but I don't think Jack's expressed it clearly. I've made the mistake in the past of introducing XDoclet-generated EJB code into source control. The mistake isn't obvious until you need to rerun the generator - if the generated code has been modified in the interim you'll be forced to merge changes. And we've all run across javadoc that was committed to source control ("for convenience") which wasn't kept in sync with the code base. The process is far easier to manage if the generation is part of the build process, as Jack points out - specifically you want to avoid being tied into requiring an IDE or visual modelling tool simply to compile your code, in favour of being able to rebuild automatically (the last time I looked, this style of lock-in seemed to be a near universal problem with visual modelling tools). If the generator can't be scripted over or requires manual intervention (read: mouse clicks), that's not good enough.

On introducing code generators:
You need to think about the generator as another engineer on the team who owns and maintains sections of the code in an active manner.
Understandably, this might make some developers nervous. All I can say is that code generation, used well, systematically boosts productivity by letting you focus on the problem at hand.

The full article is here. And if you haven't read it or are looking for more detail, Jack's book, Code Generation In Action is highly recommended.

January 10, 2004

ScaleFreeModel

This entry, AnemicDomainModel, has caused some amount of fuss. I think Martin's saying that there isn't much point in having an object domain model without interlacing that model with behaviour - I agree with this as long as the behaviour is relevant for the domain. For non-relevant or system level behaviour we have patterns such as Visitor, DTO, FrontController, and of course ServiceLayer. AnemicDomainModel has been interpreted as a slight against SOA and Webservices style integrations, but I don't think the criticism applies or is even meant to (modulo EIP I haven't heard much from the Thoughtworks crew on SOA or Webservices, but am looking forward to it). Objects and Services are ideally working at very different architectural scales. Objects we can characterize as suitable for intra-domain work and services as suitable for inter-domain work. With the commercial state of the art today, nobody should still be sending object references or doing things that require reliable connections across unreliable high-latency networks. Contrariwise, asking POJOs or .NET Assemblies running close by to gateway through HTTP doesn't seem to make much sense either. Martin has also said in the past that we should avoid object distribution for its own (or the vendors') sake, something I agree with. I think the point where you need to think about distribution is also an inflection point for thinking about an alternative model - LAN-wide distributions can look at message queuing rather than distributing objects and Internet scale integrations can look at application protocols like HTTP. Here's a guide from 10,000ft, based on your network topology:


  • Standalone, cluster - scripts, pipelines, object models
  • LAN, Intranet - object models, messaging, application protocols
  • WAN, Internet - application protocols, service models

I deliberately didn't number that list because I don't want to imply an order or any level of importance to the models, and I want to make it clear that they are a spectrum which bleed into each other. Breakdown by network topology is pretty arbitrary. Others may see administrative and ownership topology as being more critical - still more may prefer a breakdown based on how we manage application state. And the most confusing area is the LAN/intranet space, where theoretically anything from transaction scripts to objects to messaging to services could be applied. It's the scale where versioning issues become apparent as well as published versus public interfaces - if you are hitting these problems you might be hitting the limitations of your model. To compound things, it happens to be the scale where many, perhaps most of us are working (at Propylon we tend to work with customers at the Internet/WAN end of the scale).

Unless you work at the edges, it requires skill, judgement, luck, even letting go of some prejudices and past learning to determine the appropriate scale to work at - feel free to disagree, but I think there are no ScaleFreeModels.

January 07, 2004

Groovy: the more the merrier

Software Craftsmen: Why Groovy and not JRuby?

I wondered about this too. Tho' I usually wonder why JRuby when we have Jython. ;)

But in the spirit of things, here are some reasons I could imagine:


  • It's invented here. There's a lot to be said for a dynamic language that comes from the J2EE nee Java community rather than from a community outside it that's targeted the JVM rather than the needs of Java programmers. It's less of an admission of anything, and while I've never understood the (considerable) resistance and occasional derision shown to these languages by some programmers, God knows no-one wants a Python/Lisp/Smalltalker saying I told you so ;)
  • It's a competitive weapon. We live in a different world to the one J2EE and Java were invented in. We need to use the best language for the job and cut every last scrap of fat out of development. Sometimes the best language for the job is not static. In this market it's good for people brought up on static or C style languages to learn about alternatives through experimentation.
  • It's educational. Every programmer should write at least one little language evaluator or a lexer/parser. Beyond understanding the technology (Groovy is still at a size where you can grok the source over a weekend), there's a lot to be said for thinking about solving problems with computers in terms of designing new languages.
  • It's fun. Above anything else writing code on your own dime should be fun. A lot of the forward movement and evolution we're seeing in commercial Java development in the last two years has come about through programmers having fun and fooling around with alternate approaches.

January 06, 2004

Ward Cunningham on growing an architecture

I hate it when a new requirement comes in that doesn't fit nicely, as if the program were designed to make the requirement hard. In that case, we have a lot of work to do. But the nature of the work is first changing the program so the new requirement is an easy fit, and then doing the easy work to incorporate the requirement. In other words, instead of patching the new requirement onto an architecture that was not made to accommodate it, just buckle under and do the hard work to change the architecture so the requirement is easy to implement. The patch approach means that the next guy who comes along will have to understand both the system that wasn't made to do the new requirement, and the patch that tried to overcome that system without changing it. It's much better to change the system to accommodate the new feature easily. [via Artima]

Javablogs' new layout needs tweaking

Javablogs has altered its layout, but it's not a total improvement. You can now see the first few words of the post (very nice). You can also click through to read a blog post without leaving Javablogs. Unfortunately this doesn't seem to bring inlined links forward or push up the read counter, even though these are counted as "only the reads from this site". And I'm not sure what the point of not leaving Javablogs is. Perhaps it's lining up for future ad revenue ;-) Half the fun is going onto someone else's blog, looking at other recent posts, following new links - and then going back to Javablogs. Making Javablogs "sticky" feels like a step backwards.

January 04, 2004

Is red wine good or bad for you?

A wonderful essay by Michael Crichton.

[via James Robertson]

Assume we don't have an identity card

Jon Udell comments on Jonathan Schwartz's strategic view of Java and identity cards:

I'm with you, Jonathan. Now as a longtime advocate of this view, I've gotten plenty of useful pushback. And it's true, there are problems. PCs don't come with card readers. It's unclear how the governments and banks and airlines and other entities who currently issue cards will evolve the identity infrastructures this solution implies, how those infrastructures will cooperate, and how revocation can be managed in a scalable way.

As for DRM and identity management: we have wi-fi, infra-red, Bluetooth, RFID, and USB. We have credit cards. We have more encryption technology than we know what to do with. We know how to distribute tokens. We have insurance models for when things go wrong. Who needs a card reader? Who needs a card? Call me naive, but it seems all we need for identity and DRM is the collective will to get the technologies into a usable state and see them deployed.

Jon Udell meets metacrap

In the assumptions and bias of a single form, Jon discovers two key problems with structured symbolic metadata and ontologies:

  1. cost of data entry
  2. classification bias

Jon focused on our inability to derive his interests from existing data on the network:

Now clearly I lead a much more public life than most, and I create a much more complete document trail for Google to follow. But is that a difference in degree, or a difference in kind? I suspect the former. And if that's true, then I'm skeptical as to the benefit of a parochial reputation system such as LinkedIn, which requires extra effort to join, to feed with metadata, and to use. If we have (or are rapidly evolving) a global reputation system that can absorb and contextualize our routine communication, then parochial systems will need to deliver huge amounts of extra value.

but in this case I find the issue of bias more telling. It's pretty clear where the classification bias comes from:

Jon makes it sound as if he is stuffed, but really it's the end consumer of the collected data that is stuffed. All those relationships are fiduciary or work based, probably hacked out of some sales/marketing breakdown that makes sense for those contexts alone, not for Jon's. The bias is evident, as should be the end result - the collated data is virtually useless as a basis for making inferences. And if you're not familiar with machine learning or search technology, it might interest you to know that bias is a well understood, mathematically appreciated phenomenon in those fields. The immediate problem is that bias and absence of context always result in junk data unless everyone does what Jon did (take a raincheck) rather than just pick an arbitrary classification. The overarching problem is that you cannot eliminate such bias, any more than you could eliminate latency from the Internet - it's something you manage explicitly.

No matter how good we get at this or how popular classification systems become, we'll always need to add some statistical and probabilistic data in the mix to keep things slack. Any classification over you ultimately should only approach 1 or 0, not be 1 or 0 - these things are not certain. Hand crafted logical ontologies are not sufficient precisely because they want to be certain. They don't drift with your interests over time, they're rigid, they're deterministic, they can only see around so many corners. In short they age badly, and they evolve badly.

Jabber and alternate REST architectures

Give it a REST. Nice piece from Joe Hildebrand on RESTful Jabber.

At the plumbing level, IM, Linda and P2P infrastructure can help realize the SOA integration dream, at least as much as HTTP. Unless, that is, HTTP evolves - for example to include events (possible, we know how to), or to allow individuals to run servers (with probability approaching zero I think, but see mod-pubsub for an excellent compromise). Linda-like systems such as Javaspaces are a personal favorite and represent a very interesting way to bootstrap workflow or events in a tiered system.
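For the unfamiliar, here's a minimal sketch of the Linda idiom using the JavaSpaces API - the Task entry is invented, and the space lookup and error handling are elided:

import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// an entry is a bag of public fields; null fields act as wildcards in templates
class Task implements Entry {
    public String name;
    public String payload;
    public Task() {}
    public Task(String name, String payload) { this.name = name; this.payload = payload; }
}

public class SpaceExample {
    static void produce(JavaSpace space) throws Exception {
        space.write(new Task("notify", "<envelope/>"), null, Lease.FOREVER);
    }

    static void consume(JavaSpace space) throws Exception {
        Task template = new Task(); // match any Task named "notify"
        template.name = "notify";
        Task task = (Task) space.take(template, null, Long.MAX_VALUE); // blocks until matched
        System.out.println(task.payload);
    }
}

The blocking take is the "just-enough" coordination that makes tuplespaces attractive for decoupled workflow.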

One difficulty is that HTTP is deployed everywhere. Thus altering or extending it means pushing against the mother of all installed user bases, one which is growing daily - 3 years ago an HTTP integration baseline was novel, next year it will be everyday. Also, there is a practical, rubber-hits-the-road aspect to using HTTP - in my experience integration architectures predicated on HTTP traffic tend to be welcomed by admins, ops and security auditors. Other protocols are liable to run into heavy resistance, and may require significant educational effort (or more plainly, the hard sell), perhaps enough to throw out schedules, and the risk to shipping isn't always justified. In fairness, these folks have to live with what you've built, protocol choices and all - but in these times they wield a peculiar moral authority, notably in the security area.

Nonetheless, I'm something of a REST bigot, and HTTP represents a good baseline choice for integrations spanning administrations and trading partners, especially now that the webservices RPC view is withering somewhat. Which is not to say that alternate REST application protocols aren't welcome - HTTP can only be suitable for so many things - simply that there's a lot of work to be done to get even one such protocol widely deployed.

The optimal strategy today seems to be to gateway off HTTP onto a message queue, n-tier or tuplespace system, and where it makes sense leverage HTTP headers to reduce the negative impact of tunneling one app protocol through another, while ensuring that application level resources are named with URIs and neither they nor the state transitions are hidden behind an MVC or Front Controller black hole.
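A hedged sketch of that gateway, using the Servlet and JMS APIs - the JNDI names and the header-to-property mapping are assumptions for illustration, not a recipe:

import java.io.BufferedReader;
import java.io.IOException;
import javax.jms.Queue;
import javax.jms.QueueConnection;
import javax.jms.QueueConnectionFactory;
import javax.jms.QueueSender;
import javax.jms.QueueSession;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// accepts a POSTed message and gateways it onto a queue for asynchronous processing
public class GatewayServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {
        StringBuffer body = new StringBuffer();
        BufferedReader in = req.getReader();
        String line;
        while ((line = in.readLine()) != null)
            body.append(line);
        try {
            InitialContext ctx = new InitialContext();
            QueueConnectionFactory factory =
                (QueueConnectionFactory) ctx.lookup("jms/ConnectionFactory");
            Queue queue = (Queue) ctx.lookup("jms/IncomingEnvelopes");
            QueueConnection conn = factory.createQueueConnection();
            QueueSession session = conn.createQueueSession(false, Session.AUTO_ACKNOWLEDGE);
            QueueSender sender = session.createSender(queue);
            TextMessage msg = session.createTextMessage(body.toString());
            // carry the useful HTTP metadata across rather than reinventing it
            msg.setStringProperty("ContentType", req.getContentType());
            sender.send(msg);
            conn.close();
        } catch (Exception e) {
            throw new ServletException(e);
        }
        res.setStatus(HttpServletResponse.SC_ACCEPTED); // 202: accepted for processing
    }
}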

While many people want to know where the REST toolkit is at for the server side, more than anything we need to focus on REST clients. HTTP clients, as the obvious example, have to become better web agents. Much better. It's absurd that, in 2004, browsers do not support PUT and DELETE, in large part because specs such as HTML, XHTML and XForms have been misguided enough to subset HTTP. This really does hurt anyone's ability to deploy RESTful systems and as a result is costing all concerned a fortune - I'd wager that much of the architectural inanity, waste, and cost in web systems can be traced back to web clients. Right now, major application APIs in what Tim O'Reilly dubs the web operating system are working off an inadequate subset of HTTP's uniform interface (GET and POST). Client limitations are also, I believe, bleeding into the RSS/Atom specs, resulting in constraints. This needs to be driven out, and the same mistake avoided in future clients implementing alternate protocols predicated on REST.

January 02, 2004

Functional wannabe

The "ruby way" has started to rub off on my Java personality, specifically in the realm of iteration. I've used a ruby-inspired specialization of the Command pattern (GoF) three times this past week to simplify iteration.Bob Lee

Ruby way, Python way, Smalltalk way, Lisp way - take your pick :) See also:

For a full treatment of where Bob's (neat) hack ends up.

The amount of Java code I (and everyone else) produce just for looping over collections and arrays is considerable, so any idioms that reduce the verbiage are welcome. And yes, Java IDEs will auto-generate iterator blocks - that's not the point. The point is we didn't need all those iterator blocks sprinkled about to begin with. In other languages we have map()/filter()/reduce() and lambda builtins to do the lifting for us. In Python large amounts of code vanish when map/lambda combinations are used; you just have to be thoughtful in handling exceptions mid-loop. A colleague and I looked at a Python application a while back and we figured on maybe a 15-20% line count reduction if functional idioms were used instead of for loops. In Lisp "for" is very much an alien construct. But it's when you start working with headed lists that the block/closure stuff becomes truly powerful.
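For the record, here's roughly what the idiom looks like in Java today, with an anonymous inner class standing in for the block - the names are illustrative, not Bob's code:

import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class Each {
    // the 'closure': one method, called once per element
    public interface Block {
        void call(Object each);
    }

    public static void forEach(Collection c, Block block) {
        // the only iterator boilerplate in the program lives here
        for (Iterator i = c.iterator(); i.hasNext();)
            block.call(i.next());
    }

    public static void main(String[] args) {
        List names = Arrays.asList(new String[] {"a", "b", "c"});
        forEach(names, new Block() {
            public void call(Object each) {
                System.out.println(each);
            }
        });
    }
}

The verbosity of the anonymous class is the tax Java charges; in Ruby, Python or Smalltalk the block is a one-liner.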

Predictions for 2004

10+ predictions for 2004.


  1. Java. Continues to become a platform for little languages while Java the language becomes less important. I got some stick for saying this last year, and it's related to the developer productivity prediction below (tho' I didn't know that at the time). I'm just surprised at how much has happened since then - expect more. JXTA and Javaspaces turn some heads as potential SOA backbones, since no-one really knows how to apply Grid technology in the enterprise ;) JUnit gets forked.
  2. J2EE cuts the fat. The economic pressures that are driving developer productivity are forcing people to find better ways to build out middle tiers and integrate systems. This will force the JCP to drive out the fat in J2EE or risk hurting the relevancy of the J2EE value proposition. We're already seeing the better projects such as Spring, WW, Pico and Hibernate force the issue.
  3. Mono. Mono will be good enough before the year is out. And if Mono turns out to be really good (which will only happen if we see the same kind of OS activity we've seen with Java), it will greatly disrupt the middleware market. A lot of people assume that if Mono succeeds, Microsoft will kill it - but I don't see how they can do that without committing strategic suicide in the enterprise market.
  4. More startups. But not your father's startups. Software startups have traditionally been product or "idea" focused. These startups will focus on the services/education sectors, using small teams of productive developers building on open source infrastructure, and will have sustainable business models while not being instant IPO material. Product offerings, where they exist, will tend to be disruptive - variants of existing overpriced tools which retain better user experience and support than open-source offerings. The goal of these companies is not to burn up and cash out, but to become the next Jetbrains, Atlassian, ObjectMentor, Core Developer Network or, who knows, the next Thoughtworks. The single exception might be RSS reader products.
  5. BSD/Mac eat Linux share on corporate servers. I'm only half-joking about this. Redhat's new business model and the SCO suit are making lots of people twitchy. On the other hand, Novell have a chance to become a real force with SuSe (my favourite distro).
  6. XML. XPath and XQuery will drive out DOM hacking once and for all. Lots of frustration with XML APIs except for pull APIs. RELAX NG goes mainstream.
  7. RDF/Semantic Web. In the breathing space left by lots of recommendations getting nailed down, the penny finally drops, and the community gets behind a sane XML serialization. Behaviour takes a front seat as the focus is on query and rules. That should mean more tools and more useful tools, but I suspect this is a make or break year for the semantic web project - we're more than half a decade in with little to show, and real problems (such as provenance, context and spam) are not being tackled.
  8. Web. Personalized search. Learning search. Weblog categories mixed in with social networks. Distributed/desktop scale search. This is where the real semantic web action is. Not just because it's hard to find and organize stuff, but because of the urgent need to eliminate spam. Spam is about the best thing to happen to applied AI since Japan. All the research spent on AI personalization technology in the mid-nineties might just bear fruit. Email gets serious about statistical/learning filters (this is more of a usability issue; the technology is there).
  9. Death of OO. Just kidding! Though I think it will become less controversial to question the validity of using objects in certain areas particularly as an integration technology. This should help to make discussions in OO circles such as Public versus Published interfaces and object versioning somewhat moot, given the likely answer is to center on document and message contracts [obligatory doffing of the cap: my CTO has been saying this for ages...]
  10. Developer productivity increases. This will be brought about by developers, not tools vendors. One good answer to the question "why should I be kept on when 3 coders in India|China|Georgia can do my job and much more for the same wage?" is to become 5-10 times more productive and invest the surplus time understanding the customer and the domain (rather than some hazy spec). To do that, you certainly need better environments (IDEs like IntelliJ, good version control) and practices (tests, build systems). But ultimately you need high productivity languages that allow you to concentrate on the problem, not the machine. Concentrating on the machine is a great leveller. These days, losing your job is an ongoing problem for any developer living west of Berlin. The growing interest in dynamic and data-driven languages like Jython, Groovy and X#, along with agile practices, even AOP, is a reaction in part to tough times and the commoditization and outsourcing of what is increasingly referred to as "basic coding". [I also think we aren't anywhere near ready to commoditize "basic coding", but that's another matter.]

Some things I said last year - judge for yourself!

Things to do in 2004

My personal leanings this year will be around:


  • Search. Studying AI in college, search along with machine learning was my favourite area. I'm currently of the mind that on the web, but especially across one person's data, finding things is still in the stone age. A long time ago, when I was trying to get a search project off the ground, someone told me you'd have to be insane to take on the search engines...
  • Tuplespaces. A RESTful Linda-like technology would be a fine complement to messaging approaches for anyone working on SOA style systems. Mainly I think they can help defer orchestration hell and provide "just-enough" workflow, but it's also fun technology.
  • Open source. In work, I push to use open source whenever possible and I'm lucky to work for a company that gets open source. It's time I started contributing code.
  • XQuery. I have mixed feelings about XQuery. I dislike the distortions XPath has been put through and I doubt XQuery is as simple as it can be. On the other hand I think being able to query XML documents declaratively rather than write reams of code is a goodness we're missing at the moment. I think it's worth spending some time on.
  • JOnAS. When I moved to Ireland, I was sure JBoss would be a certified J2EE container within twelve to eighteen months (that was over two years ago). And as much as I'm impressed by the calibre of people working on Geronimo, some things about the project remain unconvincing. The ongoing spat between the two is the last thing open source J2EE needed and is hugely disappointing. JOnAS on the other hand is turning out to be an awesome J2EE stack.
  • .NET. The reality from the Java standpoint is that if you're in the integration business, you will have to work with .NET at some point (if you're a .NET guy, invert the platforms). And .NET solutions can represent a compelling option for those that don't need the full power of J2EE, or perhaps where domain models aren't needed. There are, on inspection, things to like in C# and the System libraries. I've been looking at Mono recently - in time it will force people to eat their hats. And watch out for IronPython! The question that hangs over the .NET offering is how comfortable you are with a monoculture.
  • Messaging and HTTP. I've gotten through the second implementation of a reliable HTTP delivery protocol I've been working on for a while. The protocol spec is published, but not announced, as it needs to be edited after feedback from the last implementation. I'll be announcing it soon. I also want to revisit messaging APIs for Java - the last time I looked at this was for FIPA style software agents for JSR-87. It turns out that JSR (now sadly moribund) was about two years too early. It started before the SOAP hype hit and just when corporate software R&D was being cut - work on software agents wasn't bottom line, nor did it seem relevant to Sun's suite of webservices APIs since they were (and still are) RPC-centric. Ironically it could turn out the programming needs for modern web services and SOAs are not a whole lot different from those of multi-agent systems (half the work involves replacing the word "agent" with "service"). Today there's a real need for an API that helps developers build the fundamentally asynchronous systems we all seem to want. JMS, MDB and JAX* are not that API.