" /> Bill de hÓra: May 2004 Archives


May 28, 2004

Migrating to Subversion IV

I'm coming to the end of a one-year experiment with Subversion; initially it was meant to be a six-month test where I would move some of my own projects onto Subversion and run with it. That got extended to twelve months, which sounds like a long trial, but picking the right version control tool is deeply important to me. Version control is life support for programming - a year is worth it.

A year ago I believed that CVS would die off; it would remain supported and widely deployed, but no significant development would ever take place. The problems I had would never be solved unless I fixed them myself. Nonetheless, I'd still come under the banner of CVS advocate: I've invested a lot of time understanding it, explaining it, and developing processes around it. The CVS blockers I've run into again and again with folks are:

  1. Client tools suck.
  2. Lack of support for anything to do with directories.
  3. The merging model.
  4. Commandline phobia.

Let's go through each of these, with something of a Java slant:

Client tools suck. This used to be a reasonable argument. Now IDEA and Eclipse blow it away and have done so for years. Their support for CVS is stellar. Eclipse remains installed on my laptop purely for its CVS support. WinCVS is not a bad tool, it's just not great (it used to be terrible before 1.12...). A lot of people sing TortoiseCVS' praises - on Win2K it slowed my machine down and Windows Explorer crashed constantly - on XP it's fine. Lack of client tools is not a reason to migrate away from CVS.

Lack of support for anything to do with directories. This is one of the first complaints Java programmers will have with CVS. No argument, CVS has zero support for folders. And it's never going to, imo. I think what happens here is you get used to it - when I first moved to CVS, it took nearly a month to get oriented to its file-centricity. Still, that's no excuse; CVS sucks here.

The merging model. Ah, well. All I'll say about this is two things. You always merge, even with a locking system. Reliance on locking is masking an unhealthy, non-viable software process. CVS does the right thing here, but its tagging and branching model is so obtuse and confusing you might never notice.

Commandline phobia. I'm coming round to this view. If you don't want to have to use a command line, you shouldn't have to. Plus, given CVS' per-file checkin model, you invariably end up moving up and down the directory tree. This is ok for C, no good for Java, which tends to have deep tree structures.

I chose Subversion as I felt it could address all the issues above while not incurring expense. Technically it supports atomic commits and directory-based versioning, and its web interface lends itself to tool support. It even solves the yo-yo problem of moving up and down the tree on the command line.

So, how did it go? In June I will write a full review of Subversion. But in short: Subversion is an excellent tool and I'm actively moving all projects off CVS.

A year ago, I said:

it's just a matter of time before one of the main sourcecode hosts support svn repositories along with CVS.

Despite being something of a content-free prediction, that hasn't happened yet. But it seems the ASF Geronimo J2EE project will migrate to Subversion alongside its move out of the Apache incubator. I expect this will bootstrap Subversion usage within the OS Java community and result in CVS use withering. And when a concerted effort is made to integrate Subversion with Eclipse and IDEA it will be game over for CVS; Subversion addresses too many issues Java developers have for it to remain relevant. Given adoption in the OS community, enterprises will follow suit with a ~2 year lag. That tees up Subversion for mass adoption around 2007.

May 27, 2004

ActiveMQ: RESTful JMS, sort of

Update: ActiveMQ's CVS HEAD now supports HTTP DELETE for consuming messages (which has no timeout; it returns immediately). POST will return a globally unique URI for each message to enable reliable messaging and avoid duplicates. To which I say...

Open Source response times...

...Nice.


Is this implementation of queueing a good example of the ReST approach? - Patrick Logan

As a first cut the ActiveMQ guys have done a fair job; there is more they could do. In fairness, this is not so straightforward to get right, since modelling containers with URIs can be tricky. It's cool they're taking a principled approach.

In particular I am wondering about the use of GET to dequeue an item

If that's what they're doing, it's probably a bad idea (but I need to see the code to be sure). GET is for peeking rather than popping; use POST or DELETE to dequeue. Otherwise, if there is a cache between you and the origin server, you may get unexpected, silent bugs when using GET, insofar as a cache will interfere with the expected JMS semantics. In the infrastructures I've worked with, this would cause issues.

It helps enormously if each entity being sent into a queue is supplied its own URI - you do this by returning a URI in the Location header. We've used this in Propylon to manage reliable delivery over HTTP; it's very handy when it comes to tracking messages, browsing queues and building reconciliation reports. This gets much trickier if the client has to have intimate knowledge of the URI structure - computing URIs instead of dealing with them as opaque strings ups the odds that what I throw together will be ActiveMQ-specific (this would be a bit like having a local version of JMX). Aside from that, a URI per sent item can scale by supplying a naturally distributed data structure; in theory it's better to allow random access to items than to demand all clients synchronize access via the controller (queue/topic) URI.
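To make that concrete, here's a rough client sketch of the split: POST to send (capturing the per-message URI from the Location header), GET to peek, DELETE to consume. The URLs, header usage and method mapping are my reading of the approach, not ActiveMQ's actual interface:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class QueueClient {

    // Send: POST the message body; the Location header in the
    // response names a URI for this individual message.
    public String send(String queueUrl, byte[] body) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(queueUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(body);
        out.close();
        conn.getResponseCode(); // force the request
        return conn.getHeaderField("Location");
    }

    // Peek: GET reads the head of the queue without consuming it,
    // so an intervening cache can't silently eat a dequeue.
    public InputStream peek(String queueUrl) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(queueUrl).openConnection();
        conn.setRequestMethod("GET");
        return conn.getInputStream();
    }

    // Consume: DELETE on the per-message URI dequeues it.
    public int consume(String messageUri) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(messageUri).openConnection();
        conn.setRequestMethod("DELETE");
        return conn.getResponseCode();
    }
}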

Update: James pointed out that this feature is for streamlets, not MQ, which is a different matter. I would lose the timeout feature: if there are no messages, simply return that information. Holding a reference via keep-alive is a hack to emulate API callbacks - don't go there ;)

Finally, I wonder what they're doing around duplicate deliveries. The odds of that happening are much higher over HTTP than with an in-proc API call. I'll have to check out the code to learn more - knowing Bob and James this will be a Maven build...


[jay z: lucifer]

Core Software Processes

  • Programming
  • Building
  • Designing
  • Testing
  • Delivering
  • Versioning
  • Changing
  • Automating

Many folks focus only on the first two, usually because their working environment or software methodology dictates it. The third, designing, is often outsourced to a specialist role (the architect). XP, TDD and other agile approaches have helped us remember the value of the fourth, testing. Delivery, versioning and change - not everyone will think these are core practices; often they seem to "happen" to us as a result of working on a project over a long enough time. Automation encompasses the special definition of "laziness" programmers develop.

Here's the point: these processes are not something that should be serialized in time order. They should, as much as possible, be done continuously. RUP folks should agree with this in principle, with the caveat that at certain times we're emphasizing a particular process more than others. Exclusive focus, however, and we're back to the discredited waterfall model of development where processes are entirely serial ("let's stop for a week to clean the code up", "let's not write any code until we design the architecture", "let's not do any testing until we deliver the code"). Don't go there. The RUP encourages us to iterate the processes; agile methods want to speed this up to interleave the processes.

There's no point just talking about processes - these are things that we do. We might start out being weak at some of them, but that's ok, and it's not a reason not to get started. We get good at a process by practice. The purpose of constant practice is to push the processes out of consciousness and into second nature. So we're freed up to act on the problem at hand in the most effective way.

[tom petty: freefallin']

Use Cases and Service Cases

Most software specialists, including business analysts, are not aware of how to drive software creation from a process perspective. They believe that software is driven from the Use-Case, not the Business-Case. This will require a change in methodology. - Jeff Schneider

This is similar to the difference between an object oriented model and a service-oriented one. Jeff is right; big change is coming to software methodology.

[justin timberlake: cry me a river]

May 25, 2004

The Service Garden

The more technology we have installed, the harder it is to change your business. [...] Now, customers are looking for simplicity, integration and security across releases. They want standards-based software that doesn't require the labor expenditure of the past. Software CEOs have two choices: They can try to impose their proprietary methods on the market or they can adopt a new service-based approach to providing and maintaining software. - Ray Lane

I just came across this piece from Ray Lane. In it he talks about renovation and innovation in software, how renovation is becoming very important to businesses. In Propylon we're extremely focused on avoiding rip and replace, but industry wide there are issues that need more consideration. It seems that for now, rip and replace is moving out of software systems and into business and software process models.

  1. (Re)Engineering. It's hard to see how to make enterprise systems adaptive and flexible without a corresponding effort in engineering behind and between the service boundaries. By engineering I mean whatever is needed to keep software soft, something that isn't going to be derived from the architecture alone. Responsiveness to business change can't in the long run mean cranking out ever more code or throwing away what's already there - that's a losing approach. But that suggests a consequent investment in engineering things to be changeable that way in the first place.
  2. Maintenance. To some extent, renovation is maintenance in drag. But suppose you have an exposed service running 24x7. Chances are if it's been up for any length of time it's become more or less critical to its callers. This has all kinds of process and management implications - like on the fly upgrades of running systems. Or how about versioning? Scaling up? What about scaling down? Hardware and infrastructure provisioning? I don't see the classic dev to staging to live cycle being sufficient - your live callers won't be on the staging platform. The sane approach seems to consist of gradually ramped rollouts and beta programs.
  3. Metaphors. If the industry is going down this road, the old building construction and factory metaphors as we have taken them might become redundant - worse they might become problematic. An out of control service environment is more like bindweed than a tenement block; and having a service track the business is more like pruning a rose bush than laying down a patio.
  4. Standards and specifications. Attempting to do any of this on internet scales has resulted in a generation of web services specifications written before the issues were fully understood or even acknowledged - ultimately those specs were derived from what was known about distributed middleware not a web of services. Service oriented, as people are learning, is not middleware writ large.
  5. Business models. There's no end of discussion around software licensing with respect to customers and especially with respect to open source. There's somewhat less around services and system delivery. Renovation and system reuse (exposure by service) imply different commercial arrangements to the models we're used to. This goes beyond software process and suggests a commercial model that is not found in either T&M or Fixed Price contracts.

May 23, 2004

Ugly

Stefan Tilkov wonders why so many companies fail at product design. It's much the same reason most software companies fail at usability - they don't try. Good design is sufficiently intangible as to be regarded as surplus to selling something. It's not obvious to many people how good design contributes to the bottom line.

Most of the expenditure that could possibly go into design is usually channeled into marketing. Marketing has two distinct advantages over design:

  • A manipulation of statistics that is second to none. Stats and charts are the crack cocaine of business people the world over.
  • An inherently cynical view of people (most designers are Panglossian in comparison).

One designer with a good retort to the charms of marketing is Lord Conran. He has always taken the view that most people don't really know what they want until you show it to them, which neatly invalidates 90% of all marketing. One way to improve matters is to decide to buy and use beautiful things. Most people don't and our markets are not oriented to value non-disposables. But to some degree I think it's true that most people do not look for some measure of quality, they look for some measure of value.

That's impossible

Tim Bray gets through a rant on hi-tech industrial design without mentioning Sony, Apple or Frog Design. Speaking as an ex-industrial designer, that's an impossible achievement.

But that netgear box - it should have sardines in it.

[beastie boys: root down]

Thus sprach metadata

Seairth Jacobs gets it. RDF triples can be lossy when merged from their originally stated context.

More importantly, all of this leads me to the belief that 'triples' (subject, predicate, object) are not enough to make a semantic web of information, at least not outside of very restricted environments. To make the semantic web a reality, we need to start thinking in terms of 'quadruples' (source, subject, predicate, object).

Uche Ogbuji talked up quads for RDF years ago. Statement provenance is a major use case that future specification work needs to address. I think the reason the RDF specs missed this in the past is because many of us thought that reification would cover it (it doesn't), along with other stuff like quotation (nope). Oh well, maybe OWL can cover it off.

The XML dump format I came up with for RDF recently has an optional fourth member of the tuple called 'context'. Ditto for the RDF-backed logging I'm prone to doing - it's not always enough to have a statement; sometimes you need a source for that statement.
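For illustration, the fourth member needn't be anything exotic - a hypothetical quad is just a triple that carries the URI of the graph or document it was asserted in:

public class Quad {

    private final String source;    // where the statement was asserted
    private final String subject;
    private final String predicate;
    private final String object;

    public Quad(String source, String subject,
                String predicate, String object) {
        this.source = source;
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // Merging graphs preserves provenance: the same triple asserted
    // by two different sources remains two distinguishable quads.
    public String getSource()    { return source; }
    public String getSubject()   { return subject; }
    public String getPredicate() { return predicate; }
    public String getObject()    { return object; }
}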

May 21, 2004

MT3: are you not entertained?

I'm tempted to buy a copy of MT3 as an act of protest against whingers - talk about knowing the cost of everything and the value of nothing.

[johnny cash: i walk the line]

No wonder Jini doesn't get used

I've thrown out my old Jini and been setting up Jini2. Starting rmid and the core services is a complete pita. The 10 minute test, what's that? No wonder people aren't using it.

Software advocacy tip 1 - it starts by clicking on something.

[jay z: 99 problems]

May 08, 2004

The future is coming on

Download Mono. One-oh is coming real soon now.

Frank Herbert quote

Short term decisions tend to fail in the long term. - Frank Herbert

Sharks patrol these waters

Microsoft announces that they're preparing to take on the search engine technology space again, and suddenly everybody's wondering if Google can survive. It's an incredible tribute to your success, but it's also a hindrance to the open-source space - who wants to try and compete with Microsoft, particularly when I'm not getting paid for it? - Ted Neward

A few observations on this otherwise good entry:

  • Some people are doing open source for reasons aside from money.
  • Some people are doing open source for reasons aside from competing with big software companies.

If you do anything of value, anything that creates a significant market, a big software company will attempt to compete with you - they'd be silly not to, and you'd be silly not to expect it. Bill Gates recently said (effectively) that Microsoft had missed out on search, and now Steve Ballmer says they have a lot of smart people in search. No doubt. The point is that Microsoft did not create or energize that market; someone else did. What big software companies do well, if they're smart, is turn the ship around as needed.

Steve Grand on ||

Speaking from a purely practical point of view, time matters. In my work I routinely model parallel systems consisting of a few hundred thousand neurons. I can model these in serial form, luckily, but it's only barely feasible to do so in real time, and I can't slow down gravity for the benefit of my robot. Moore's Law isn't going to help me much either. I'd far rather have access to a million tiny processors than one big one, and the compromises I have to make at the moment (specifically the artifacts that serialization introduces) can really cloud my perception of the kinds of spatial computation I'm trying, with such grotesque inefficiency, to simulate. - Steve Grand

May 06, 2004

What's in an envelope?

This seems innocuous enough, but some people claim that by doing so, we're allowing proprietary formats to rule the day once again; all of the transparency and nifty markup tools that XML gives you go away, and Evil Vendors will smite us with their Terrible, Proprietary Formats.
Which, I say, is hogwash. - Mark Nottingham

I hear what Mark's saying, but you know, there's not much in the history of our industry to back him up. Quite the opposite.

We don't even need BeEvil type vendors for this to be an issue - we just need people who don't think clearly about how to inspect encoded goo in a working system running around designing said systems.

The real question here - and boy, is this the elephant in the virtual room - is whether XML is the best way to model data.

Compared to, I dunno - Base64? XML is not for data modelling; it's for marking up. You need a data model to model data - XML doesn't have one of those and probably doesn't need one (in fact I'm sure it doesn't - RDF/XML, WXS, anyone? No?).

Nah, if that's an elephant, it's white. The elephant in my room is whether XML packaging over all other considerations is a good idea. Mark mentions this, but almost as an aside. To be specific: must everything go under the root element? XML packaging is currently Atom's biggest technical headache, although people aren't coming out and saying it just yet.

MIME

Sometimes it seems that putting binary in XML is like putting beer in an envelope. SOAP with Attachments was apparently a "horrible mess", as Mark puts it, but I've never heard anything more specific than that. Was it more horrible than multipart MIME? Will XOP be less horrible?

May 03, 2004

C# code conventions I'll be breaking for the time being

Here are two conventions I won't be following, with reasons:

Pascal case method names

Maybe it's just years of Java and Python showing, but I find ObjectName.MethodName() hard to read compared to ObjectName.methodName(). I tried to adapt to this, I really did, and it's not a matter of taste. My brain keeps telling me I'm dealing with inner classes. I have to saccade out to the () to see what I'm dealing with - and that's not irrelevant after some hours of programming.

Prefixing interfaces with 'I'

Son of Hungarian notation alert. I just don't care enough that something is an interface and not a class to plonk 'I' in front of it. Why aren't we plonking 'C' in front of classes? Even so, I find this distinction useless in working code - it's irrelevant to me as a caller of your code which type model your implementation went with. One's an object signature, the other's an object generator plus signature. But what do I care? If I'm using 'new' I know it's an object, and if I'm using a function or factory it's irrelevant. All it does is make it look like Java circa 1999 with that 'IF' postfix convention. Doh.

Seven more ways to improve legacy Java

After reading Robert Simmons' article on O'Reilly and the sample online chapter, I'll definitely be buying Hardcore Java. I don't agree with everything he says, but I like the way he thinks about dealing with existing code:

Use a Stronger Compiler for Your Code

What I read came across more as "use Eclipse or IDEA", which is fine ;) The example given could be caught with unit tests, or even better with a field-prefixing convention (instead of the 'this' keyword):

public class SomeClass {
    private String itsSomeValue;
    public SomeClass(final String someValue) {
        itsSomeValue = someValue;
    }
    public void setSomeValue(final String value) {
        itsSomeValue = itsSomeValue; // deliberate bug: self-assignment, 'value' never used
    }
}

My convention is to use 'its' for fields and 'the' for statics (some folks use 'my'). I've never managed to self-assign a field. Just look at the code above - it's not hard to spot the problem. This also has the advantage that accessing object fields directly looks painfully stupid (as it should). I believe some folks deal with this issue by only accessing fields internally through get/sets.
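For comparison, here's a sketch of the same class with the convention applied to fields and statics and the self-assignment fixed - the names are mine, not from Simmons' chapter:

public class SomeClass {

    private static final String theDefaultValue = "none"; // 'the' prefix for statics
    private String itsSomeValue = theDefaultValue;        // 'its' prefix for fields

    public void setSomeValue(final String value) {
        itsSomeValue = value; // the prefix makes accidental self-assignment stand out
    }
}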

However, if your developers depend on diffs instead of code documentation to determine problems, then you probably already have a serious problem in your code base.

No, documentation isn't worth that much in legacy code - I'd consider taking it too seriously a project risk. Diffs are worth more. Behaviour is worth more again. We have problems when the tests don't pass after checkin, not when we're failing to leverage documentation.

Remove Commented-Out Code

Absolutely. But we should ask why that comment is being left there at all. Blocks of commented-out code can be a sign that version control isn't being used well.

Replace Listeners with Weak Listeners

Ok. The problem I've seen with listeners is thread safety, not so much GC. I tend to replace listeners with queues (though this might not be practical in Swing code).
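A minimal sketch of the queue substitution, assuming a single consumer thread draining events (the class and event type are invented for the example):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class EventPump implements Runnable {

    private final BlockingQueue<String> events =
        new LinkedBlockingQueue<String>();

    // Producers enqueue and return; no handler code runs on their
    // thread, which removes most of the listener thread-safety traps.
    public void publish(String event) throws InterruptedException {
        events.put(event);
    }

    // A single consumer thread drains the queue, so handling is
    // serialized without further locking.
    public void run() {
        try {
            while (true) {
                handle(events.take());
            }
        } catch (InterruptedException quit) {
            Thread.currentThread().interrupt(); // shut down politely
        }
    }

    private void handle(String event) {
        System.out.println("handled: " + event);
    }
}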

So here are seven other ways to improve legacy code, some of which are not so cheap (think of them as investments ;):

  1. Get the build under control. Owning the build is job number one. "Builds in my IDE" doesn't count. Use Ant or a makefile to build the code, not an IDE. Once we've got the build down, then we can start leveraging the IDE, but the build process has to be IDE-independent. Dysfunctional build and deployment processes are a good way to derail legacy maintenance projects.
  2. Do a one-time reformat over the class and check it in. Thereafter stick with that format. Most time spent with legacy code is spent reading it, so make that comfortable.
  3. Prefix fields. See above for an idiom that works.

  4. Buy a copy of Martin Fowler's Refactoring. Buy two copies, and read both of them. It's money well spent. Refactoring has a list of techniques that can help get the structure of the code under control before we start changing behaviour. As important as the techniques are the discipline and shared vocabulary this will bring.
  5. Search the code for empty catch blocks. Get them to output something (see the sketch after this list). But we need to be careful about getting them to throw on the production system until the consequences of doing so are understood.
  6. Start taking version control seriously. The biggest mistake we can make with version control is to treat it as a glorified backup system - it's not called backup control for a reason. Version control, used well, is there to help us manage change, not just to give us a place to dump the source code. Build a habit of frequent checkins and updates - keep checkins small - you're doing it right when comments seem superfluous.
  7. Add unit tests. This is especially important when upgrading legacy systems that have no regression tests. There are two approaches here. We can write tests against the existing code base first to capture its actual behaviour, on the basis that what a functioning production system is supposed to do is another matter from what it actually does (which also gets to the nub of the issue with taking comments seriously - comments don't execute). Or, if we have clear requirements for new behaviour, we can capture those in tests instead and work to make the code pass them - we'll go faster, of course - but if the code base is, shall we say, intricate, this won't help with the non-local bugs we introduce, however unintentional :)
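On point 5, the cheapest first move is just to make the swallowed exception visible. A before-and-after sketch using java.util.logging (class and method names invented):

import java.util.logging.Level;
import java.util.logging.Logger;

public class CatchBlocks {

    private static final Logger theLogger =
        Logger.getLogger(CatchBlocks.class.getName());

    public void before(String s) {
        try {
            Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // empty: the failure vanishes without a trace
        }
    }

    public void after(String s) {
        try {
            Integer.parseInt(s);
        } catch (NumberFormatException e) {
            // first step is visibility; whether to rethrow on the
            // production system is a separate, riskier decision
            theLogger.log(Level.WARNING, "bad number: " + s, e);
        }
    }
}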

May 02, 2004

RDF, meet SmartFrog

So you've built your SOA/WS/Grid whatchamacallit. Now, how to manage it?

Maybe with SmartFrog. This is clever stuff courtesy of HP Labs and free as in beer. It would be interesting to have the SmartFrog folks talk to Semweb folks, as both involve description language research (one domain-specific, one generic). For example, imagine running SmartFrog data through RDQL or Jena.

This is one of the things with RDF - it gets passed over by domain-specific languages - "arbitraged". Maybe that's not a bad thing. A well-specced mapping into RDF might be preferable to using RDF as the direct notation.

Update: this is much edited - Steve Loughran was less than impressed first time around - I wasn't being anywhere near clear enough.

Javablogs is timing out on my feed: anyone having a problem?

Javablogs has been timing out trying to pick up this blog's feed for the last few days, but I can wget it. Is anyone else having a problem picking up the RSS feed?

Steve Vinoski on WS inside the enterprise

Most of the people I encounter in my circles, for example, use "web services" to describe intra-enterprise business services that can be accessed over a variety of protocols. These services are not and will never be offered over the WWW. They're not necessarily RESTish. They don't need to scale to web scale, and never will. Some are stateful, some are not. They might in fact be steaming piles of poorly designed and poorly written code, and in fact are often built on what many purists might consider to be obsolete technologies. Regardless, what's most important about them is that they work, and so these businesses want to make them even more valuable by making them much more accessible within their businesses. They're happy to describe these services in WSDL, since that helps them abstract the services, but they don't want to have to redesign them or rewrite them or change their protocols or adopt new unproven approaches just so they can ensure that the services conform to some purist's view of what's a "real web service." - Steve Vinoski

Very true. We might call this space Dark-WS. Part of the SOA/WS-driven work we do involves placing service facades on top of legacy systems. For some of these systems it's not acceptable to risk breakage through alteration; thus they can't be touched during an engagement. One of the great attractions of using Internet protocols and markup is that they induce an integration strategy of "least interference" with existing systems - compared to something like JCA connectors, at least.

Often these systems were engineered for batch work and not as servers. The interesting engineering problem focuses not only on the integration but on enabling the existing systems to cater for increasing and variant forms of load. You also have to take operations into account - batch systems tend not to be set up for 24x7 management or anything like application-level fault notification; often you have people monitoring the job as it runs.

For me, one of the trickiest parts in all this is managing the gateway between web and legacy namespaces - we have a pattern called T3 to help us with this, based on asynchronous messaging; I hope to write that up soon :)

RSS over P2P: one of those aha moments

Perhaps we don't just need to argue over the feed format and blog APIs but also on the whole interaction model of feeds. Mailing lists are a good push model but we've stopped using them because they add to the 'information overload' we all feel when working with an email client in these spam-filled days. In contrast, an RSS aggregator is a place for one-way information -- we can ignore the feeds until lunchtime, or the end of the day, without worrying that there is some important message which we will have to respond to (although how many of us actually do resist the temptation to peek? RSS aggregators are far too compelling and addictive... 'what's the world writing about today?'). But, under the hood perhaps we need a new distribution model.
They say 'popularity comes at a price' and this is especially true of blogging, where a popular blog can get pounded by greedy RSS aggregators costing the owner money in excess bandwidth charges. - Jamie Lawrence

That's an excellent observation.

The problem in the business world manifests itself as overprovisioning of hardware - again and again you see big iron deployed to cater for the fraction of hours in a year a couple of workstations couldn't handle the traffic. Most of the time those servers are idling, unless you happen to be in that part of the power-law curve that says you're going to be hammered frequently. The amount of hardware underpinning the Web today is phenomenal.

This is all good news if you're in the business of pushing tin. And it's tin pushers who potentially have the most to lose by any arbitrage induced by Grid and P2P technology. So expect to see those guys increasingly involved in future Grid and P2P standardization efforts, in the same way middleware types overran web services. Intel and IBM are already all over this stuff. Sun has bankrolled two P2P technologies to date, but I haven't seen any discussion about how they think P2P/Grid might affect hardware revenue.

May 01, 2004

Biological ad filter

I was browsing around ObjectWeb's website today. It took me quite a while to 'see' the links in blue below:

[image: badword.jpg]

I suspect I've become conditioned to visually filter out Google ads.

HTTP over SOAP over HTTP over ...

[From the MC Escher school of standardization]

This has been picked up by Danny Ayers, Sean McGrath, Joe Gregorio and Tim Bray. Bray pinged the W3C TAG on it and I'm eagerly awaiting Mark Baker's comments. I don't get it, but who knows, maybe there's somebody out there who needs it.

Yours via a complimentary splash screen:

HTTP over SOAP over HTTP...

Why I'm not a JCP member

I'm too damn lazy. To join the JCP I have to download a PDF or two, print them out, fill them in in a funny way because I'm not a business entity, get my employer to countersign, and then fax or mail it. Somebody please automate this process.

Thoughts on open source J2EE

Mark Watson is a JBoss fan. Interesting that Geronimo is starting to produce releases. Progress in Geronimo has been solid, but the project's community building could be better - it's something of a tight crew and I'd be concerned the project would stall if certain folks stopped working on it. I have no idea what to make of JBoss Group right now. They've made some interesting 'acquisitions', but there's still too much emphasis on talk rather than code for my taste; the community around the container, though, seems solid.

JOnAS is looking like the best possible option for open source J2EE. I like ObjectWeb's organisational model, and especially its technical focus - call me a cynic, but it seems it's too easy in J2EE to confuse press releases with systems. We're evaluating switching some systems over to it. I don't know if I will use JBoss again, and Geronimo looks like it needs 8-12 months. Plus, that spat between ASF and JBoss Group wasn't impressive.