" /> Bill de hÓra: January 2005 Archives


January 31, 2005

Get Everything Done

I've noticed of late that David Allen's book Getting Things Done seems to be popular among software types. I have that book and it's good, if a bit complicated for me - I'm inclined to think if you have the discipline to follow David Allen's system, you're already halfway there. Still, the idea of deferring anything you can't action inside a couple of minutes seems to work.

My favourite book of this kind by far is Mark Forster's Get Everything Done. There isn't a system as such, more a collection of techniques that are to be performed frequently. What holds the book together is the insight into the mindset of ineffectiveness and procrastination - Forster is quick to point out that he is a naturally disorganized person. There are great (and seemingly hard-won) overviews of the problems with some techniques such as todo lists or prioritizing tasks. The burst technique of short (5, 10, 15 minute) task iterations is extremely clever - if you're a programmer and are comfortable with agile or test-driven approaches, you'll like it. The sections on dealing with procrastination, interruptions, and those floating items that don't seem to fit anywhere are great. And the book is short, less than 200 pages of crystal clear writing. So while it's a different take on time management that some might find too unusual, on the upside you'll know inside an hour whether it's for you.

January 29, 2005

A registry for one click feed subscription, anyone?

You'd think we'd know by now that centralized registries are often a bigger problem than the one they purport to solve. The problem posed this time is One-Click Subscription, one whose solution I think involves getting client browsers to dispatch on something. Almost all browsers have a plugin/dispatch architecture - that's how IE or Mozilla knows to pop up your mail client or run Acrobat Reader.

Aside from the considerations that come with any centralized model, a registry for subscribing to feeds feels like overkill. The argument for a registry seems to be based on the notion that feeds are not served with the correct media-type, so getting the browser to dispatch is a problem. This can be solved for most users by providing a right-click "subscribe to this" menu option for the user to use on a URL (we do it already for "blog-this"). It can be solved by creating a new URI scheme (feed://) - that seems to lack technical elegance in some quarters (including my own it should be said, but if a registry is coming down the pipe I might reconsider). It can be solved by getting ISPs to have Apache serve content correctly. It can be solved by providing an @rel attribute in the link.
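
That last option is just the feed autodiscovery convention: a link element in the page's head gives browsers and aggregators something to dispatch on. A minimal example (the feed URL is made up):

    <link rel="alternate" type="application/rss+xml"
          title="RSS" href="http://example.org/index.rdf" />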

Yet, some people already want to figure out the business model for supporting a registry - that should be setting off alarm bells right there - central control points are not a sound basis for robust open architectures. Even then, getting a client browser to open up a feed reader via one click doesn't look like a basis for a business, not when existing infrastructure is good enough.

Probably it'll be solved using a collection of techniques - feed:// URIs, media-type, right-click, content scanning for @rel and so on. A hybrid approach may lack technical purity but the user for the most part won't have to care about what's going on - and it beats the bejeesus out of a registry, which is only sustainable by conserving complexity for the benefit of the registry owner.

January 25, 2005

Deprecating Metacrap

Dare Obasanjo is coming around to structured metadata:

One thing that is clear to me is that personal publishing via RSS and the various forms of blogging have found a way to trample all the arguments against metadata in Cory Doctorow's Metacrap article from so many years ago. Once there is incentive for the metadata to be accurate and it is cheap to create there is no reason why some of the scenarios that were decried as utopian by Cory Doctorow in his article can't come to pass. So far only personal publishing has provided the value to end users to make both requirements (accurate & cheap to create) come true.

The fact that those who are working with RSS are getting a sense of the value of metadata, especially aggregated metadata, is all upside. Despite what some people might still believe, there is a growing set of metadata out there where the burden of creating it is close to zero. Creation is a side effect of using a computer - the only interesting cost is bothering to deploy tools to collect it. There is other metadata again that is cheap to create, such as Movable Type categories, Wiki backlinks, or recently Technorati Tags. Start to mix and match this stuff with statistical techniques and you have the basis for powerful ways to organize information.

When Google asks for more metadata, all bets are off.

The Metacrap article has been given too much credence over the years. When you've been working with Wikis, semantic CSS hacks, or RSS, it's hard to ignore the benefits of metadata, so if that meme is fading, perhaps it's not a bad thing. Clearly not everyone requires a rigorous approach. If RDF or Topic Maps can come down to integrate with loose and fast approaches, things could get interesting.

January 23, 2005

Windows XP reinstall list

I had to wipe Windows XP on my Dell Inspiron and start over. It's been 18 months* since purchase and the system had become completely unstable and very slow. Getting a new motherboard recently probably didn't help any. There were scads of software installed (4 columns deep on the start/programs list). All of it seemed needed.

So, after reinstallation, the system seems stable and faster. Here's what was installed aside from the drivers. This should come in handy the next time I have to do this.

  • WinZip
  • Cygwin
  • Emacs 21
  • IDEA
  • Andale Mono
  • Araxis Merge
  • Gaim
  • Thunderbird
  • Firefox
  • Ultraedit
  • MS Office
  • WinSCP
  • NewzCrawler
  • iPodder
  • Subversion
  • TortoiseSVN
  • A folder of Java stuff (jdks, jini, ant, junit, etc)
  • Python 2.4
  • Apache
  • iTunes
  • WinAmp
  • WinDVD
  • Acrobat Reader
  • updated 31st Jan 2005
    • MySQL
    • Open Office
    • Smart FTP
    • MS Project
    • Visio

* 18 months is the longest I've gone without reinstalling Windows. 8-12 months has been the norm in the past.

Java get/set - not that harmful. Version control for refactoring - harmful

update, 2005-01-26: Ted Leung: "Having a good language doesn't mean that you won't get more leverage from good tools."

I'm enjoying Ryan Tomayko's series on Python for Java programmers, the second of which has a good overview of Python attributes compared to Java's get/set idiom. I think the article strays a bit when it looks to enter into the mind of a Java programmer.

Accessors, Attributes, Autoboxing

Here's where I see a huge difference in mind-set between Python and Java coders... Practices like getter/setter that lead to code bloat are generally met with less resistance in the Java community because the intelligent IDEs and other tools go a long way in managing the excess code.

I write a good amount of Java and Python code; at the moment it's probably close to a 50/50 split. With my Java hat on, I don't find code bloat caused by the get/set idiom to be an important issue. And yes, I read a lot of other people's code. [In any case, Java 1.5 has autoboxing, which provides some kind of parity with Python attributes.]

The real problem with getters and setters is to do with code management, not bloat. Allen Holub has laid this out in a JavaWorld article. If you're not careful you end up with a small fraction of classes doing all the work by pulling the data they need from a lot of data objects. Holub's argument is that get/set makes it easy to fall into that trap - procedural programs obfuscated by object oriented gorp.

Then again, if you can pass around functions as easily as you can objects, procedural clumping doesn't seem to be as much of an issue. So whether or not you have autoboxing doesn't seem to matter as much from a code management standpoint as whether you have function passing and/or lexical closures. In that regard, the Command and Plugin patterns are indications of trying to emulate that function passing capability via APIs. Certainly Command and Plugin are two of the more powerful ways I know for organising code in C# or Java.
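
To illustrate the Command idea, here's a minimal sketch in Java - all the names are invented for the example; it's the shape of the idea rather than any particular library:

    // Behaviour is passed to the object holding the state, instead of
    // state being dragged out through a get/set pair.
    interface Command<T> {
        void execute(T target);
    }

    class Account {
        private int balance;

        Account(int balance) { this.balance = balance; }

        // The account applies commands to itself; callers don't need a
        // getBalance()/setBalance() pair to implement behaviour.
        void apply(Command<Account> command) { command.execute(this); }

        void credit(int amount) { balance += amount; }
    }

    class CreditCommand implements Command<Account> {
        private final int amount;
        CreditCommand(int amount) { this.amount = amount; }
        public void execute(Account account) { account.credit(amount); }
    }

Calling new Account(100).apply(new CreditCommand(50)) sends the behaviour to the data; in a language with first-class functions, the Command class collapses into a plain function.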

In short: object languages that have a get/set idiom and do not have function passing naturally lean a programmer towards larger centralized units of behaviour orbited by collections of fields masquerading as objects.

IDEs

This is the primary reason why there is little serious demand for Python IDEs. Many people coming to Python can't believe no one uses IDEs. The automatic assumption is that Python is for old grey beards who are comfortable with vi and Emacs and refuse to accept breakthroughs in programming productivity like IDEs. Then they write a little Python code and realize that an IDE would just get in their way.

I would say that the day you need an IDE with Python is further away than with Java. But is there really little demand for Python IDEs? That might be rationalising things a bit. I know I want a better one.

Debuggers, wizards, code generators, code folding, code tidy, syntax checking, tree views - all the stuff normally associated with an IDE - are not the key productivity boosters. The facilities a good IDE gives you that count are:

  • autocomplete,
  • refactoring,
  • easy testing,
  • version control

I suspect refactoring and autocomplete are some of the reasons Guido van Rossum is noodling on type declarations for Python. Once a Python IDE exists that has anywhere close to the refactoring and testing functionality of IntelliJ or Eclipse, it'll get used. Not having anything other than the likes of Emacs is affecting Python adoption.

update, 2005-01-26: from the comments: "there are two nice free IDEs for Python besides the simple idle IDE: DrPython (no relation to DrScheme) and eric3. Eric3 provides refactoring capabilities too, though whether they're as good as Eclipse's yet, I don't know.". I also had ActiveState's Komodo mentioned to me by a colleague.

Refactoring and version control

The next wins in IDE productivity could be profiling support and direct mapping of refactoring steps onto version control operations. Paul Graham has made the case for better profiler support, so let's consider the situation with refactoring today and how integrated version control might be beneficial.

Currently with automated refactorings you're doing twice the work you need to - first telling the IDE what to refactor, and second manually emulating those refactorings with version control before checking in, so as not to lose version history (or get entirely frustrated). [This is one reason why CVS is a dead-end for Java and C#. CVS gets in the way of refactoring, and as the refactoring capabilities for those languages improve, the pain of using it will increase.]
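
To make that concrete, a class rename the IDE performs in one step has to be re-done by hand against the repository, something like this (the paths are invented for the example) - otherwise the rename shows up as a delete plus an unrelated add, and the file's history is lost:

    svn move src/org/example/Foo.java src/org/example/FooService.java
    # fix up references in the IDE, run the tests, then:
    svn commit -m "rename Foo to FooService"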

With that integration, the IDE extends beyond the developer desktop and onto the repository server. You'd have to think that if JetBrains had either Resharper or IDEA refactorings mapped onto svn operations, and went on to offer a complimentary custom build of Subversion, they would shift plenty of units.

January 21, 2005

Extensibilized

Here's a quote from an interesting paper/manifesto from Gregory V. Wilson entitled Extensible Programming:

Language extensibility has been around for years, but is still largely an academic curiosity. Three things stand in the way of its adoption: programmers' ignorance, the absence of support in mainstream languages, and the cognitive gap between what programmers write, and what they have to debug.

Wilson talks about code generation, and the above very much nails the concern I have with code generators, even though I like the idea of code generators quite a bit.

The paper has some good analysis of existing mixed-in languages such as JSP and XSLT. It concludes that we'll end up storing our programs as XML but looking at transformed representations, which is something of a "tools will save us" argument. Wilson addresses a list of possible objections, though one important one is missed - how is that going to work with version control, specifically version diffs? XML is tree based and diffing algorithms tend to be line based, which makes for dissonance. More tools? Better XML diffing algorithms?

Jon Udell reckons Wilson skewered the objection, 'I Want to See My Programs As They Really Are', but I found the answer somewhat specious. I don't know anyone who wants to see disinterred programs in the manner suggested - they use debuggers and profilers for that kind of thing. What they do want to see is source code - because the code is the technical specification of the program. And the fact that the two (program and specification) are being conflated here makes me wonder - is that what happens when you spend too much time looking at abstract models of syntax?

As for XML representations - why not go further? Why use XML when you could use Lisp expressions - you'd have the added benefit of being able to manipulate the parse tree directly if you wanted, plus diffs would be sane. The argument for XML seems to be an ad populum one: "And yes, there are better (i.e. more succinct, and hence easier to process) ways to represent the semantics of programs than XML, but we believe that will turn out in practice to be irrelevant. XML can do the job, and is becoming universal; it is therefore difficult to imagine that anything else will be so compelling as to displace it." There's nothing inherently wrong with worse-is-better thinking, but as an XML advocate, I'm a tad wary of XML world domination arguments :) If it came down to it, I think I'd rather have source in pyxie syntax than XML.

As for advances in tools, the most significant mainstream advance I have seen is in IntelliJ IDEA, which treats the codebase as a tree of syntax trees rather than as a collection of flat files. That makes semantically sound transformations of code possible - allowing the immediate automation of refactorings and restructurings that once could have taken hours or days, or simply would not have been done at all for fear of breaking something. Subversion does something similar for version control when compared to CVS.

I would love to see tools developers eventually shift focus from code organisation to code runtime and offer semantically sound refactorings based on profiling and analysis of hotspots - debuggers tend to get all the attention, but I think profilers have more to offer. Anyway, in practice I tend to use multiple editors when working with source code, and my sense is that many others do too; which is one reason why I don't think standardizing on a single IDE is necessarily a productivity win for a team.

The real issue with the kind of extensibility talked about in this paper is not so much the suggested ignorance of tools and techniques, but a lack of appreciation of how difficult it is to define extensible rules of evaluation and a supporting syntax. In most programming languages semantic extensions can only be achieved through new syntax, usually new operators - 'new' insofar as they are not defined in terms of existing language primitives. Eventually the language gets bogged down in its tokens or the semantic inconsistencies introduced by new evaluation rules for those tokens - saving the language is exactly what kills it... until another language is created to replace it and we start over. This leads to cycles of reinvention. We're not so much building on the programming state of the art as continually having each generation of programmers rediscover it.

January 20, 2005

We, The Observers

While there are reasons to think that the nofollow solution to the current abuse of comments - applying more metadata - is not a good approach, Robert Scoble has come up with an interesting use for Google's solution to comment spam:

So, now I could link to that store so you all would be able to visit it, but I could add "nofollow" so that Google, Yahoo, MSN, and other search engines wouldn't consider my link in their ranking system.
This will change how I write. And it will encourage more people to link to their competitors.
Think about it. If you hate me, why should you add to my Google juice just by linking to me?
It means that the link now can have editorial comment itself. Oh, and it takes away a lot of the incentive for people to spam in comments because they won't receive any Google whuffie either.

There is some delicious irony here. First, that weblogs are a new open medium has been a cause celebre of bloggers. Yet here is an attempt to control others' visibility: a) obtain a high PageRank for your weblog through open inbound links, b) shut the door behind you by manipulating how PageRank is to be distributed from your closed outbound links, thereby creating more scarcity than is actually the case (the thoughts of folks like Dan Gillmor, Cory Doctorow and Doc Searls on this matter will be interesting to hear). Second, Google has not in the past paid attention to metadata (for example they don't place any stock in the HTML meta tag), yet here they are asking for... metadata.

So, bloggers want to exert a controlling influence and the kings of statistical search want more metadata.

What's going on?

What's likely is that putting PageRank results on the web has permanently altered the web's link dynamics, in a way that serves to dilute PageRank's value. It's a curious feedback loop - the fidelity of the algorithm is de-amplified as its measurements are made available.

It seems that search engines are destined to be participants, not observers. Yet, Google asking the web to retag its markup to sustain the PageRank theory of links is like a physicist asking subatomic particles to stop moving about so he can take some measurements.

qotd

1.1 after 2.0?! The two camps really should have used different names for their formats instead of duelling version numbers. It's as though IMAP had been named "POP 4". - Jens Alfke on the RSS 1.1 spec

January 18, 2005

1060 NetKernel v2.0.2 ships

1060 Research have shipped 2.0.2 of the 1060 NetKernel.

RSS 1.1 ships

RSS 1.1 is a bugfix upgrade to RSS 1.0. Read about it here: http://inamidst.com/rss1.1/guide

January 14, 2005

TechnoratiDescriptionFramework

Technorati Tags: looks like a cross between a WikiWord and RDF.

January 10, 2005

TSS and podcasts: yes please

Wouldn't it be cool if the TSS published their Tech Talks via a podcast feed?

January 08, 2005

XOM 1.0 ships

Congratulations to Elliotte on shipping XOM 1.0.

Data above the level of a single site

About LML, Danny Ayers asks why use it when there are formats such as XHTML and OXO.

So please tell me again why should I use LML rather than either/both of these..?

Fair question. I had a similar conversation with a few guys from work about this. We all agreed that a markup for lists was, well, sort of absurd. Put it this way, if LML was published on April 1st, a lot of people would think it was a joke. Which to some degree it is.

But. We've seen those kinds of arguments play out with formats like RSS and FOAF. Which format you use, or the absurdity of actually bothering to define a markup, doesn't matter so much as the fact that we are inundated with lists. The thing about lists is that they have a lot of untapped value. I believe a lot of information gets left behind when all you have to work with are <ol>, <ul> and <li>. In terms of moving up the semantic and social software food chain, lists + metadata are a natural next step. Arguably an Amazon wishlist or a list of people on LinkedIn has more value if it's decoupled from the site itself. Passing lists around and sharing them might be cool. Much more important than using LML itself is getting people to turn their attention to mining this seam of data. Really, it plays to Tim O'Reilly's take that the future of lock-in is about data, not APIs.

Danny mentions RDF. I have an RDF/XML variant (dc:subject, foaf) that might get done next weekend - I'll definitely be publishing it. As to why I shipped a vanilla XML format first, let me say this. In many respects RDF is an excellent choice for working with this kind of data as it comes with a linking model, exactly what you want for merging data sets; the sticking point is that operating over RDF is still a big ask for a lot of people. At some point you have to stop writing things down and do something with the data. Working with RDF or the other Semantic Web formats still requires a bigger commitment than most are willing to undertake.
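
For a flavour of what that variant might look like, here's a rough sketch - the mapping below (dc:subject for categories, foaf for the author, rdf:Seq for an ordered list) is a guess at the obvious vocabulary choices, not the published format, and the URLs are invented:

    <rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
             xmlns:dc='http://purl.org/dc/elements/1.1/'
             xmlns:foaf='http://xmlns.com/foaf/0.1/'
             xmlns:i='http://www.dehora.net/lml/2005/01'>
      <rdf:Description rdf:about='http://example.org/lists/books'>
        <dc:subject>books</dc:subject>
        <foaf:maker>
          <foaf:Person>
            <foaf:name>A. Reader</foaf:name>
          </foaf:Person>
        </foaf:maker>
        <!-- an ordered list; an unordered list could use rdf:Bag -->
        <i:list>
          <rdf:Seq>
            <rdf:li rdf:resource='http://example.org/books/1' />
            <rdf:li rdf:resource='http://example.org/books/2' />
          </rdf:Seq>
        </i:list>
      </rdf:Description>
    </rdf:RDF>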

To give you an idea of how low-hanging this fruit is, the LML 'spec' (if you can call it that) took about 4 hours to write. I figure the RDF version will take about twice that.


re comments: I'm currently going 12 rounds with an MT upgrade and I think they're fixed now... anyway trackback is king ;)

LML: List Markup Language


Introduction

LML is a markup language for lists*. There are many lists on the Web and in web content, but most of them are published using HTML elements which don't carry much interesting information about the list. LML can enable sharing of these lists along with useful metadata. LML documents can contain lists of most anything. They can be ordered lists, such as a top ten list of books or the world's twenty most populous countries, or unordered, such as a wishlist, a playlist, or a shopping basket.

The markup is straightforward. Here's an example skeleton:

   <i:lml xmlns:i='http://www.dehora.net/lml/2005/01'
       i:version='001' >
     <i:published i:when='' />
     <i:changed i:when='' />
     <i:author>
        <i:name></i:name>
     </i:author>
     <i:category>
        <i:name></i:name>
        <i:subject></i:subject>
     </i:category>
     <i:list i:ordered='' i:href=''>
       <i:item i:href=''></i:item>
       <i:item i:href=''></i:item>
       <i:item i:href=''></i:item>
     </i:list>
   </i:lml>

Let's go through it.

General notes

  1. None of the elements are optional.
  2. None of the child elements of the root element are ordered.
  3. Attributes are namespace-qualified.
  4. You're free to insert foreign markup into an LML document.
  5. You're free to ignore foreign markup in an LML document.

i:lml

The i:lml element in the http://www.dehora.net/lml/2005/01 namespace,

   <i:lml xmlns:i='http://www.dehora.net/lml/2005/01'
       i:version='001' >

says this is an LML document. The i:version attribute contains a textual identifier - it's not optional. The version we're talking about here is '001'.

i:category, i:name, i:subject

     <i:category>
        <i:name></i:name>
        <i:subject></i:subject>
     </i:category>

This lets you associate the list with category metadata. You can have as many i:category elements as you want. The i:name element is the name of the category. The i:subject element indicates a URL that qualifies or contextualizes the category - it can be empty. Preserving the XML document order for these items is sufficient.

i:published

The i:published element,

     <i:published i:when='' />

indicates the date the list was published - it's an empty element. The i:when attribute contains the date conforming to the date-time BNF rule in RFC3339.
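
For example, with an illustrative date:

     <i:published i:when='2005-01-31T18:30:02Z' />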

i:changed

The i:changed element,

     <i:changed i:when='' />

indicates a date the LML document was changed - it's an empty element. The i:when attribute contains the date conforming to the date-time BNF rule in RFC3339. The first time a list is published i:changed and i:published will be the same date.

i:author, i:name

The i:author construct,

     <i:author>
        <i:name></i:name>
     </i:author>

is lifted from Atom's.

i:list

     <i:list i:ordered='' i:href=''>

The i:list element tells us the list of items has started. The i:ordered attribute can have the values 'yes' or 'no' - it's not optional. The i:href attribute indicates a link where the list can be found or read on the Web - it's not optional. The semantics of ordering is undefined - specifying it would be a) interminably dull, b) of minimal benefit.

i:item

An i:list contains one or more i:item elements:

     <i:list i:ordered='yes' i:href=''>
       <i:item i:href=''></i:item>
       <i:item i:href=''></i:item>
       <i:item i:href=''></i:item>
     </i:list>

The i:href attribute indicates the resource name of the item - it's optional.

What you put in the content of an i:item is up to you. If you put some LML in an item, i:category elements in your outer LML document do not apply to the embedded LML.
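
Putting the pieces together, here's a complete example document - the names, dates and URLs are invented for illustration:

   <i:lml xmlns:i='http://www.dehora.net/lml/2005/01'
       i:version='001' >
     <i:published i:when='2005-01-31T18:30:02Z' />
     <i:changed i:when='2005-01-31T18:30:02Z' />
     <i:author>
        <i:name>A. Reader</i:name>
     </i:author>
     <i:category>
        <i:name>books</i:name>
        <i:subject></i:subject>
     </i:category>
     <i:list i:ordered='yes' i:href='http://example.org/lists/books'>
       <i:item i:href='http://example.org/books/1'>Getting Things Done</i:item>
       <i:item i:href='http://example.org/books/2'>Get Everything Done</i:item>
       <i:item i:href='http://example.org/books/3'>Getting to Yes</i:item>
     </i:list>
   </i:lml>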

Media type

The intended media type for LML is application/lml+xml. Who knows how long it will take to get that registered, but try winging it for now and see how things work out. If you don't like using unregistered media types, application/xml is fine.

Future work

It would be interesting to

  • define an RDF serialization of LML and accompanying XSLT sheets which specify the mappings.
  • hack support for LML into RSS aggregators and blog publishing tools.
  • scrape existing web sites for their lists and republish them as LML.

A RelaxNG schema and an LML to XHTML transform should be forthcoming in a future edition.


* Credit is due to Clyde Hatter for coming up with the idea of a markup for lists.

January 06, 2005

Werner sums it up

All Things Distributed: Official vs. Personal Voice.

January 03, 2005

Versioning: some strategies and tactics

'Any software needs to have a coherent versioning strategy baked into the initial version. Version 2.0 (or 1.1, or 1.0.1, or whatever) is too late. By then, the horses have already left the barn. When calculating the TCO and ROI for a software project that does not have a coherent plan for enabling loosely coupled versioning that can enable forwards- and/or backwards-compatible changes, the initial deployment cost (at least) should be used as a measure for the inevitable version dot next and added to the overall support cost. -Chris Ferris'

I agree with Chris and so do these folks. I don't read Chris as saying that using numbers to represent versions is misguided, but that version numbers alone are only symbols and are thus insufficient - there must be a defined policy in place from the beginning.

Something else came to mind while reading Chris' observations on the versioning discussion between Norm Walsh and David Orchard. There's plenty of architectural thinking around versioning in the fields of web services, service orientation and protocol design, but there's much less advice available to practitioners. So, here are some thoughts for dealing with versioning...

  1. Treat versioning as an architectural necessity. Retrofitting a versioning policy onto an existing system is hard work, especially if that system's stakeholders are independent and possibly conflicting actors. Also, it's unlikely all stakeholders will have an incentive to upgrade down the line, even within the confines of a single enterprise. To avoid friction and ensure the system can evolve in a controlled manner, it's important to be clear that the architecture needs to cater for change. Therefore be explicit in the architecture about what the versioning policy will be.
  2. State a versioning policy as a requirement. Define the ability to handle multiple versions with respect to that policy as a requirement. Work versioning stories into your use cases. This encourages everyone involved to start thinking in operational and cost terms about versioning and ideally tees up developers to implement with versioning in mind.
  3. Treat versioning as an engineering risk - early on. From a delivery and initial implementation standpoint, dealing with a versioning policy can be considered an exercise in risk that needs to be actively managed. That's slightly different from the architectural and stakeholder view, where not having a versioning policy is itself a risk. A versioning policy imposes a complexity burden on developers that affects scheduling and initial delivery costs (this is the investment or 'opportunity cost' versus the overall lifetime cost incurred by not doing so). So, if versioning is an architectural necessity it's also an engineering risk early on - understanding this tension will be important. Aside from that, it's natural for fear of breakage to set in when it comes to running a system of services; the longer a service runs, the more likely it is to be critically important to one or more people. In that case change can become more difficult and risky than it needs to be - worst case, the service or its data format will become effectively frozen. Later, as the service evolves, the initial risk undertaken will reap its reward, as the developers will be in a position to undertake changes.
  4. Get feedback early - implement with multiple versions. Customers and users will need reassurance that the versioning will work from the outset. Here's an approach for delivering services that behave as expected with respect to a versioning policy - begin with two versions in play (there's a sketch of one way to do this after the list). Not only will you have demonstrated the system can handle more than one version, you'll have empirical evidence that the versioning policy itself is sound. As well as helping with such things as calculating ROI and managing risk, it will add confidence that the system is, in fact, evolvable.
  5. Software processes won't be enough - tailor them. If you use agile or open source approaches to ship code this may all just sound like hot air - 'release early, release often' with more words. To some extent that is true, and we might think it true in terms of delivering components. Yet even for components, versioning has proven to be a real headache - consider the various approaches to the issue in COM, J2EE and .NET [1]. For services and systems that expose web based APIs the stakes are raised yet again - interfaces and data formats become critical points of breakage [2]. Getting stories and use cases to arrive with versioning requirements (which are non-functional) is important. You can get into situations in purely agile or TDD approaches where you end up breaking users of a service by altering data formats or protocol interfaces. Indeed, you may be breaking users you didn't even know you had. On the other hand, approaches such as the RUP or aggressive change control management run the risk of deferring feedback on versioning problems until late in the development cycle or after the system has been delivered [3].
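
As a sketch of point 4, here's what dispatching on a message's version identifier might look like in Java - the class names, version tokens and message shape are all invented for illustration, not a real API:

    // Two message versions handled side by side from the outset, so the
    // versioning policy is exercised rather than just documented.
    import java.util.HashMap;
    import java.util.Map;

    interface Handler {
        String handle(Map<String, String> message);
    }

    public class VersionedService {
        private final Map<String, Handler> handlers =
            new HashMap<String, Handler>();

        public VersionedService() {
            handlers.put("001", new Handler() {
                public String handle(Map<String, String> msg) {
                    return "v001: " + msg.get("body");
                }
            });
            handlers.put("002", new Handler() {
                public String handle(Map<String, String> msg) {
                    // v002 renamed the field; v001 clients keep working above
                    return "v002: " + msg.get("payload");
                }
            });
        }

        public String dispatch(Map<String, String> message) {
            Handler h = handlers.get(message.get("version"));
            if (h == null) {
                // a policy decision: reject outright, or map unknown
                // versions onto the nearest supported one
                throw new IllegalArgumentException("unsupported version");
            }
            return h.handle(message);
        }
    }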


[1] In Internet protocol design the default approach has been to reduce the risk of breakage by having highly constrained but highly generic interfaces.

[2] Perhaps the best articulation of this issue for those coming at it from the developer or OO side is Martin Fowler's essay, Public v Published Interfaces [PDF]

[3] Steve Loughran has argued that for Web Services, the development cycle should be extended to cater for post deployment integration issues and consequent feedback, on the basis that you don't really integrate a Web Services system until you go live. This is called 'Continuous Deployment'. See chapter 2 of the excellent paper 'Making Web Services that Work' [PDF]

Two sites that need RSS feeds

The Irish Times. Depending on the content, I'd consider paying 30 euros a year for a feed; not for email. The front page uses URIs and CSS in such a way that a feed scrape is possible (there's even a CSS class called 'headline'). The terms and conditions don't allow it however.

The RTE Guide. The UI for figuring out what's on is difficult to use (and not bookmarkable). A feed for TV channels would be so nice. I don't think scraping this is an option given the page design. The terms and conditions would seem to allow a scrape to be run locally and a feed driven into an aggregator via a file, but not published and downloaded via a URL.

January 02, 2005

Ant, workflow, orchestration

I've been looking at some of the new features coming through for Ant 1.7; they look very useful and seem to have a common conceptual theme - better control for conditional reasoning about dependencies. The fact that the Ant 1.7 feature set is conceptually clear is a good sign in itself.
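
For example, the conditional style in Ant builds looks something like this - a property set by a condition task gates a downstream target (the file, property and target names here are illustrative):

    <target name="check">
      <condition property="libs.present">
        <available file="lib/junit.jar" />
      </condition>
    </target>

    <target name="compile" depends="check" if="libs.present">
      <javac srcdir="src" destdir="build" />
    </target>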

It occurs to me that if tools like Ant and make tend to break down as projects grow larger and more conditions arise, expectations need to be managed for the family of XML based business process languages and tools. Building software is a specialized problem domain, narrower than the gamut of business process orchestration or even the basic workflows you can expect from something such as a write/review/edit cycle for document publishing. Powerful underlying formalisms like pi-calculus and Petri nets notwithstanding, the risk is that users are left disappointed and cynical with business process tools due to hype.

I suspect for the general problem of capturing a business process as an orchestration, our reach exceeds our grasp*. This is not an argument against the declarative approach. There's no question that too much business logic is locked up in middleware and systems programming languages. But we need to avoid the industry standard hype cycle on these technologies and be clear on what they can and cannot do, which is why online voices of clarity such as Stefan Tilkov's and Paul Brown's are much appreciated.


* I'll avoid going down an Artificial Intelligence history rathole here, but AI researchers hit on this declarative/procedural impasse as far back as the 1970s - Drew McDermott in particular has written some thoughtful essays on the subject.

January 01, 2005

Predictions for 2005

10+ predictions for 2005, with my tongue firmly in cheek! And a Happy New Year to you all.

  1. Java: XStream becomes the de facto XML/Java mapping tool. Jython will grow its community as will Groovy. Spring, Hibernate and lightweight framework backlashes occur. NetBeans will continue not to win the hearts and minds of Java folks compared to IDEA and Eclipse, but Sun will continue to back it. Java developers get themselves into bother with generics. java.util.concurrent will be a talking point and will produce a rash of articles and guidance as developers weep like children in the face of Doug Lea's 3rd edition of CPIJ. The upside will be that more Java programmers will be able to write concurrent code, the downside will be that managers everywhere will wince as the orders for dual-CPU developer boxes come in. Microsoft will try to hire Doug Lea. Others will mumble darkly about Jini and tuplespaces and people not getting it, but Sun will confound everyone by releasing Jini under a developer-comprehensible license. JUnit gets forked - status quo advocates and dormant committers are initially irate but collectively breathe a sigh of relief as the community takes over. Everyone gets bored talking about AOP.
  2. XML: Object and Doc heads have a Coke and a Smile, and learn to get along. WS-* are de-emphasized by vendors over the course of the year but WSDL and SOAP are generally accepted as Ok. Parallel/Concurrent XML processing becomes the new xml-dev obsession throughout the year - running code results. Unicode and internationalization become sought after skills for XML specialists. O'Reilly get around to publishing XML-DEV - The Best Of; Len Bullard writes the preface. Somebody will invent an XML grammar for ratings lists and top ten rankings, called TopTenML; it becomes the second most popular use of XML after the RSS family. Amazon get on board with the syndication community to bring it through a standards body as the more generic ListML. As usual, James Clark does something amazing.
  3. EDA: slated to replace SOA as the buzzword du jour, you will be sick to death of this acronym by year's end. Expect anyone who has a SQL trigger or an email ping in their system to call it an Event Driven Architecture. Meanwhile even as the acronym's meaning degrades, people actually build useful event driven stuff and there are grudging concessions that it is a bona fide architectural style. Gartner, Burton and Forrester get entirely bored towards the end of the year and start looking at Infoware for 2006.
  4. IT and Open Source: the industry will look healthy, especially professional services, but open source will continue to place pressure on software companies to find viable business models as margins shrink. Over the year, enterprises big and small will start to consider open source the default strategic and implementation option over vendor offerings. Use of open source will be perceived to be a more important cost rationalisation and strategy than use of WS and XML standards. Open source technical knowledge and processes become sought after skills for developers, architects and project managers.
  5. Most innovation is commercial: 2005 could be ho hum - much of the innovation will be in vendors finally getting creative about new business models. Support and service offerings for open source systems are seen as big business. Bob McWhirter gets seriously rich selling Codehaus In A Box.
  6. Rich Internet Applications: RIA hacking hits the offline/online sweet spot as developers get fed up waiting for browsers, virtual machines and UI toolkits to evolve in response to Web Sites Which Are APIs. Flash will be considered as something other than the default means of producing rubbish websites. Tim O'Reilly declares victory as the Web OS vision becomes reality. Adam Bosworth says I told you so.
  7. Programming languages are the new black: 2005 witnesses a revolution in how most developers are prepared to use obscure languages in production scenarios. Paul Graham declares victory and ships Arc, which ends up being Common Lisp without the libraries. The terms process-oriented, crash-first, concurrent message passing, and little language enter mainstream developer lingo. By the summer everyone gets closures and they replace IDE support as the popular distinguishing factor between languages. People start fooling with Erlang after they realise EJabberd is written in it, and Herb Sutter's Fear and Loathing in Concurrency article scares the beejesus out of everyone. Smalltalk and Lisp people continue to be smug about the whole thing while Patrick Logan and Ehud Lamm become superstars. Pragmatic OCaml and Kent Beck's re-released Smalltalk Patterns book become huge sellers.
  8. Mozilla and Mono: 2004 will be the last year anyone makes jokes about either Mozilla or Mono being a joke. Jamie Zawinski starts contributing patches to both.
  9. Instant messaging and systems integration: IM becomes a viable alternative to the heavier Grid and P2P technologies for integrators and data crunchers working at federation and Internet scales, but will initially be frowned on as simplistic and inadequate for 'real work' - the debate will initially look like a rerun of WS-* v HTTP/XML, but runs out of steam as architects can't be bothered, the last one having taken four years. More Cokes and Smiles result.
  10. Fewer technical arguments: Having pointless heated arguments over the merits of various computing trivia becomes increasingly unfashionable. Conciliatory behaviour and good manners break out on tech mailing lists and blogs everywhere. Folks who couldn't handle their opinions being challenged are miraculously converted into third-way thinkers, except for Slashdot posters, GPL advocates and syndication technologists, who remain steadfastly rabid and hostile. Philip Greenspun smiles knowingly as nerds everywhere learn to spell tourniquet.

Some things I said last year - judge for yourself!