Bill de hÓra: June 2004 Archives


June 30, 2004

You're in a maze of twisty encoding thingies, all alike

Effbot: cp1252toUnicode for those in cut and paste hell.
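For a rough idea of the kind of repair involved, here is a minimal Python sketch. It is not the effbot's code, just an illustration under one assumption: the text was really cp1252 but was read as Latin-1, so control characters in the U+0080 to U+009F range sit where the curly quotes and dashes should be. The function name is made up for this example.

```python
def fix_cp1252_controls(text):
    """Swap C1 control characters for the characters cp1252 assigns
    to the same byte values (curly quotes, dashes, the euro sign...)."""
    fixed = []
    for ch in text:
        code = ord(ch)
        if 0x80 <= code <= 0x9F:
            try:
                ch = bytes([code]).decode("cp1252")
            except UnicodeDecodeError:
                # A handful of byte values in this range are undefined
                # in cp1252; leave those characters untouched.
                pass
        fixed.append(ch)
    return "".join(fixed)


if __name__ == "__main__":
    print(fix_cp1252_controls("\x93cut and paste hell\x94"))
```

Anything outside that range passes through unchanged, so running it over text that is already clean should be harmless.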

Never mind the printers, here's the makefile

Mike Clark:"Pragmatic Project Automation has gone to the printers."

July 19th! No download yet, and we wants it - compiles it to PDFses and sells it to us!

June 25, 2004

Transactions and Web Services

In comments on the complexity of WS-* Ted Neward mentions REST:

I think the WS-* guys would agree with you, that simpler messaging systems, like RESTful systems, are easier to develop and maintain.

And then transactions:

But when you *need* the security and transactions and reliability of the implementations that conform to the WS-* stack, the platforms have to be able to deliver that without requiring you to build it yourself.

Transactions and web services are not a good mix. Probably I'm being overly literal in interpreting what Ted is saying (I haven't seen anything like this on his blog). But do keep your TP Monitors to yourselves folks.

Jabber monitoring

Cool: Jabber Log monitoring. The article has a link to Janchor, an RSS over Jabber hack. Layer 7 Syslog maybe.

[via Darragh Sherwin]

June 24, 2004

Jabber and IM walled gardens

It seems Yahoo! are intent on locking out Trillian clients:

Regardless of whether IM spam is a growing and gathering problem, Yahoo's actions against Trillian illustrate the Web giant's efforts to further wall-off its proprietary IM network. The Big Three equivalent for Internet services have all watched their IM software proliferate thanks to their closed-door environments. Consumers must download multiple clients to chat with buddies in different IM services. [CNET]

This is like the AOL and CompuServe walled gardens all over again. If this keeps up, using proprietary IM is going to be more and more like using those services back in the Nineties. You would think Yahoo! (and Trillian) would understand the futility of this and adopt Jabber as IM's answer to the Web. Anything else is a race to the bottom.

June 21, 2004

Managing stuff with a web server

In 1753 Samuel Johnson said:

I saw that one enquiry only gave occasion to another, that book referred to book, that to search was not always to find, and to find was not always to be informed.

We're still nowhere.

Sean is asking interesting questions about URI applications and local data access, linking to a piece by Micah Dubinko.

These days, I use mostly webservers to manage my files, having gone through a number of iterations using folders and ad-hoc scripts. The current modus operandi is to install Apache, running or backing the following:

  • Python CGIs
  • Wiki
  • Subversion

I also have public webspace set up with Python and a Wiki (I might get around to begging my provider for Subversion one day, but I'm not hopeful). Spidering into Lucene is imminent. So are RDF and TM metadata (every folder I've ever created was ultimately an exercise in shoehorning descriptive metadata into folder names).

It's not complicated. You don't need a fullblown setup to manage this and enter data. Apache can be configured to serve from a directory, which gives you web access without the overhead of forms or WebDAV. Just drag and drop files into the folder and you're done. Access control is a problem with any naive client machine running web servers. The upside of Apache is that if you're worried about access control, adding .htaccess files to a folder is straightforward.

Here's an example of how this is useful. In Propylon we have a Redhat backed server called Nimoy that has an smb share mapped into Apache document space. Folks can drag and drop dependencies for any project into that share - jar files, databases, app servers. For example with a Java project we can zip up all the jars for the project and drag them into a folder on Nimoy. An ant file can pull down the dependency zip using the get task and unzip it into the project's lib/ folder, thus keeping the version control free of binary cruft, providing a single canonical place to hold jar files, letting people get started without downloading files from all over the net or having them emailed. [Yes, Maven can manage jar files using HTTP as well.]
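The pull-and-unzip step is small enough to sketch outside Ant. The Python below is only an illustration of the idea, not the build file we use: it fetches a dependency zip over HTTP and unpacks it into a lib/ folder. The function name and the URL in the usage note are hypothetical.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def fetch_dependencies(url, lib_dir="lib"):
    """Download a zip of dependencies and unpack it into lib_dir.

    Returns the names of the archive members that were extracted.
    """
    with urllib.request.urlopen(url) as response:
        payload = response.read()

    target = Path(lib_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Unpack from memory so no temporary zip file is left behind.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Usage would be something like `fetch_dependencies("http://nimoy.example/deps/myproject-libs.zip")`, where the host and path are placeholders for wherever the share is mapped into Apache's document space.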

The next step is to split the filesystem from file management altogether. What do I mean? Well, over the years I've moved away from a place where I would think hard about how to file everything away (where what I could do was predetermined by the file system at hand). I haven't been able or willing to do that for years - there's too much to classify and too many ways to classify it and I'm not paying myself to be a librarian. Then consider that folder based classification doesn't help with retrieval anyway unless you carry that classification scheme in your head all the time. Life's too short.

I prefer the fire and forget mode that is enabled by giving things URIs and putting them behind web servers. Everything else I've tried or seen was too complicated. I could imagine never classifying or sorting anything based on folders within a couple of years, preferring something like a Topic Map instead to tag the files with metadata - not that as a user I'd actually care how it's done. WinFS seems to be going in this direction; we'll see.

June 19, 2004

That's entertainment

"I remember those cheers they still ring in my ears, and for years they'll remain in my thoughts. Cuz one night I took off my robe and what'd I do, I forgot to wear shorts. I recall every fall, every hook, every jab, the worst way a guy could get rid of his flab. As you know, my life was a jab...Though I'd rather hear you cheer when I delve into Shakespeare. A Horse, a Horse, my Kingdom for a Horse, I haven't had a winner in six months. I know I'm no Olivier, but if he fought Sugar Ray, he would say that the thing ain't the ring it's the play.
that's entertainment
So gimme a stage where this bull here can rage. And though I can fight I'd much rather recite That's entertainment.

That's entertainment."

June 18, 2004

Thus sprach Bosak

Jon Bosak:

In reality, XML just clears away some of the syntactical distractions so that we can get down to the big problem; How we arrive at common understandings about knowledge representation

It's Sabotage

Boing Boing: Beastie Boys disc has DRM

Real programmers write tests

Tim Bray:

Debuggers are OK, but when the going gets tough, the tough use 'print'.

...and leave unit tests in their wake.

June 17, 2004

The Great Escape

Sam Ruby has some predictions about the weblogs.com migration:

A few relatively safe predictions: some of those who didn't much care about "unusual character" and escaping issues will suddenly get religion. And some of those who are devoutly religious about well-formedness will find the temptation to use a regular expression or equivalent technique necessary to scavenge what data can be salvaged and succumb to the temptation.

I'll throw some predictions into the hat - there'll be more progressive dialogue about permalinks and guids with respect to domain names after this happens.

And perhaps about data ownership.

And perhaps about the curse of popularity.

June 12, 2004

Why I uninstalled Thunderbird

I use mozilla for my mail. I figured, time to switch to Thunderbird (and my colleagues are raving about it). I installed it but I uninstalled it soon after. The reason is it doesn't auto-import from mozilla:

Note: This method brings over your mail and account settings, your junk mail training data, and filters. Make sure your POP account in Thunderbird is configured to leave all mail on the server in case you want to go back and read pop mail from Mozilla Mail. Also, copying the Mozilla profile results in a lot of preferences and files no longer needed, since Thunderbird is a mail client only. Remove redundant files at your own risk.

I guess I won't be switching right now, given that mozilla mail is already excellent. No, I am not going to write a script to do this. Yes, I am a lazy ingrate.

But, if you use Outlook or Eudora, you're covered; import will just happen for you. Go for it.

Java feed readers?

It would be nice to use a compelling Java based feed reader... any out there?

RSS Bandit gets ambitious

Update: Luke Hutteman in comments:

You can turn off the popup notifications in SharpReader through the AlertNewItems setting - do this at the top level (subscribed feeds) to turn it off for all feeds. I've also made some major threading and UI based changes to the upcoming release that should fix the CPU lockups.

Sounds good!

These days I'm using RSS Bandit nearly as much as SharpReader, the reader I tend to use most because of its simpler interface. But SharpReader frequently chews up my CPU, locking up the entire machine, and those pop-up notifications can drive me nuts [Right-click the subscribed feeds folder and set the AlertItems property - I was looking for this under tools/options - duh].

we should support what I consider are the three primary differentiating features of the commercial desktop aggregators I've seen... - Dare Obasanjo

The RSS Bandit people want to add WYSIWYG editing (nice), NNTP (nice), newspaper views (ugh).

Having the source code at hand for RSS Bandit since it moved to sourceforge is also nice. I recommend the RSS Bandit code and project structure to anyone with a Java background learning C#.

It would be nice to see a compelling Java based feed reader...

[All of which reminds me - make a donation to SharpReader.]

June 11, 2004

Atom/RSS: relating entries and feeds

I threw an XSLT stylesheet together that maps Atom onto RDF triples. It's still an alpha, but it's producing decent information. Here are some thoughts that came out of that exercise.

Feeds and Entries

One decision to make was how to relate an entry to its feed - after all, the point of having RDF is to have a graph relating all the information items to each other and having a feed subgraph detached from the entry graphs isn't that useful!

It was interesting then to find out that Atom doesn't relate an entry to a feed. It was even more interesting to find out that none of the RSS formats do this. What they do is relate a feed to an entry. This is implicit in the XML document structure - an entry is a child of the RSS feed so we can assume it belongs to that feed. In the case of RSS1.0, a feed has to explicitly state its entries using RDF constructs. On the other hand the feed document structure won't always be around, as we'll see shortly.

Composite Feeds

There is an increasingly common syndication use case called "composite feed" (aka synthetic feed). This is a feed made up of entries from other feeds. Bob Wyman et al's pubsub service relies on composite feeds, as do javablogs, java.net and a lot of others. As RSS/Atom usage grows, this kind of feed is bound to become more common. In terms of filtering, theming and aggregating like content, you could make a non-specious argument that composite feeds are potentially more valuable than individual ones. Aggregator authors in principle could also find this relationship useful to avoid displaying duplicate entries (atom:id can also be leveraged for this purpose). However none of the current RSS formats will support this use case - you have to infer or guess the source from the entry URI or hope you can introspect the dereferenced entry's entity body for a feed URI.
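To make the duplicate-entry point concrete, here is a minimal sketch of how an aggregator might build a composite feed while using atom:id to drop repeats. It assumes entries have already been parsed into dictionaries with an "id" key; that shape, and the function name, are inventions for the example rather than anything an existing aggregator does.

```python
def merge_entries(feeds):
    """Combine entries from several feeds into one list, keeping only
    the first entry seen for each atom:id."""
    seen_ids = set()
    merged = []
    for entries in feeds:
        for entry in entries:
            if entry["id"] in seen_ids:
                continue  # same entry already arrived via another feed
            seen_ids.add(entry["id"])
            merged.append(entry)
    return merged
```

Note what this does not give you: once the entries are merged, nothing in them says which feed each one came from, which is exactly the gap described above.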

Relating feeds and entries

Here's a picture of the RSS1.0 relationship:

FeedHasEntry.png

Here, a feed points to its child entries (rss:items). This is useful, but does not cover off the composite feed case where you want a detached entry to point to its origin feed.

At the moment the XSLT I wrote inserts an atom:feed tag into each atom:entry. Here's a sample taken from the output:

  <atom:entry
    rdf:about="http://www.dehora.net/journal/2004/05/mt3">
    <atom:feed rdf:resource="http://www.dehora.net/journal/"/>
    <atom:title>MT3: are you not entertained?</atom:title>
    <atom:link>
      <rdf:Description 
       rdf:about="http://www.dehora.net/journal/2004/05/mt3" 
       atom:rel="alternate" 
       atom:type="text/html" 
       atom:href="http://www.dehora.net/journal/2004/05/mt3"/>
    </atom:link>
    <atom:modified>2004-05-21T20:57:11Z</atom:modified>
    <atom:issued>2004-05-21T20:57:11+00:00</atom:issued>
    <atom:id rdf:resource="http://www.dehora.net/journal/2004/05/mt3"/>
    <atom:created>2004-05-21T20:57:11Z</atom:created>
    <atom:summary>foo</atom:summary>
    <atom:author>
      <rdf:Description rdf:about="mailto:bill@dehora.net">
        <atom:name>dehora</atom:name>
        <atom:url rdf:resource="http://www.dehora.net/journal"/>
        <atom:email>bill@dehora.net</atom:email>
      </rdf:Description>
    </atom:author>
    <dc:subject xmlns="http://purl.org/atom/ns#"></dc:subject>
    <atom:content atom:type="text/html" atom:mode="escaped"/>
  </atom:entry>

The picture of that relationship looks like this:

FeedHasEntryHasFeed.png

Which implies that you can find your way back to the origin feed when the entry is detached from it. [By the way, there is a problem with applying this inference in the general case - kudos to anyone that spots it.]
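The inference itself is easy to sketch outside XSLT. The Python below is not the stylesheet, only an illustration of the same idea: read an Atom 0.3 document and emit one (entry, atom:feed, feed) triple per entry. Taking the feed's URI from its rel="alternate" link is an assumption made for the example, as is the function name.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # the namespace used in the sample above


def entry_feed_triples(feed_xml):
    """Return (entry id, "atom:feed", feed URI) triples for a feed document."""
    root = ET.fromstring(feed_xml)

    # Assumption: the feed is identified by its rel="alternate" link.
    feed_uri = None
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "alternate":
            feed_uri = link.get("href")
            break

    triples = []
    for entry in root.findall(f"{{{ATOM_NS}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
        if entry_id and feed_uri:
            triples.append((entry_id.strip(), "atom:feed", feed_uri))
    return triples
```

The relationship is derived purely from document containment here, so it only works while the entry is still sitting inside its feed document.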

Entry uber alles

Over the last month there have been some discussions in the Atom community that seem to lean in favour of pushing information down into the entry from the feed.

I suspect that the argument to support composite feeds will only continue to grow, and by the time Atom gets to 1.0 atom:entry will have to be a first class resource that can be fully detached from its originating atom:feed while maintaining some kind of link back to that feed. Much the same can be said for any RSS format.

For Atom, one option is to place atom:feed inside atom:entry rather than make the inference as I did. The more I think about it the more I think it's needed and I hope to put a proposal together for Atom soon. With it you don't need to make hazy guesses or embed RDF/XML constructs into the markup.

Ephemera

Other things have come up from this exercise. The conversion into RDF of atom:link is clearly a mess:

  <atom:entry rdf:about="http://www.dehora.net/journal/2004/05/mt3?id">
   ...
    <atom:link>
      <rdf:Description
        rdf:about="http://www.dehora.net/journal/2004/05/mt3?id"
        atom:rel="alternate" 
        atom:type="text/html"
        atom:href="http://www.dehora.net/journal/2004/05/mt3"/>
    </atom:link>
    ...
  </atom:entry>

but it's hard to know whether this is a result of atom:link being something of a woolly construct or of RDF/XML tag noise - I suspect it's a bit of both. Also there's more going on in the RDF/XML for author than I would like; again this may have something to do with how authors are modelled in Atom.

June 07, 2004

Open Source views

They understand the real issue - it's about sovereignty. They no longer want to funnel Brazil's wealth abroad when they have a growing and excellent software community of their own. They want local people to provide service and write software for the government and industry. They want local skills to enrich the F/OSS world and build exportable skills. They have a vision for how to both enrich the culture and skills of their country while creating a power-house for the export of services in the future. They get it. - Simon Phipps

Most of the comments I've heard from folks about open sourcing Java have been negative. Hmmm... Not so much negative as concerned: Developers value Java's cross platform interoperability and reliability. They're afraid that if Java is open-sourced then someone will try to fragment the community by creating incompatible versions of Java and ignore the community process, just like Microsoft did. Microsoft did a lot of damage to the community and many developers strongly do not want that to happen again. - James Gosling

I find these views entirely compatible. Java has been a fine platform for open source and Sun does more than most to promote open source, even if it does come across as strategically confused at times.

Now, as Sun has been looking intently at open source recently, I wonder if they'll review the SCSL licence for Jini/Javaspaces?

Atom: XML and MIME

A while back I said that packaging in XML is currently Atom's biggest technical headache. Ken McLeod is laying out why this is so on atom-syntax:

Once we have reduced Atom's inlinable content types to non-schizophrenic portions, we're basically left with XML characters and XML fragments.

XML is not architected for carrying non-XML content and this seems to be generally insoluble - insofar as whatever solution one comes up with will always be inferior to MIME.

No worries then

blogs.sun.com: "It's based on Roller".

Breaking change

Elliotte Rusty Harold, as usual, has been making a ton of sense on xml-dev about binary XML, while others persist in throwing the baby out with the bathwater (after having muddied the water) so they can save a few CPU cycles or bits on the wire, while remaining on message as regards interoperation.

Years from now, if XML itself becomes a useless quagmire of incompatible binary codecs, you can look back and admire the stunning naivety of it all. Interoperation, even basic interchange, requires constant maintenance and effort. Managing localized and short term interests in a networked system is effectively managing entropy. If there is a better means than XML 1.0 for doing this, I'd like to hear about it.

Nonetheless, there was this:

Additional, fine. (Think XML Namespaces, XSLT, XML Schema, XML Query Language, xml:id, etc.) none of which in any way alter the basic nature of XML. - Elliotte Rusty Harold

Elliotte's rarely wrong, but he's wrong here. XML Namespaces do alter the basic nature of XML - enough to be backwards incompatible. Just try mixing non-namespaced and default namespaced markup and see for yourself.
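If you'd rather not take that on faith, a few lines of Python show the effect. This is only a sketch using the standard library's ElementTree; the namespace URI is a placeholder. The same lookup that works against plain markup silently finds nothing once a default namespace is declared.

```python
import xml.etree.ElementTree as ET

# Identical markup, except the second document declares a default namespace.
plain = ET.fromstring("<doc><title>hello</title></doc>")
namespaced = ET.fromstring(
    '<doc xmlns="http://example.org/ns"><title>hello</title></doc>'
)


def find_title(root):
    """Look up a child element by its unqualified name."""
    return root.find("title")


if __name__ == "__main__":
    print(find_title(plain))       # an Element
    print(find_title(namespaced))  # None
    print(namespaced[0].tag)       # the name now carries the namespace URI
```

Code written against the non-namespaced vocabulary has to change before it can read the namespaced one, even though the angle brackets look the same.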

June 06, 2004

Embrace failure

If you've become disillusioned with J2EE and you've never looked at Jini, well, you're seriously making a mistake. Whereas the J2EE spec is, what, 1700 or so pages, the entire Jini spec can be read in a couple of days. It's simple, elegant and powerful. -Rick Kitts

June 05, 2004

Open source and product business models

There's a suggestion by Geert Bevin that when big outfits contribute to open source the standard goes up and it squeezes out the smaller players. I haven't seen that. What I've seen is a growing awareness that high quality software requires dedicated people and a fair amount of time, and that there are social communities who are prepared to invest that effort. It's taken years to get things like MySQL and the Apache webserver into shape. Eclipse still isn't in shape. IIS is barely there with version 6. If you want to build quality software products you need to be aware of what you're undertaking. Thus, some thoughts on how open source can be beneficial to small and medium product businesses.

Open source benefits

Open source offers a dual opportunity for smaller companies.

First, it provides a massive pool of horizontal and infrastructure level software to choose from. If you're Atlassian you don't want to have to write your own servlet engine to provide Jira - you can use Tomcat. If you're Propylon you don't want to have to write an XML aware word processor to provide Parliamentary Workbench - you can use Open Office. This isn't much different to saying if you want to write software, you don't want to have to write your own compiler. There is still some resistance to open source based on notions that "free stuff" can't possibly be good. Do try and get over this - some of the best software around is open software. Some of it is junk, but at least you have a basis for determining that.

Second, open source can help your company focus on what its core business is and what the nature of the target market is. Are web servers or rendering engines or message queues your core business? If not, are they a cost or are they an investment? If you don't know, you should find out :) If you are coding stuff that is essentially a cost item, you can consider open sourcing it, as Kevin Dangoor suggests in a comment to Geert's entry:

I'm a big believer in open sourcing things that are not your core business. I'm even trying to convince my current employer to open source a piece of software that is not part of our core mission. It's a good way to generate goodwill and potentially get some good assistance on your software. But, there still has to be a core mission that keeps the enterprise afloat.

If a business does this it will also need to follow through on building a community - it's not just about giving stuff away. Alternatively, it can kill the work and adopt an open source project instead - ideally contributing something back in some form or another. Either way, you'll drive out cost.

A small to medium size outfit will generally be targeting vertical and domain specific applications. It is a sensible long term growth strategy to focus on taking specific verticals as "beachheads", and look to launch future products or enter new markets from a position of strength. Sometimes you see horizontal plays, but they're the exception - given that they offer greater reward for greater risk this shouldn't surprise anyone. A company with a horizontal offering can end up fighting on two fronts. It can wind up competing with bigger organizations who can leverage significant technical and commercial resources once the company or its market come onto their radar. And it can see its offering commoditized by an open source project. So it might take two or three years or more for an OS project to be worth considering, but if there is a community impetus behind it, this will happen and inevitably place pressure on both market share and margins. This happens to organizations at all scales - consider the impact of Linux or the open source J2EE containers. Some of the most interesting projects in this sense are Mono, Nutch and Chandler.

Abuse of open source

What I am concerned about are open source communities being leveraged, or worse, synthesized, to fight commercial proxy wars between software superpowers. Eclipse is certainly a strategic play by IBM but at least they had the good grace to create a new project space, though you could still argue they run the risk of collateral damage to their golden goose, Sun, by promoting it. The bigger problem is what happens when that is no longer a strategic issue. Even Microsoft has been dipping its toe in the water and some of its employees are working on significant projects like RSS Bandit and dasBlog. I remain undecided about the nature of Jakarta in the ASF, but it seems that the idea of a platform/language specific community is fading there in favour of incubation and top level projects. Nonetheless, having the open source space become polluted and devalued the way the specification and standards space has would be a bad thing for the entire industry.

Products vs Services

For each significant commercial application business model there is almost certainly a competing model that would prefer to offer solutions and services for open source tools. The truth of it is that open source tends to create tools and infrastructure that are configurable. That doesn't make them products. So there is room for companies to leverage open source by productizing and supporting them, or using open source to lower the cost and time to market of developing specific products. The former is what companies like Redhat and JBoss do; the latter is what companies like Atlassian and my employer Propylon do. People are not buying software based on the fact that you wrote the web server or the rendering engine or the message queues - 99% of customers just don't care about that stuff - unless it doesn't work. They're buying it from you because it helps them. There's a huge difference between opening your product and chasing services, and leveraging open source so you can focus on the product specific aspects which add value. Any costs driven out by using open source represent savings that can be passed onto the customer.

Finally, two things. First, in the comments to Geert's entry I think Anthony Eden mostly has it right:

One other thing: selling a service does not necessarily mean selling support of software. I think that it is a big mistake to expect to build a business around selling support contracts or documentation, and this is pretty much in line with what Geert originally said. I think it makes much more sense to sell a service, most likely web based, which gives people and companies something that they need which large companies do not or will not address. Some one consider that this is selling software, however I see it as selling a service (which is, once again, not to be confused with selling a support contract.)

Except that it's not so clear we can just say small companies should not be building products. This debate is as much about open source being something that throws into question the assumption that software products and software services are truly distinct as it is that open source hurts pure product plays in favour of service ones. Second, I haven't spoken about contributing to open source, but I think a business can do this in ways that do not mean it has to turn over what is making its products commercially viable - not every use of open source implies a services model.

June 04, 2004

503 Service Unavailable

Simon Brown enters the hell that is J2EE fineprint:

The reason that we discovered this is because we were trying to figure out the best way to implement logic like, "sorry, not quite ready to process this message yet". - Simon Brown

That would be 503 Service Unavailable in HTTP. I love HTTP.

I came across this problem recently in a different context to Simon's transaction case - mine was an MDB throwing an exception to the container and the container continuously resending the message. The simplest answer seemed to be to have retry/dead-letter queues - otherwise you end up tied into someone's JMS provider via configuration lock-in. Sometimes you don't want to have a message retried after a number of efforts; sometimes it has to run within a certain time; sometimes there's no point in resending (like invalid XML in a text message). Having retry and dead-letter queues is more work, but it is somewhat portable. Sometimes what JMS provides is too low-level to represent the application problem - there's a world of difference between a transaction and an order fulfilment.
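The routing decision behind retry and dead-letter queues is simple enough to sketch. What follows is plain Python with in-memory queues, not JMS and not the code from that project; the class names, the dispatch function and the limit of three attempts are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


class PermanentFailure(Exception):
    """Raised when resending cannot help, e.g. the message is invalid XML."""


@dataclass
class Message:
    body: str
    attempts: int = 0


def dispatch(message, handler, retry_queue, dead_letters, max_attempts=3):
    """Run handler on the message and decide where the message goes next."""
    try:
        handler(message.body)
        return "done"
    except PermanentFailure:
        # No point in resending; park it where someone can look at it.
        dead_letters.append(message)
        return "dead"
    except Exception:
        message.attempts += 1
        if message.attempts >= max_attempts:
            dead_letters.append(message)
            return "dead"
        retry_queue.append(message)
        return "retry"
```

A deadline check ("it has to run within a certain time") would slot in next to the attempt counter. The point is that these rules live in application code rather than in one provider's redelivery configuration.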

And that's why I love HTTP (and Internet protocols in general) over APIs - quite often they have exactly the semantics your application needs.
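As a small illustration of those semantics: HTTP lets a server pair a 503 with a Retry-After header, so "not quite ready yet" comes with a hint about when to come back. The client-side sketch below is an assumption-laden example, not anything from Simon's post; the function name and the 30 second default are made up, and only the delta-seconds form of the header is handled.

```python
def retry_delay(status, headers, default=30):
    """Return how many seconds to wait before retrying, or None if the
    response is not a 503 and so is not asking us to come back later."""
    if status != 503:
        return None
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return int(value)
    except ValueError:
        # Retry-After may also be an HTTP date; this sketch does not parse
        # that form and falls back to the default wait.
        return default
```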

June 03, 2004

Right Services, Wrong Tools

An entry on the nature of appropriate tools in document and service oriented systems, the importance of understanding the nature of the IT market and the need for good engineering irrespective of architecture. Finally some speculation on where an SOA backlash might come from.

Wrong tools, wrong place.

And, unfortunately, the tendency for developers to immediately see in their minds "distributed objects with angle brackets" instead of "services", and lo and behold, Web services are destined to suck. Vendors, please: don't make it easy for developers to make the same mistakes. I know it makes for a great sales story on the show floor, but have some dignity and try to Do the Right Thing instead--make the fact that this is a service more explicit, don't let developers build distributed object systems (this time using WSDL instead of IDL) again. - Ted Neward

Ted's right insofar as the tools tend to work from the objects out and that's known to be a losing approach; you should work from the documents in (but I can't believe this is news at this stage). Another, bigger consideration is that the document oriented approach reduces the overall need for development tools. Potentially this afflicts anyone running with a "tools will save us" business model for distributed systems. Which is quite a few people.

Right tools, right place.

The upside is that we still need good tools, they're just not critical for development. Where Service Oriented (or Document Oriented or Webservices or REST style) systems do need better tools is post-deployment. There is an entire layer of application/document level monitoring and alerting needed to realize ROI on Services and make them viable. Think of it as SNMP or Syslog for Level 7 and above. This is what you see when you get past evangelism and early adoption and into the business of rolling out and running services. Since many people are still at the point of talking about and planning to do SOA we can expect this to take a while to sink in across the industry. In any case, saving 30c on the dollar during development but losing 10c on the dollar post-deployment because there is no visibility is something of a false economy.

Even so, this ignores the "rich" (complicated), but fundamentally important relationships that exist between tech analysts, tech press, vendors, system integrators and the enterprises buying the solutions. That whole ecosystem is strongly based on development tools provision (and let's face it, good demoware), irrespective of what the current needs of buyers are. I think you can look at this in two ways. First, as a huge, huge market opportunity to deliver winning solutions and products. Second, as making it inevitable that systems are built out based on whatever the tools at hand allow, even if that is a precisely backwards and actively harmful methodology.

SOA != Architecture + Tools

One nightmare scenario for SOA deployments is a complete rupture between architecture and engineering, or where engineering becomes "deprecated". Here we would see more and more consultants saying just what Ted is saying (good advice), but where the tools and the development practices are stuck in a non-complementary paradigm. Or worse, that SOA hype persuades enough people that the need for robust construction of software is somehow eliminated. Personally, while I'm seeing plenty of decent architectural thinking of late, the engineering thinking is practically non-existent, and this is reflected more or less in the use of inappropriate tools and methodologies.

After you ship, what then?

Aside from my concerns about an engineering void, if there is a backlash against SOA, it may come to pass when the cost of running, changing and managing these things is found to be exorbitant. And I mean a real backlash, where people kill projects and stop going to the market, not the current buzz of "'SOA' is so overused, let's, like, drop the 'A'". Never mind that even the bigger enterprise IT divisions are not necessarily equipped to be running globally visible utilities and services and that "maintenance" (as we like to call it) has always been the bulk of IT cost. Services, by being much more visible and relevant to the business (that's the point, remember?), will make such costs apparent. Your CEO could even be left thinking these things are more expensive than past approaches, where the cash bleed was somewhat less obvious.