" /> Bill de hÓra: January 2003 Archives


January 29, 2003

REST and session state

One issue with the REST hypertext model is its view on managing state. REST constrains application state to reside on the client. However, the real web works precisely backwards to this: all interesting state is kept on the server, as sessions. And when experienced enterprise practitioners like Martin Fowler say that state belongs on the server, it makes you wonder whether REST has anything to offer here. On the other hand, after you've built your tenth session-backed web site, you start to realize that managing state for users can get complicated and expensive, fast. However, Paul Prescod had this to say on the W3C TAG mailing list:

I think that in most cases there is virtue in making temporally extended sessions into URI-addressable, HTTP-retrievable resources. HTTP does not itself have a notion of temporally extended session, but neither does it have a notion of "map" or "auction" and yet it delivers representations of resources of those types. I don't dispute that HTTP has limitations. But I think that there is a lot of "shortcut thinking" when it comes to enumerating those limitations. "HTTP doesn't have X as a first-class concept therefore HTTP is not appropriate for X." That needs to be demonstrated, not asserted.

Exposing session state as resources. That seems like a start towards understanding how REST and the web might be squared on state management.
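To make that concrete, here's a rough sketch of what a session-as-resource might look like in servlet terms - the class, URI layout and state map are all mine, not anything Paul proposed. POST to /sessions mints a new session resource and hands back its URI in the Location header; GET on that URI returns a representation of the session's state (a doDelete to end it is left out):

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical sketch: sessions as URI-addressable resources.
// POST /sessions       -> creates a session resource, returns 201 plus a Location header
// GET  /sessions/{id}  -> returns a plain-text representation of that session's state
public class SessionResourceServlet extends HttpServlet {

    private final Map sessions = new HashMap(); // session id -> Map of state
    private int nextId = 0;

    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String id;
        synchronized (this) {
            id = Integer.toString(++nextId);
            sessions.put(id, new HashMap());
        }
        resp.setStatus(HttpServletResponse.SC_CREATED);
        resp.setHeader("Location", req.getRequestURL().append("/").append(id).toString());
    }

    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String id = req.getPathInfo() == null ? "" : req.getPathInfo().substring(1);
        Map state;
        synchronized (this) {
            state = (Map) sessions.get(id);
        }
        if (state == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setContentType("text/plain");
        for (Iterator i = state.entrySet().iterator(); i.hasNext();) {
            Map.Entry e = (Map.Entry) i.next();
            resp.getWriter().println(e.getKey() + ": " + e.getValue());
        }
    }
}

The point being that the session stops being a cookie-keyed blob hidden inside the server and becomes just another resource you can link to, cache, or hand to another client.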

January 28, 2003

Exceptions considered pointless

Three rules of thumb are forming in my mind around exceptions:


  • don't swallow exceptions (with two exceptions)
  • avoid using exceptions
  • avoid inventing exceptions

The first, many Java developers will understand. Rarely, you know an exception is not going to be thrown, but you have to catch it anyway; and sometimes you have to catch InterruptedException as part of normal program flow.
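For the record, the two exceptions to the first rule look something like this - the class and method names are mine:

import java.io.UnsupportedEncodingException;

public class SwallowExamples {

    // Exception 1: a checked exception you know cannot actually occur.
    // Every JVM is required to support UTF-8, so the catch block is effectively dead code.
    static byte[] utf8(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 not supported?"); // cannot happen
        }
    }

    // Exception 2: InterruptedException as normal program flow - a polling loop
    // that treats interruption as the signal to stop.
    static void pollUntilInterrupted(Runnable work) {
        while (true) {
            work.run();
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                return; // being interrupted is the expected way out of this loop
            }
        }
    }
}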

The last two are maybe controversial.

I've never found knowing the type of exception as useful as the actual execution trace or the messages held in the exception. Usually, the package or domain of the exception is far less interesting information than the nature of the failure. I don't care that it was a SAXException or a JMSException that got thrown, I care about the nature of the violation that caused the exception. What constraint or assumption was broken? Also, having written a good bit of multithreaded and distributed/server Java code in the past, slinging exceptions about tends to be less useful and less robust than designing with partial failure cases as being normal. Protocols, not exceptions, rule when it comes to bits on the wire.

And in code, most of the time, exceptions are pure line noise. It seems OO developers have real problems designing good exception hierarchies - I suspect this is because exceptions are really about execution flow and runtime behaviour, whereas OO is slightly more static, being about managing dependencies and code organization. You can't really slice up error handling into package structures and APIs the way you can Java objects, though heaven knows lots of people try to do just that (I suspect a useful side effect of the rise of AOP frameworks will be to eradicate many checked exceptions from Java).

For all the stated benefits of exceptions, I honestly think 90% of my coding could be done with less than a dozen types, most of which are in the JDK proper. Recently I've been finding Exception is perfectly adequate for a lot of code; in the past I'd be scratching my head wondering what type to throw. Having said that, I would prefer a totally different system of exceptions, oriented around forms of failure instead of packages and APIs, but there's zero chance of that happening at this stage in Java's development. Today most of my headaches are dealing with third-party exception models, which do a lot to wriggle their way into my code, but very little to tell me what actually happened.

A troll from the server side

TheServerSide.com Thread - Extreme Programming Is Evil

This is the kind of flamebait I expect from /. On the other hand, most of the responses are level-headed. TSS is usually better than this.

The hardest problem

Jon Udell: Ceci n'est pas une pipe

The packets blown on URI naming over the years, if concentrated, could comfortably take down half the root name servers. Interesting to see it spill into weblogs:

Do those URIs identify the map or the territory? We can never fully resolve such questions. Nor must we in order to build information systems that make sense to people.

Too true. And the thread is a mindbender, as Sean calls it. Windley has picked up on it too:

The very essence of Applied Mathematics, and by extension, Computer Science is the notion of representation, naming, and abstraction. Confusion in these issues shows up quite frequently when students are learning about naming in a programming language theory course.

But (there's always a but)... this isn't about making systems useful to people, not directly anyway. The whole reason this argument keeps rearing its ugly head on W3C lists is because people want machines to take on some of the inferential heavy lifting via the semantic web. And if you want the inferential machinery being postulated by semantic web technologies to spit out something other than garbage, a URI has to be pointing to only one thing. However, coordinating every web node to agree on that one thing seems like a losing approach; better to agree that there is some room for referential ambiguity and layer in the machinery to manage it. I'm on record for making the relationship between URIs and Resources many-to-many, as a matter of practicality (which kicked off yet another thread on xml-dev). Some really smart people don't agree with that position, by the way, but really the point of it, as I said, is a layered system.

[the second hardest problem]

January 27, 2003

review: Component Development for the Java Platform

Component Development for the Java Platform was something of a surprise. I came across it when I was looking at the author's Jawin Java/COM bridge. My first reaction was that it's just another Java book, and there are surprisingly few good Java books. But no. It's an excellent survey of lesser understood Java functionality, which you'd otherwise have to gather by spending half a day trawling the web for articles, the other half checking a flock of app servers out of anonymous CVS, and the rest of the week being an inveterate spec reader. Do all that anyway, but use this book as your guide.

There are chapters on reflection, JNI, dynamic proxies, serialization and program generators. The coverage of the classloader architecture is probably the best in print. Highlights were the detailing of the jar format, object serialization and particularly the rationale behind the design decisions for Jawin in the appendix.

If you want to know what's so cool and useful about dynamic proxies, or need to write a classloader or a plugin framework, this is a good bet for your next book. And I'll admit to liking this book especially because, after reading it, you'll start to appreciate that Java's static type system is something that can get in the way when you want to create managed components or just write Java software that's designed to stay up for weeks rather than hours.
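For anyone who hasn't played with them, this is roughly all there is to a dynamic proxy - a hypothetical tracing wrapper, not an example from the book:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Wraps any object behind its interfaces with a timing/tracing handler,
// no generated code required.
public class TracingProxy implements InvocationHandler {

    private final Object target;

    private TracingProxy(Object target) {
        this.target = target;
    }

    public static Object wrap(Object target) {
        return Proxy.newProxyInstance(
                target.getClass().getClassLoader(),
                target.getClass().getInterfaces(),
                new TracingProxy(target));
    }

    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.currentTimeMillis();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getTargetException(); // rethrow the real cause
        } finally {
            System.out.println(method.getName() + " took "
                    + (System.currentTimeMillis() - start) + "ms");
        }
    }

    public static void main(String[] args) {
        List list = (List) wrap(new ArrayList());
        list.add("hello");               // traced
        System.out.println(list.size()); // traced
    }
}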

Finally, if there's a second edition in the pipeline, I'd hope it would include a section on JMX, and maybe Isolates and JSR175 (both slated for JDK1.5).

Scott Sterling: CruiseControl, Maven, Centipede, AntHill

Java Testing, Tools, and Engineering

Scott goes through some options for build management. And still nothing out there quite fits the bill.

Fwiw I used to use .sh and .bat files to drive an Ant setup, including blowing out a project structure. That stopped when it became clear they didn't integrate as well across IDEs as straight Ant files - and when I caught myself using an uber Ant file to call the batch files to call the Ant files... well, it was time to back out and start over. At the moment I'm very close to having a single ant+properties file for all projects; getting that right is the main thing (not strictly true, it's a shell file with about 5 entity includes). When that's down pat, onto a platform independent scheduler... as an Ant task.

January 25, 2003

The utility theory of coupling

Loosely Coupled weblog - on-demand web services

Phil Wainewright:

Heaven forbid, after all, that anyone should be able to link to Google's, or Amazon's, or any other provider's URI in ways that the system's designers hadn't originally thought of. That might lead to horror of horrors unintended consequences.

Fortunately, an increasing number of people are beginning to see that there are potential advantages in promoting transparency in URIs, in part prompted by Jon's experiments with his LibraryLookup project, as he describes in his column. He notes, too, that there is a perfectly viable means of ensuring clients continue to be supported when URIs change: "An URL-rewriting engine could continue to support old-style links, but transform them to the new style."

Unintended consequences to one side, the thing is, scraping URLs is the antithesis of loose coupling. If that URL changes, your scraper is broken.

But maybe there's a principle at work here. The more useful something is, the more coupling it can withstand. Increased coupling is acceptable with increased utility.

It is time to bring down the walls that surround the citadel of software automation once and for all. Resistance is futile: the walls are coming down anyway. Technologists can either help dismantle them from within, or else helplessly watch as the rest of us tear them down from the outside. Which side are you on?

Utility.

Jon Udell: The name game

The name game

Jon Udell's reaction to rest-discuss' reaction to Jon Udell's Library Lookup.

To be honest, I've sometimes had trouble following the ongoing REST debate. But the words "value" and "utility" come through loud and clear. Of course, LibraryLookup is hardly the first demonstration of such value and utility. Consider, to take just one example, Erik Benson's All Consuming, an aggregation service that gathers Weblog postings about books. It depends in two ways on a URI pattern that includes an ISBN. When it scans Weblogs, it relies on that ISBN to identify entries that refer to books. Its own pages, in turn, use the ISBN to create referrals back to Amazon. Such linkage is, of course, the entire foundation of the Amazon Associates program.

January 24, 2003

Enterprise class

Sean McGrath, CTO, Propylon

(full disclosure: I work for Propylon, Sean is CTO)

A wee Propylon-specific entry this am. Propylon's Mission Control won the Professional Innovation award at the Digital Media Awards last night. Mission Control is a pervasive application server, built from the ground up on XML technologies - long before it was trendy to do so. It is also deeply, rather than superficially, based on semantic XML representations of data assets and on-the-fly XML transformations. It's gratifying to get the award, but I think I'm proudest of the fact that the technology has proven itself in mission critical, carrier grade deployments such as O2.

Most of this development happened before I joined Propylon, but considering that O2 needs seriously scalable and performant systems, I think the work that was done on O2 rubbishes any arguments about XML's unsuitability for high performance enterprise or federation class projects. And yes, we is kings ;-)

January 23, 2003

The second hardest problem

James Strachan's Radio Weblog

James and Cameron are posting very good stuff about cache management (lazyweb: this is well worth capturing and writing up for oreilly.com or developerworks). And it's excellent to see someone else watching SEDA. At InterX we built an event driven webserver/proxy in Java - damn hard to do, but it was fun, and a lot of what Miles Sabin (who architected the server and is from Brighton) learned was fed back into java.nio. So if you've got beef with nio, you can blame him, along with Doug Lea, Dan Kegel and Matt Welsh - but getting into a technical argument about Java with those guys would be... intimidating :-)

Now if we could just get the Servlet and EJB specs off threaded architectures... ;-)

Developer dependencies

Pushing the envelope: favourite things

It's interesting to find out what other folks are using to get the job done, and to write down what you're dependent on. Fwiw, the tools I touch the most at work, and that would affect my ability to do my job if they weren't available or as good (or, in some cases, as bad), are:


  • Ant
  • 'Perl style' regexes
  • WinCVS
  • Putty
  • VSS
  • Tomcat
  • JBoss
  • Cygwin
  • Python
  • JUnit
  • JPad Pro
  • IDEA
  • Emacs
  • TextPad
  • Mozilla
  • Trillian
  • WinZip
  • OpenOffice
  • SmartFTP

Oddly, I don't seem to be reliant on operating systems, at least no more than I am a particular PC maker; I'll take that as a blessing.

Things I'll be seeing more of this year:


  • PropelX Studio
  • Relax NG
  • JIRA
  • Jython
  • JDepend
  • Axis
  • C#
  • NUnit
  • MSXML
  • VB6
  • BizTalk 2002
  • Somebody's AOP system

And finally, things fading from memory:


  • make
  • gcc
  • Antlr
  • Outlook
  • IE
  • XMLSpy

Java generics perf test

Abort, Retry, Fail?: January 16, 2003 Archives

Good news. Diego Doval ran some numbers and reckons Java generics aren't resulting in a performance hit in the early access release (heck, I didn't even know such an EA was out, I need to pay more attention...)

January 22, 2003

New release of Trang

xml-dev - ANN: Trang (multi-format schema converter)

Trang can be downloaded from here:

James Clark:

I am happy to announce a new release of Trang, my multi-format schema converter. Trang is written in Java, and available under a BSD-style license. In this release, I have added an input module for DTDs based on my DTDinst program. This implies that Trang can now convert directly from DTDs to W3C XML Schema (XSD). This may make it of interest to people outside the RELAX NG community, which is why I am announcing it ...

January 15, 2003

java -cp classpath.xml ...

We're deploying a messaging app onto GNU/Linux at the moment. We had some problems, as you do, with declaring the classpath. So we were thinking: it would be just grand if you could declare the classpath passed to JDK tools in an XML config file. You could point the JDK tools to the file instead of this constant messing about with delimited path lists in batch files. Things might even become manageable.

java -cp /usr/java/myapp/classpath.xml ...
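Nothing like this exists in the JDK, but a small launcher could fake it: read a classpath.xml, flatten it to a -cp string and exec the real JVM. A sketch, with an invented <classpath><entry path="..."/> file format:

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical launcher: reads an XML file listing classpath entries and starts
// the real JVM with the equivalent -cp string. The <classpath><entry path="..."/>
// format is invented for this sketch; output forwarding is left out.
public class XmlClasspathLauncher {

    public static void main(String[] args) throws Exception {
        String xmlFile = args[0];    // e.g. /usr/java/myapp/classpath.xml
        String mainClass = args[1];  // the class to run

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File(xmlFile));
        NodeList entries = doc.getElementsByTagName("entry");

        StringBuffer cp = new StringBuffer();
        for (int i = 0; i < entries.getLength(); i++) {
            if (i > 0) cp.append(File.pathSeparator);
            cp.append(((Element) entries.item(i)).getAttribute("path"));
        }

        // Hand off to the real JVM with the assembled classpath.
        Process p = Runtime.getRuntime().exec(
                new String[] { "java", "-cp", cp.toString(), mainClass });
        p.waitFor();
    }
}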

January 11, 2003

Apologies to Darren Hobbs

Pushing the envelope

F2k. Sorry 'bout that.

WS-Reliability goose chase

Fujitsu's activities > WS-Reliability - FUJITSU XML

This took a while to find, like most of the WS-* flock.

January 10, 2003

A Conversation with Adam Bosworth

article detail

Best bit:

WLDJ: You also mentioned the importance of quality of service.
AB: We believe the best way to get quality of service is to support a model with queues. You can write intelligent tools to take things out of queues and process them in the order you need. If you simply spin up threads on machines every time a request comes in, you can't scale beyond the fixed limit of the hardware. A challenge for people on the Web has been delivering a better quality of service to a high-profile customer than a low-profile customer. They're both coming in over the same Web gateway. At the end of the day, the Web service gateways are synchronous and spin up threads immediately in computers because they're synchronized. When you have a queue in the model, you can deliver very good quality-of-service characteristics. You can see how long something is sitting there. It's harder to see if it's spun up as a thread.

Worst bit:

WLDJ: Do you foresee the facility, like in C#, to store attributes as one of those areas?
AB: We've already done that in WebLogic Workshop. We use annotations and comments. JSR-175 has been submitted just for adding annotated metadata to Java. In addition, we submitted JSR-181, which is the JWS JRS itself and formalizes it for this particular case (Web services). The only difference is that in .NET it was done intrusively into the language. We do it more quietly, in comments. Endemically, in WebLogic Workshop we use metadata all over, in just that way, as attributes.

January 09, 2003

Dilbert does XP

The Extremo

Hoping this becomes a run...

Jorgen Thelin: scale this

Architectural Cross-Pollination

Jorgen Thelin on the UK national lottery network:

This was the world's largest networked lottery system when it was launched in November-1994 with 10,000 terminals. It now has about 25,000 terminals, and always has some extremely high system availability and peak volume processing requirements. Just before the ticket sales cutoff point at 19:30 on Saturdays and Wednesdays, you can pretty much guarentee that the network access will achieve close to 100% concurrency, especially for Roll-over or other special draws. Contrast this with other big networked systems of this size where on a bad day you may get up to 5% concurrency. Also, there are very severe reputational and good-will risks, not to mention customer anger, if the system is ever down - and particularly if it crashes just before the ticket sale cut-off time. To my knowledge, this has only ever happened once or maybe twice in the last 8 years.

Wow. I can't begin to imagine what this kind of capacity would cost to roll out using today's web technologies (most people might be surprised at how few concurrent requests it takes to slashdot a site). I know there are huge improvements to be had in web capacity using event based rather than thread per request architectures - but maybe I've been thinking too low on what a performant system can be.

January 08, 2003

Ted Neward: AOP != interceptors

Setting the Story Straight: AOP != Interception

Good piece. This one will run and run.

And not a drop to drink

:::::___coin-operated.com___:::::

H20/IP: Using water as an organic network between two computers.

On the other hand, I have always held great hopes for using pigeons to implement TCP.

Sing Li's third JMX article

Manageability

Sing Li has written Part 3 of his JMX series "From Black Boxes to Enterprises". Despite numerous books about JMX, it's the first time I've seen anyone write about actual integration with an existing network management system (NMS). In fact, the article explores OpenNMS, an open source, Java-based network management system. Quite good reading, just like the previous articles in this series.

January 06, 2003

Joshua Allen on the semantic web

Better Living Through Software

Joshua Allen has consistently had interesting thoughts on the semantic web. Here he's redefining (slightly) the semantic web in terms of human voices rather than truthful assertions. I guess human voice means our opinions. That's a useful reduction, and should help make RDF scale in the face of mutually inconsistent claims.

He also talks about the indelibility of information. This ignores an interesting feature of storing data digitally on the web. Web data is not indelible - in fact it's easily changed in retrospect. Consider the Church of Scientology forcing the removal of information from the Internet Archive. I suspect this is a dangerous characteristic, just as Orwell said.

Now, time for some nit-picking:

RDF is often a whipping-boy, but a red-herring in this discussion. To know why, you need to understand that RDF is simply a syntax for exchanging knowledge representations, and not even a particularly ambitious or cutting-edge syntax.

I think I know what Joshua means here, but RDF is not a syntax. You need to give it a syntax, like N-Triples or RDF/XML, before what Joshua says makes sense. True, RDF is not especially novel or expressive in terms of other knowledge representation languages, but it has a strong semantic basis that syntactically driven technologies, like XML, do not.

From a minor infraction to more serious ones:

In fact, Dare Obasanjo has remarked to me long ago that XML is not really different from Lisp's s-expressions -- a point elaborated in the paper by Jerome Simeon -- so in a sense, Mark Pilgrim and the XHTML advocates are lobbying to have people write their web pages in Lisp instead of HTML.

Mark Pilgrim is doing nothing of the sort; to say so is to confuse things on a number of levels. I haven't read Jerome's paper (yet), but beyond a passing similarity, XML is nothing like Lisp. XML is pure syntax; Lisp has semantics due to its status as a programming language. The evaluation rules set out for Lisp are not those set out for RDF, though you could certainly use Lisp to encode those rules in a reasoner. You can map back and forth between XML and S-expression syntax, but you can map back and forth between XML and CSV files as well. In fact CSV files have much more in common with XML than Lisp has (or any programming language). Mapping XML to S-expression syntax is not the same as mapping XML to Lisp - throwing RDF into the pot only muddies things further.

Later: I followed through on Jerome's paper. It seems it's about adding a semantics to XML Schema and XQuery. So I don't need to read it in full - XML Schema and XQuery are no more XML than RDF is.

# "RDF is too complicated" - This also is a very potent argument. The primary serialization for RDF is XML. This really starts to hurt your brain when you realize that RDF and XML are almost the same thing. Too much meta and your mind can't bootstrap. And the two main non-XML serializations that exist are named "N3" and "N-Triples", but bear no resemblance to one another -- a prank that lends credence to the allegations of gratuitous complexity.

No. Why? Because RDF has semantics. Those semantics are the same, regardless of the way you decide to inscribe an RDF graph. XML has no semantics; it is, at the very most, invariants imposed on a sequence of Unicode characters. RDF's relationship to XML is so fleeting as to be hardly worth mentioning, except for the fact that the two are continually conflated when they should be kept very distinct. Granted, however, that arguments for RDF's complexity are often spurious.

And by the way, N3 is not RDF. This seems to be such a widespread misunderstanding I've marked it in red so you won't miss it. Allow me to quickly elaborate. N3 is much more expressive than RDF, although it lacks RDF's semantic precision. You can say more interesting things in N3 than in RDF. But to understand what some N3 means, you need to look at the source code to cwm, Tim Berners-Lee's N3 inference engine. To understand what some RDF means you need to read the RDF Model Theory. N3 and RDF happen to look quite alike, but N3 statements are much more like a regular programming language's.

Furthermore, the existence of multiple serializations leads people to the understandable misconception that RDF is not simply a syntax for exchanging knowledge representations.

Proof by repeated assertion is no proof at all.

I'm being somewhat pedantic (but just try running Joshua's reasoning on the rdf-logic list). I'm doing this because you can't make good sense of RDF and the semantic web without some precision of thought, however irritating and stuffy that may be. The RDF community spent years in a state of confusion, talking past each other, because of fundamental errors in understanding such as not distinguishing between syntax and semantics, or use and mention. If we don't take the effort to be clear about RDF, there's little chance the machinery and information we build on it will. It's precisely because computers are so brittle to nuance and context that we need to be precise when it comes to talking about a language for interchanging knowledge between machines.

People coming to RDF, especially from XML, may well come away with the wrong ideas altogether about what RDF is; that will doom them to a frustrating experience with the technology until they take the time to pick things apart.

On the Pepys

Pepys

In the morning before I went forth old East brought me a dozen of bottles of sack, and I gave him a shilling for his pains.

Sigh, those were the days. A case of sack at Tesco is about 25, not including anything for their troubles.

OWL classloading: a language smell?

Semantic Web Research Group

More Java workarounds. This is a great example of using OWL to get something useful done - runtime plugins (bonus points for not being applied AI). It's an even better example of why server-side development today strongly favours languages that have reflection as a first-class feature and not an API - I'm pretty sure a reflection ontology isn't necessary in Python, Javascript or S#.
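By way of contrast, the Java version of a runtime plugin usually boils down to a couple of reflective API calls - the Transport interface and properties file here are made up for the sketch:

import java.io.InputStream;
import java.util.Properties;

// Reflective plugin loading, Java style. The Transport interface and the
// transports.properties file are inventions for this sketch.
interface Transport {
    void send(byte[] message) throws Exception;
}

class TransportLoader {

    static Transport load(String className) throws Exception {
        Class c = Class.forName(className); // locate the implementation at runtime
        return (Transport) c.newInstance(); // reflective instantiation via the API
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        InputStream in = TransportLoader.class.getResourceAsStream("/transports.properties");
        props.load(in); // e.g. transport.http=com.example.HttpTransport
        Transport t = load(props.getProperty("transport.http"));
        t.send("hello".getBytes("UTF-8"));
    }
}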

January 05, 2003

Pushing tin: the slashdot effect, web services and web infrastructure.

[ t e c h n o \ c u l t u r e ]

Karlin Lillington links to a Kuro5hin article on how linking can get a site slashdotted:


The ethics of linkage. "If you read "meta" sites like Slashdot, Kuro5hin, Fark, Met4filter (natch), and Memepool you've probably encountered links to stories that you can't reach -- namely because the act of linking to a server not prepared for massive traffic has brought down the server, or worse, put the hapless soul over their bandwidth cap denying any use to anyone for the rest of the month or day or whatever time period the ISP or hosting provider uses to allocate bandwidth."

Many will know this as the 'Slashdot effect': a thundering herd of readers follows a link on slashdot.org to another site, the site's server melts down and the majority can't actually hit the site. The article suggests the answer is to think before you link. A thoughtful idea, but not really workable.

Interesting that Alan Mather blogged recently on the UK Environment Agency's website going down under load, and what we (particularly e-Government, since that's Alan's field of expertise) can learn from it:

The heavy rain over the last few days has meant that the Environment Agency's website that gives details on which areas are likely to be flooded has been overwhelmed with demand and is presently down.

Alan offers three options, none of which are explicitly ethical:

1. Robust Design

If you know this kind of thing is going to happen, you design your site to take that into account. [...]

2. Centralisation

If the economics at a local level or departmental level don't justify the kind of spend on resilience that's required, then you move the content and the applications somewhere that does.

3. Syndication

The science of syndication is not well understood for things like this, but it's certainly feasible that the main pieces of content could be offered up to a variety of major sites so that no single site is hit heavily.

These are good suggestions, but they remain problematic in that they treat symptoms rather than causes. And syndication is better understood than Alan suggests (we'll talk about content delivery networks below), but it is not yet part of the web and internet protocols.

Ironically, most organizations, as likely as not, have more than enough computing capacity to cater for events like a website being slashdotted. It just happens that the capacity is not in the right place at the right time. Limited bandwidth, by the way, is not a major concern. Bandwidth economics are interesting in their own right, but a lack of bandwidth is not the issue when it comes to sites falling over (albeit bandwidth is not as well distributed as we might like). Instead, the problem here is very much one of deployed computing infrastructure, not necessarily best solved by linking ethics or buying more kit.

The architects estimate (guess) the maximum traffic a website can expect and buy for that case, or as much as can be afforded. You don't dare buy for the average or median cases. Worst of all, the common case for the majority of sites is typically a trickle of hits that could be handled by a six year old computer pulled out of a skip. And there's a reasonable chance that when you do get heavily hit, your maximum estimates will be too low, perhaps by an order of magnitude. Good news, if your business is pushing tin. The end result is that organizations and individuals are paying too much for running applications on the web, organizations for server infrastructure and development costs, individuals for bandwidth.

The rest of this piece looks at two of the usual suspects for the Slashdot effect, and one that will come to town, soon enough. The point is that Slashdot itself is not a suspect; at worst it's a messenger.

The protocol

HTTP cognoscenti are quick to point out that HTTP 1.1 has a lot to say about caching web resources and that the web has scaled fantastically well. Both are true, but only on a macro level, and even that requires a specific interpretation of 'scalability', closer to 'ubiquity' than any ability to ramp up efficiently against demand. HTTP was simply not designed to distribute load at the speed a site can get stampeded today.

The bitter truth is this: at the micro level, any individual site is punished directly in proportion to the perceived value of its content. The web does not offer an economy of scale, quite the opposite. Being popular is expensive; being very popular may prove to be a website's undoing. The deployed HTTP infrastructure is clearly not able to deal with the Slashdot effect, where the burden of cost is levied on the supplier of content. Given that the web is meant to be a democratic, enabling medium, there's surely an impedance mismatch here.

One of the reasons the web scales at all as it does is not due to the implicit nature of the internet but due to an underground sea of machinery known as Content Delivery Networks. CDNs live in a twilight zone between the TCP/IP transport layer and the HTTP application layer, caching and moving static content around the web nearer to where the demand is. You have to pay to place content on these networks; they're not part of the web as designed. Efforts are under way to standardize CDN protocols, but ultimately this is renting tin rather than buying it and may not be the best long term approach.

The servers

Claims of the web's scalability or facilities for caching in the HTTP protocol are irrelevant when your servers melt down. Beyond protocol design, the immediate technical problems with sites falling over are to do with how webservers are designed. We're only beginning to properly understand the characteristics of web topology and the nature of web traffic. Most web server software was designed and deployed before this understanding - the principles often go back to operating systems research twenty years old.

The vast majority of deployed web servers are built to use what is known as the 'thread per request' model. In essence each request is given a slice of computing resources and will time-share with others for the CPU (this model is also the basis for CORBA and J2EE server architectures, which may help explain why it can be so expensive to make them highly performant). The single most interesting characteristic of the model is that the computing resources required are directly proportional to the number of incoming requests. When enough requests come in, the server must either generate new threads, or quickly turn around in-use threads for new requests. This model made sense once upon a time for time-shared operating systems and mainframes, but much less so now for the web.
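In code the model is instantly recognizable; a stripped-down sketch, one connection, one thread (the 'personal chef' of the analogy below):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Thread-per-request in miniature: every accepted connection gets its own thread.
public class ThreadPerRequestServer {

    public static void main(String[] args) throws IOException {
        ServerSocket server = new ServerSocket(8080);
        while (true) {
            final Socket client = server.accept();  // one connection...
            new Thread(new Runnable() {             // ...one thread
                public void run() {
                    try {
                        handle(client);
                    } catch (IOException e) {
                        // connection died; nothing useful to do
                    }
                }
            }).start();
        }
    }

    static void handle(Socket client) throws IOException {
        try {
            client.getOutputStream().write(
                "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok".getBytes("US-ASCII"));
        } finally {
            client.close();
        }
    }
}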

To make an analogy: imagine a restaurant gets an excellent review in the paper. Everyone wants to go there and eat. The thing about this restaurant is that here, each diner gets a personal chef. The chef takes the order, brings the drinks, cooks the food, serves it, runs the bill, and washes up afterwards. With enough chefs on the go, they'll start banging into each other on the floor, slowing each other down, spilling drinks, arguing in the kitchen over who gets what pot, fighting over the next clean plate, not putting out the garbage as it piles up because no-one has the time. When enough diners enter the restaurant, it will either keep taking new diners until the chefs logjam each other and service grinds to a halt, or it will close its doors to new customers until someone pays the check and leaves. When one or the other happens, the restaurant has been slashdotted. Everybody who gets service gets poor service, some people leave without telling their chef, whose time has then been entirely wasted making an uneaten meal, most people don't get in, and the restaurant's reputation is in tatters. Sending people to another eatery helps, and represents a clustering of servers to balance the load. But if 95% of the time you only have a half-full restaurant, you have to wonder whether it makes sense to pay to have two or more restaurants around, and so many chefs, just because every now and then it gets seriously busy.

There is another, more complicated, but vastly more scalable approach to server architecture for processing web requests. The model is called 'event driven' and is most commonly seen today in desktop GUIs, but it is growing in popularity as a way to build web servers and, I hope, in the future, application servers.

It works much the way a real restaurant works: not by assigning a chef to each person eating, but by breaking the job of serving across a group of specialists who each work on one part of the meal. Each specialist has an in tray and an out tray of things to do. If the head chef gets too busy, the not so busy sous chef can pitch in for a while. The end result is a better quality of service for the happy eaters and an economical basis for running a restaurant.

In other words, the event model works the way we design restaurants, factories, shipping ports, or almost any real world production system where resources need to be used efficiently. What makes it scalable is that the work is done by specialists who a) don't get in each other's way, and b) are optimized to do a particular job - you can't logjam the system early the way you can with thread per request architectures. If you're concerned about avoiding the Slashdot effect in an economic way, the first thing to do before you run out and buy that clustering solution is consider whether your web server is up to the task. And if you can't, at least consider improving your existing servers' policies.
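For contrast with the thread-per-request sketch above, here's the event-driven shape of the same toy server using java.nio - one thread, one Selector, readiness events driving the work. It's a bare sketch, nowhere near a production server (no partial-write handling, no timeouts, no staging queues):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class EventDrivenServer {

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);
        server.socket().bind(new InetSocketAddress(8080));
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer response = ByteBuffer.wrap(
                "HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok".getBytes("US-ASCII"));

        while (true) {
            selector.select();  // block until something, anything, is ready
            Iterator keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = (SelectionKey) keys.next();
                keys.remove();
                if (key.isAcceptable()) {   // new connection: register it for reads
                    SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                    if (client != null) {
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    }
                } else if (key.isReadable()) {  // request arrived: send a canned reply and close
                    SocketChannel client = (SocketChannel) key.channel();
                    client.read(ByteBuffer.allocate(1024)); // drain (and ignore) the request
                    client.write(response.duplicate());
                    client.close();
                }
            }
        }
    }
}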

The webservices

Web services as currently designed will make the Slashdot effect worse for two reasons.

First is that the speed at which links are followed will increase. Today sites go down at the rate of people's ability to click on a link. That's quite a low rate, compared to the speed a computer can click links. Machine to machine web services will greatly up the overall clickthrough rate. We've already seen this happen; Google, among others in the past, has had to tune its spiders to prevent them swarming on a site, even taking it down. The spiders turned to locusts. Another case in point is the periodic harvesting of RSS feeds. As we continue to automate the web we can expect to see an explosion of web traffic, orders of magnitude greater than today's.

The second is that the basis by which HTTP can facilitate caching is violated by webservices, particularly the RPC variety. Web caching depends architecturally on intermediaries (or proxies) understanding what they can and cannot cache from entities they know nothing about. In HTTP this is possible since it has a handful of standard request methods whose meaning and implications with respect to caching are quite clear, standard header metadata for those methods, and standard responses to requests whose meaning is also clear to any intermediary that is coded to HTTP. In other words, while it is not fully adequate, HTTP is designed with caching in mind and documents can be cached. Web services, particularly SOAP messages, have no such facility. Method names are arbitrary, as are their responses; there's no basis by which an intermediary could begin to cache web service requests and responses from arbitrary sources unless the webservice methods are mapped directly to HTTP. Not only this, since web services usually tunnel through HTTP they'll affect the overall quality of service on the web if they become a significant fraction of web traffic. For some this will be a disaster, for others it will be a lucrative opportunity for pushing tin. Either way it represents something I've mentioned before; you have to think differently about programming on the web scale, it's not just an extension of middleware. Arbitrary names with arbitrary semantics make sense on the LAN, not on the web. Designing web services under the same principles as a J2EE middleware solution is just asking for performance and availability trouble.
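To see what gets thrown away, compare what any client or intermediary can do with a plain HTTP GET. A conditional GET, sketched with an invented URL:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// A conditional GET: the client (or any cache in the middle) can revalidate a
// resource cheaply, because GET is known to be safe and cacheable.
// The URL is made up for the example.
public class ConditionalGet {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://example.org/reports/2003/01.xml");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Ask: has this changed since the copy I already hold?
        conn.setRequestProperty("If-Modified-Since", "Sat, 04 Jan 2003 00:00:00 GMT");
        conn.connect();
        if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            System.out.println("304: the cached copy is still good, nothing transferred");
        } else {
            System.out.println("200: fetch the new representation; Cache-Control says: "
                    + conn.getHeaderField("Cache-Control"));
        }
        conn.disconnect();
    }
}

A SOAP endpoint gives an intermediary none of those hooks; all it sees is a POST with an arbitrary method name buried in the body.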

Technically this state of affairs might not look much different to denial of service (DOS) attacks today, except that it will be the order of things, rather than the exception. DOS is one class of attack that is of great concern to security analysts; it's extremely hard to prevent and not hugely difficult to mount - the best known preventative measure is to fail fast: let servers fall over until the attack has subsided. In any case, that's what most of them do.

Rogue plane over Frankfurt

There's an armed man in a small plane circling over Frankfurt at the moment. He's threatening to fly into the European Central Bank tower.

5 minutes later: it's just landed.

90 mins later: feeds:
CNN
Google news clump

January 04, 2003

Tom Klaasen: false dilemma

Thought & feeling

Software development is all about programming. I don't agree on this. I basically suck at programming (well, compared to some real programmers that I know at least), but I'm great at analysis and design. (Remember my saying: in a weblog there is no place for modesty ;) ) It comforts me to think that A&D are at least as important as the actual coding. I feel a long blog coming forth from this in a not too distant future (alas, not much time now)

Coding is important. Design is important. They're so important, we should be doing them all the time. And that's very much what programming is, designing while coding. It's a mistake to separate the two.

Two quotes from Roy Fielding on SOAP

I'm blogging these quotes because I'm convinced a lot of people working in the Java, .NET and enterprise computing worlds think REST is some kind of wacko nonsense spouted by standards wonks who don't build real systems, and thus can safely be ignored. This attitude can be neatly summed up: 'where's the REST toolkit?'. If you've ever had that thought, please don't hit the back button yet. Roy Fielding, as many of you will know, was chairman of the ASF and coined the term REST.

Correction: Sam Ruby pointed out that Greg Stein is the current ASF chairman, not Roy Fielding. Thanks, my bad. Sam took a lot of flak last year from REST advocates, stuck with it, and built a lot of bridges as a result; he's a key guy working on making sure people can have their REST and eat SOAP too.

I'll be posting more snippets on REST and webservices in the future as I come across them. Enjoy.

www-tag@w3.org from April 2002: Re: FW: draft findings on Unsafe Methods (whenToUseGet-7)

Anne Thomas Manes:
"Please review the charter of the XML Protocol working group. It explicitly says that the XML Protocol should be based on SOAP 1.1 This isn't a request to rubber stamp SOAP 1.1. We want and need to improve SOAP, but we don't want to make a complete change to its architecture. I'm also pointing out some basic realities. W3C is, at heart, an academic organization. And its perfectly reasonable for W3C to pursue its academic goals (REST and the Semantic Web). But if W3C wants to play a major role in business systems, and if W3C wants to continue receiving funding from the big software vendors, then the W3C TAG must be willing to accomodate the requirements of big business. If the REST faction continues to try to undermine the existing Web services architecture, it will alienate big business."

Hmmm, well, speaking as an academic and an open source developer AND a commercial software developer, I can say with authority that the W3C was created by big businesses specifically to prevent their own marketing departments from destroying the value inherent in the Web through their own, and their competitors', short-sighted, quarterly-revenue-driven pursuit of profits. It was not created by academics. Open source developers actively opposed the creation of a pay-to-play consortium. The only reason it is at MIT is because that's what was needed to attract the people with a clue to an underpaid job.

If we are to remain silent on this issue, then the W3C should not exist.

The Web creates more business value, every day, than has been generated by every single example of an RPC-like interface in the entire history of computers. Businesses would have to be collectively insane to place that architecture at risk just because a group of software marketing giants claims that it is the next big wave. They want people who are experts on the Web architecture to stand up and defend it when needed.

The only reason SOAP remains in the W3C for standardization is because all of the other forums either rejected the concept out of hand or refused to rubber-stamp a poor implementation of a bad idea. If this thing is going to be called Web Services, then I insist that it actually have something to do with the Web. If not, I'd rather have the WS-I group responsible for abusing the marketplace with yet another CORBA/DCOM than have the W3C waste its effort pandering to the whims of marketing consultants. I am not here to accommodate the requirements of mass hysteria.


www-tag@w3.org from April 2002: Re: draft findings on Unsafe Methods (whenToUseGet-7)

The problem with SOAP is that it tries to escape from the Web interface. It deliberately attempts to suck, mostly because it is deliberately trying to supplant CGI-like applications rather than Web-like applications. It is simply a waste of time for folks to say that "HTTP allows this because I've seen it used by this common CGI script." If we thought that sucky CGI scripts were the basis for good Web architectures, then we wouldn't have needed a Gateway Interface to implement them.

In order for SOAP-ng to succeed as a Web protocol, it needs to start behaving like it is part of the Web. That means, among other things, that it should stop trying to encapsulate all sorts of actions under an object-specific interface. It needs to limit its object-specific behavior to those situations in which object-specific behavior is actually desirable. If it does not do so, then it is not using URI as the basis for resource identification, and therefore it is no more part of the Web than SMTP.

If you're coming from a CORBA/J2EE/DCOM background to webservices you really need to absorb the lessons learned by the people who built out the web infrastructure over the last decade. If what you build isn't going to be an expensive failure and your credibility is not to go down the toilet then you need to appreciate that going from distributed object computing to webservices is as big a shift as going from desktop computing to distributed object computing. There's more to successful webservices than trying to run object method calls over HTTP.

In terms of developer mindshare and understanding, REST today is roughly where extreme programming and agile methods were three years ago. It has other things in common with agile software; its core principles have been around for a long time, are simple, are mature, are well understood and implemented (but by not enough people), and tend to get ignored in favour of stuff that doesn't work so well.

Hans Reiser: The Naming System Venture

Name Spaces As Tools for Integrating the Operating System Rather Than As Ends in Themselves

Great paper. Someone should invite Reiser onto the XPath 2.0 working group.

Search and replace marketing

More News


From T.D. Wilson: The nonsense of 'knowledge management'.

'Search and replace marketing'

The review of journal papers, the review of consultancy Web sites and those of the business schools, suggest that, in many cases, 'knowledge management' is being used simply as a synonym for 'information management'. This has been referred to by David Weinberger, citing Adina Levin as the originator, as 'search and replace marketing' in reporting the KM Summit of 1998: 'Andy Moore, editor of KM World and the event's genial host, asked the group how you reply to a customer who says, "Isn't this just search-and-replace marketing?" That is, do you become a KM vendor simply by taking your old marketing literature and doing a search and replace, changing, say, "information retrieval" into "KM"? The question rattled the group. Answers sputtered forth. This was obviously a sore subject. It seems to me that there are three possible answers to the question "Is this search-and-replace marketing?" given that this question expresses customer pain and suspicion:

1. No, we've added important new features designed to help you with your KM chores.

2. Sort of. We have the same features as always but have discovered new applications for them.

3. Yes, you pathetic loser.

The first two answers are perfectly acceptable. The third is perhaps a tad too honest to make it in this imperfect world, although undoubtedly there is some "kewl" company that is contemplating using this as the center of its advertising campaign. ("Companies will love that we're being so upfront with them, man.")' (Weinberger, 1998)

I haven't heard this term before. But it's perfect for the software industry.

Refactoring databases

VERSIONARY::BLOG: interesting read on database and development

Scott Ambler has been working out database refactorings for a while now. Start with the essay database refactorings and then the catalog of refactorings for more useful information on the actual mechanics of refactoring databases.

Publius Syrus

It is a bad plan that admits of no modifications. - Publius Syrus (ca. 42 BCE)

Shark jumping Google

Better Living Through Software

And assuming that Sergey was not leading Dave on at the conference last week, they are gung-ho about allowing people to update metadata directly into Google. Am I the only person who is grasping the full potential of this?

No.

If you publish to Google's cloud, you get automatic indexing, metadata like who is linking to you, and more. And Google can add little semantic web-like features such as webquotes every few months to keep you hooked. Then, the advantages of a central index really kick in when metadata starts to explode. Obviously Google isn't pushing the "we made a better Internet" angle yet, but they could -- and the fact that they are so carefully surrounding key strategic bits of territory is not a coincidence. I think AOL and MSFT both blew it already, and the Google guys are not as "aww, shucks, we just like to write web crawler software" as they talk. Game over; the tired old Internet can't compete.

I talk to colleagues at work about search engines on and off, especially with Sean and Conor. And onlist with Paul I said Google is now treated as web infrastructure. Paul reckoned you can't compare Google to something like DNS; maybe so, since most of us don't think much about DNS. But it seems that greatly underestimates the social importance of search engines.

What perhaps bothers me is the casual attitude we have to Google - this is a private company that has, effectively, no competition (the one other company that comes to mind that gets such an easy ride these days is IntelliJ, but at least it has Eclipse snapping at its heels). We see rants incessantly about Microsoft, yet Google has web search by the short and curlies and somehow we're all comfortable with that. But Google is just another company, and lest we forget, one that is not afraid to bare its teeth (the publication of this email is the point when Google jumped the shark for me). Google has also shown it will game its rankings under external pressure; what would it do in time under internal pressure? I say all this while being continually awestruck at their ability to innovate without serious competition, and a card-carrying believer that their statistical approach to managing information on the web will continue to wipe the floor with ones driven by logic and knowledge representation, such as RDF and OWL. Google are a class act.

At my previous job, with the since defunct InterX, I banged on about the collective insanity that are centralized search engines, even coming up with a model for distributing search index feeds. I was told, more than once, that no-one in their right mind would take on the search engines and the CDN providers, even though we were sitting on code to make it so. And after what the late Gene Kan was doing with InfraSearch became common knowledge, it seemed like it was game over for centralized search.

It hasn't worked out that way - and whether that's a lesson in believing too little in one's convictions or too much in one's ability to see around corners, I'm not sure. Yet at the end of the day, having the web downloaded into a database for indexing and querying is such an insane state of affairs, it's hard to comprehend. The very fact that search engines continue to exist at all in their present form is a failure of imagination. There's so much more work to be done in web search. We already know from RSS and blogland that we can distribute content in a decentralized fashion - the question is, will we distribute content indices?

One technology that might evolve into a distributed search mechanism is trackback; the sooner more people in blogland start using it and experimenting with it, especially with referer ranking schemes, the better.

Here is something to think about: if you could "push" your web pages to Google to be indexed, and Google already caches those pages for access, why would you even have a web server?

So we get to what Josh's blog leaves out. Not that we push pages into Google, but that we break Google up and scatter its bots across the web for individuals to use. We then upload the locally generated indices or start moving them around the network. The key point is not pushing content but decentralizing the building of indices. The uncomfortable dependency we have on a few key engines as information hubs and brokers will inevitably become obvious, and developers will move to balance the power of search engines with open code. When indices built from content models, rather than the presentation scrapes the search engines perform today, are sent around the web the way we send RSS content around today, that will be an evolutionary step forward for the web.

Finally, having a conspiracy theory coming from Joshua, an MS employee, is somewhat... weird, given MS's behaviour in the past. Though I happen to agree with Joshua this time; Google is one to keep an eye on. Buy shares when you can.

January 03, 2003

Irish WISP

A flyer came in the mail today. 512kb/s broadband access over WiFi for 49 euros a month; that's roughly half the price of ADSL here. Some group called Irish WISP; I'll hit the website later. In the meantime any Irish bloggers out there know anything about them?

Tim Ewald on XML pipelining in .NET

Pipeline XML processing in .NET. with the WSE

Interesting article by Tim Ewald. It starts out describing a filter model for munging SOAP envelopes. Filters accept envelopes as their argument:

SoapEnvelope env = new SoapEnvelope();
XmlElement body = env.CreateBody();
env.Envelope.AppendChild(body);
TimestampOutputFilter tsOutput = new TimestampOutputFilter();
tsOutput.ProcessMessage(env);

If you're in Java, you'll see it works a bit like Servlet filters except that what's being passed around are SOAP Envelope Infosets (I'll get back to this point).

Web Services Enhancements (WSE) ship with 10 filters. You can compose these filters, and your own, into ordered collections, called Pipes:


SoapInputFilterCollection inputFilters =
new SoapInputFilterCollection();
SoapOutputFilterCollection outputFilters =
new SoapOutputFilterCollection();
outputFilters.Add(new TraceOutputFilter());
outputFilters.Add(new TimestampOutputFilter());
SoapEnvelope env = new SoapEnvelope();
XmlElement body = env.CreateBody();
env.Envelope.AppendChild(body);
Pipeline pipe = new Pipeline(inputFilters, outputFilters);
pipe.ProcessOutputMessage(env);

WSE has the right approach, but could do with some refinements (again, for those in Java, Axis is also architected on a pipelined model). One criticism of WSE is that it passes around an Infoset instead of the raw XML (strings or streams), but this is consistent with the MS philosophy for XML processing - don't process it, process the bound data. Technically there is one difference from the classic pipes and filters approach that results from this - classically, the things being piped, in this case SOAP envelopes, are passed through the components of a pipe. With WSE looking like it's passing around envelope references, it's more like the pipeline is moving over the envelope. Two things appear to be missing in WSE for high throughput pipelining. First is queuing of documents between filters - if you're not careful, one overloaded pipe (for example one that's reading or writing a disk resource or into a DB) might propagate a blockage upstream, possibly as far back as causing the webserver to stop accepting requests. The second comes from my experience building uber-scalable filtering intermediaries for HTTP and XML messaging. At some point, a developer will want to be able to drop down to the document, character or even byte/stream level to get things done. As long as you're dealing only with Infosets/DOM that facility is locked out of the API - and herein is the downside of making developers access XML through object models.
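To make the queuing point concrete, here's a rough Java sketch of filters decoupled by bounded queues, so a slow filter backs up locally instead of stalling whatever is accepting requests. The Filter interface and the queue plumbing are mine - nothing to do with WSE's or Axis's actual APIs:

import java.util.LinkedList;

// Sketch of a filter pipeline with a bounded queue between stages, so a slow
// filter causes back-pressure locally rather than jamming the request acceptor.
// The Filter interface and the envelope-as-Object placeholder are inventions.
public class QueuedPipeline {

    interface Filter {
        Object process(Object envelope) throws Exception;
    }

    // A minimal bounded, blocking queue, hand-rolled with wait/notify.
    static class BoundedQueue {
        private final LinkedList items = new LinkedList();
        private final int capacity;
        BoundedQueue(int capacity) { this.capacity = capacity; }
        synchronized void put(Object o) throws InterruptedException {
            while (items.size() >= capacity) wait(); // block the producer: back-pressure
            items.addLast(o);
            notifyAll();
        }
        synchronized Object take() throws InterruptedException {
            while (items.isEmpty()) wait();
            Object o = items.removeFirst();
            notifyAll();
            return o;
        }
    }

    // One thread per stage; stages are linked by queues, not by direct calls.
    static Thread stage(final Filter filter, final BoundedQueue in, final BoundedQueue out) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        Object env = in.take();
                        Object result = filter.process(env);
                        if (out != null) out.put(result);
                    }
                } catch (Exception e) {
                    e.printStackTrace(); // a real pipeline needs a proper error channel
                }
            }
        });
        t.start();
        return t;
    }
}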

Joey Gibson's mobile testing code

Joey Gibson's Blog

I just had to share this. I am working on a very large project using BEA's WebLogic 7. This project takes a good 10 minutes to go through an entire compile/deploy cycle. This is a real hassle when I need to noodle something out, test some idea, etc. I use Jython for all manner of nifty things including testing things from the client side of my app, but what if I wanted to test something from within the container? Sure I could write a Cactus test (and I have written many), but adding/changing a Cactus test results in a compile/deploy cycle. What I needed was something better.

What I came up with is a truly (dirty) hack that is totally insecure yet really useful. (IOW, don't put it on a production server.)

I embedded a Jython interpreter inside an Apache Axis web service. I then have a short Jython script on the client side that sends an arbitrary script to the server, which is executed and then the stdout/stderr are bundled up and sent back in the SOAP response. I think this is really cool.

Speech Acts in ebXML

From a comment on my predictions, Bob Haugen provided this link to Anders Tell's page on
speech act theory in ebXML. I completely missed this having only followed the ebXML messaging spec. Thanks Bob!

Newkirk and Vorontsov: Using Metadata

Later: bad link day. I linked to the wrong article here. James pointed out my link to Bob Martin's book was wrong, and at lunchtime I found that my gutter link to Uche's weblog pointed to the main page here. Let's try again.

Using Metadata is in fact a short piece by Martin Fowler which looks at code generation v reflection. It's a good read, but I'm not sure where the metadata bit comes in.

On the other hand, The effect of .NET attributes on design, by Newkirk and Vorontsov, is about how using .NET attributes for metadata affects your programming. Might be interesting to those in J2EE-land who think about how to add metadata in one form or another to objects, or are watching whether the meta keyword makes it into the Tiger JDK through JSR 175.

Personal License for IntelliJ Idea

The Occasional Blogger

Just ordered mine... eek, people are involved, and they'll get back to me!

back on java.blogs

java.blogs - Days Entries: 2/1/2003

I've had two entries just appear on java.blogs, the first in a few weeks. Through a mail from Mike Cannon-Brookes, I know they had a problem picking up some feeds - by the number of entries coming through in the last few hours, it looks like it's being addressed. Well done!

is XPath-NG a fork?

James Strachan's Radio Weblog

On the subject of XSLT/XPath forking, I do think XPath 2.0 is a huge disappointment. Most developers I know will be giving it a wide berth. XPath 1.0 is a thing of beauty - sure its got a few rough edges here and there but it hits the sweetspot of being an excellent expression language for working with XML without being too complex. XPath 2.0 is almost the whole of XQuery which is way over the top and too complex.
Its looking like already XPath has forked with the formation of XPath-ng.

James has me thinking. Is XPath-NG a fork? I think Uche would object to calling XPath-NG a fork. There was some kerfuffle a while back on xml-dev about whether it was ok to use 'XPath' in this case. Some thought it wasn't; Uche claimed it was no different from JDOM. Happily that thread died off before it turned into an RSSesque melodrama. To paraphrase Kevin Burton, "forks are much worse".

Forks aside, I suspect it does confuse people when multiple, somewhat related technologies play capture the flag with names. It seems brand recognition is just as important to the technically minded as it is to the marketeers, albeit as a gut reaction.

Later: yes, XPath 2.0 is a huge disappointment. Smart people who care about XSLT are working hard on finding a middle way; we'll see how it pans out.

Sing Li on JMX

developerWorks : Java technology : From black boxes to enterprises, Part 1: Management, JMX 1.1 style

I picked this link up from a blog and have lost whose (maybe manageability's). My bad. The original blogger said Sing Li is one of the best tech writers working in the Java field.

Update, Jan 4: Carlos Perez (manageability) was kind enough to send me the link: from black boxes to enterprises

I consider Sing Li as being one of the best Java technical authors out there, I came to this conclusion after reading his JXTA series of articles. Sing Li has also written books on JINI and JXTA, however I'm pleasantly surprised that he's now writing about JMX. I mean, Java manageability isn't as cool a topic as JINI or JXTA, but who knows, there might actually be something cool about Java manageability.

Too true. I have three Wrox Press books - two of them because Sing Li was a writer, the other because Danny Ayers was. Sing Li is one of the first ports of call when it comes to understanding JavaSpaces and JXTA.

I've been trying to get a mental hook into JMX on and off over the last few months, mainly by reading the JMX spec and JBoss code. These two articles have been a real help. Highly recommended.
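
For anyone else looking for that mental hook, the core of JMX is smaller than the spec suggests: a standard MBean is just an interface named after the class, plus a registration call. A minimal sketch from memory of the 1.x API, so double-check the details:

    // CacheMBean.java - the management interface; JMX infers it from
    // the <ClassName>MBean naming convention.
    public interface CacheMBean {
        int getSize();
        void clear();
    }

    // Cache.java - the managed resource plus a throwaway registration.
    import javax.management.MBeanServer;
    import javax.management.MBeanServerFactory;
    import javax.management.ObjectName;

    public class Cache implements CacheMBean {
        private int size;
        public int getSize() { return size; }
        public void clear() { size = 0; }

        public static void main(String[] args) throws Exception {
            MBeanServer server = MBeanServerFactory.createMBeanServer();
            server.registerMBean(new Cache(), new ObjectName("demo:type=Cache"));
            // an HTML or RMI adaptor would expose this to a management console
        }
    }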

January 02, 2003

Scott Sterling tweaks JDepend

Java Testing, Tools, and Engineering

Scott Sterling has made a small but excellent hack to JDepend to look at the java.* packages. It turns out that java.lang has dependencies on the likes of java.io and java.awt. That's probably worth filing as a bug report; on the other hand, one could argue that the JDK itself is the granule of release.

Let it be said, there are disclaimers on JDepend and in Martin's work pointing out that dependency analysis and metrics are imperfect, meant to be used as guides and taken with a grain of salt. I agree and understand that, but I also see a lot of sense in Martin's work that I think isn't generally applied or followed in the software business, but should be. A big problem, as with many "advanced" engineering tools and techniques, is that most people know nothing about them or have only an academic knowledge of them.

Yep. On the other hand, JDepend is incredibly useful as a compass for big refactorings. I ran it over a messaging hub we've built at Propylon. This project has seen two short, intense bouts of work during the last six months with some simmering in between. To go forward I felt the package structure surely needed cleaning up (in particular, to move stuff around to enable plugin transports). Now, before I ran JDepend over it, I wrote down where I thought the packaging hotspots would be - this is not a huge codebase, I'd guess less than 20,000 lines including JSP bits and bobs, but it has a lot of packages. It turned out I was about half-right, which is to say I was as good as flipping coins. As long as you don't take the numbers too seriously, JDepend is unquestionably your friend, the I Ching of dependency management in Java. Kudos to Mike Clark and Bob Martin.
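
If you want to do the same, JDepend can be driven programmatically as well as from Ant. This is from memory of its jdepend.framework API, so treat the method names as approximate:

    // Quick programmatic JDepend run - API from memory, check the docs.
    import java.util.Collection;
    import java.util.Iterator;
    import jdepend.framework.JDepend;
    import jdepend.framework.JavaPackage;

    public class PackageReport {
        public static void main(String[] args) throws Exception {
            JDepend jdepend = new JDepend();
            jdepend.addDirectory("build/classes");   // compiled classes, not source

            Collection packages = jdepend.analyze();
            for (Iterator i = packages.iterator(); i.hasNext();) {
                JavaPackage p = (JavaPackage) i.next();
                System.out.println(p.getName()
                    + " Ca=" + p.afferentCoupling()    // who depends on me
                    + " Ce=" + p.efferentCoupling()    // who I depend on
                    + " I=" + p.instability()          // Ce / (Ca + Ce)
                    + (p.containsCycle() ? " CYCLE!" : ""));
            }
        }
    }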

Btw, Scott's looking for exemplar package structures. As I recall from before Christmas, it won't be Catalina, Axis, or JBossMQ, but JUnit has decent numbers, as do James Clark's Trang and Jing.

Agile Software Development: review

I spent last year eagerly anticipating Robert Martin's Agile Software Development. Uncle Bob, as he's known, has been a long-time evangelist for Object Oriented programming, Patterns and, more recently, Extreme Programming. Indeed he's successfully converted his business to XP; the next time someone tells you XP doesn't cut it commercially, point them at http://www.objectmentor.com.

We'll begin by saying that this will be an initially controversial text that will become a classic, one of less than a dozen texts in the last ten years worthy of the term, such as Design Patterns, Refactoring, Programming Pearls, and Effective C++. At 500 pages it looks unthreatening, but this is an information-rich book that will require more than one sitting. It's not an on-the-job book, or a how-to guide. Instead it's very much one you will read and savour with a cup of cocoa or a good single malt, and reread over the years. It's rare that we point to a technical book these days and say that. Bob Martin recommends certain paths through ASD depending on who you are. No matter: I recommend starting with Jack Reeves' essay in appendix D; it will set the tone for the rest of the book.

ASD is a concrete, unapologetically technical book of great breadth, depth and enormous wisdom. It's chock full of code, case studies, objects, patterns and hard-won experience. Woven throughout is the emphasis on agile practices such as unit testing, refactoring, continuous integration and managing the reality of changing requirements. All this plus chapters on UML and managing code releases. What's most delightful about ASD is the absence of posture and mindless evangelism that accompanies software methodologies. You come away with the sense that everything in this book is here because it works, not because it's supposed to work. One area where the book is uncompromising is that the purpose of a software project is only ever to deliver a working software system as defined by the customer. Delivering such software and only such software is what defines a successful project - everything else is secondary to code. Those steeped in traditional SoftEng methodologies, and the variations tailored for enterprise and federation-class projects found in the big services organisations and corporate IT departments (where resisting requirements change, paper deliverables and the overall process are highly valued), will have trouble with this premise. Even if you are comfortable and happy with high-ornament approaches, this book is still worth reading for its insights into developing OO systems alone (it's probably going to become an authoritative work on OO), and finally if only to see how the other half live. Another, less stringent, viewpoint is Bob Martin's take on OO itself. To him, OO is primarily a technique for managing dependencies in software. He's held this view for as long as I can remember, but those who think of OO as an analysis and modelling paradigm may find it a little strange.
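
To make that last point concrete (my example, not one from the book): the dependency-management view of OO is that high-level policy depends on abstractions it owns, and the concrete mechanisms plug in underneath.

    // A minimal sketch of OO as dependency management - not Bob Martin's code.
    interface MessageSender {
        void send(String message);
    }

    // A concrete mechanism; the policy class never sees SMTP details.
    class SmtpSender implements MessageSender {
        public void send(String message) {
            // SMTP plumbing lives here
        }
    }

    // High-level policy: depends only on the abstraction it owns.
    class OrderNotifier {
        private final MessageSender sender;
        OrderNotifier(MessageSender sender) { this.sender = sender; }
        void orderShipped(String orderId) {
            sender.send("Order " + orderId + " has shipped");
        }
    }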

ASD is a treasure trove for any developer, but it will be especially useful and enlightening for three types of folk. First, the ex-technical who have either moved into purely architectural or consultative roles, or have become engineering VPs, and need or want to get in touch with cutting-edge development practices. Make no mistake, this is the direction software development is going to take over the decade, and this book perhaps better than any other in print will explain agile development and how it works to manage change in your terms. Second, developers who have a good grasp of OO and Pattern-based programming but maybe can't square agile techniques with the popular emphasis on upfront design & architecture, requirements lockdown and generally being expected to see around corners. If this is your working environment and you can't adopt agility wholesale, or you find the agile mindset alien, ASD will help you both cherrypick, then articulate. Third and most unusually, it will benefit those who have already been using agile methods for some time but aren't sure how software design, and particularly design patterns and UML, are supposed to work with these methods, particularly XP and SCRUM. ASD will help find a balance.

What's not to like? The one part I didn't like was the tale of two companies - it's been floating around the XP community for a few years now and most people there think it's hilarious (oddly, I've come to find it tasteless, although I don't know anyone who shares this view). Thankfully it's tucked away in an appendix. What's missing is the chapter that tells us how agility deals with legacy systems, in particular databases. In fairness, not just the agile movement but the industry as a whole still isn't sure how to splice new code with the old. Nonetheless working with legacy systems is increasingly the norm today as organizations seek to keep IT costs down and justify expenditure by maximising what they already have. It's also true that legacy systems, being invariably difficult to test and integrate against, are where agile approaches can logjam.

SOA Papers

SOE

SOA stands for Service Oriented Architecture. Also, Doug Barry: Getting Ready for a Service Oriented Enterprise Architecture

Dead of Night

I stayed up to watch the second Planet of the Apes movie. Either there was a schedule change or the RTE guide got it wrong - it wasn't on, Dead of Night was. I haven't seen it in years. It's perhaps the scariest, most unsettling movie I've ever seen - filmed back in 1945, no less. I remember forgetting its name and posting onto Usenet years ago to ask for it. And a few months ago, I went searching on Google for that Usenet post because I wanted to order a copy - I'd forgotten the name again. I never got around to that order. It's been sitting in my Amazon shopping basket since.

Tonight, very late, I finally ordered the movie, with The Big Sleep, along with what I intend to be my single tech book order for 2003. Tonight, before I knew it was on television - if not at the moment it started, then only a few minutes before. Coincidence? Well, let's just say that I can see the goosepimples on my arms as I type this.

The ventriloquist's dummy is downstairs, shouting...

January 01, 2003

New Year Resolutions

Arrgghhh...

Must haves:

  1. Spend less on more; I am waste.
  2. Get a driver's licence. I am denial.
  3. Spend more time with my family. I am sad.
  4. Get fit; mean fit; military fit. I am weak.

Nice to haves [you can tell, as they require commentary]:

  • I am bibliophile. Buy no tech books. [I have the opposite problem to most developers, too many tech books bought, too many to read in any case. I worked out recently there's about a 6 month backlog on the shelves in front of me, not including a folder stuffed full of papers and OS code. Tonite, I place my one and only book order, for 365 days. What cannot be described, here, is the pain.]
  • I am tabloid. Work on my writing and email style: it's obnoxious, opinionated, critical. [I'm not obnoxious, opinionated, critical, no more than the next guy anyway. Shit, I guess that means practice. ]
  • I am the Great Irish Novel. Write the book. [But which book to start with??]
  • I am touch-line judge. Evangelise some technology, with implementation. [I don't contribute code; the technologies I care about receive no love or toil from me, only platitudes]

Ouch. That was hard to put down.

ISBN linking: architecturally challenged?

Loosely Coupled weblog - on-demand web services

Very cool. Or is it?

The experiment began with the discovery that the ISBN numbers used to identify books in Amazon's URLs can be linked to the same information in the online catalogues of US public libraries.

Ah, hold on a moment. Amazon does not use ISBN numbers to identify its books (at least not publicly). It uses http: URLs - if you look inside those http: URLs and parse them according to Amazon's design (not the grammar of the http: URL, note), you'll find an ISBN. One thing strikes me about this. The opacity axiom of Web architecture mandates that URIs are opaque. This means we don't look inside them for information; we just use them as unique, blank identifiers, or, as is the case on the web, as unique, blank locators.

I've said before, on a few lists, that this opacity isn't tenable as an axiom, if we believe 'axiom' here is meant as an axiom, i.e. a self-evident truth of the web and not just a good idea (Jeff Bone has been saying it isn't tenable for a lot longer than I have). Sure, opacity of keys is a best practice in systems design; putting semantics in identifiers, or using a key with semantics in one system as an identifier or key in another, has always been a bogus idea. But an axiom? Hardly.

So, what Jon Udell is doing is cool and has been blogged extensively as being cool, but architecturally it's a bust? Mmm... even assuming that the opacity axiom is merely a best practice, does that mean Jon's hacked an antipattern together, despite the utility? Jeez, I'm being a pedant, but consider what Jon Udell's done against the wording of the axiom itself:

The only thing you can use an identifier for is to refer to an object. When you are not dereferencing, you should not look at the contents of the URI string to gain other information.

Craig Johnson has described this work as an exemplar of RESTful design:

So, given a creative soul (Jon Udell) and some common sense (that URL looks like it has the ISBN embedded, wonder if I can use that to see what's in a library), and some readily available tools of the web (bookmarklets), it is possible to use sources of information in an interesting new way with wonderful interesting effect. Is this the best possible end-all and be-all solution to linking repositories of books (libraries in this case but could just as easily be used book sellers for example) with book information repositories (Amazon reviews, BN reviews, could be my own personal reviews as well). No, and that is the point. That is where REST and agility come together in this case. Particularly that part about "Working software over comprehensive documentation".

I'm not at all sure about the RESTfulness of this app; then again, I don't recall Fielding's thesis saying anything contra burying information like this in identifiers. I'm too lazy right now, post-Christmas, to go checking, but I will ;)

I am sure of this: what Jon's done is usually called screenscraping. But does it matter if it's useful? I think not.
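
Sketched in Java, the whole trick amounts to something like this - the Amazon URL is just an example, and the library catalogue address is invented:

    // The bookmarklet idea, roughly: dig the ISBN out of an Amazon-style URL
    // by convention (exactly what the opacity axiom frowns on) and rebuild a
    // link into a hypothetical library catalogue.
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class IsbnScrape {
        private static final Pattern ISBN =
            Pattern.compile("/(\\d{9}[\\dX])(/|$)");   // 10-character ISBN path segment

        public static void main(String[] args) {
            String amazonUrl = "http://www.amazon.com/exec/obidos/ASIN/0135974445/";
            Matcher m = ISBN.matcher(amazonUrl);
            if (m.find()) {
                String isbn = m.group(1);
                // entirely made-up catalogue URL, for illustration only
                System.out.println("http://library.example.org/search?isbn=" + isbn);
            }
        }
    }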

Predictions for 2003

30 something predictions for 2003.

  1. Java:
    • Developers and architects cease to use stateful and entity EJBs in their designs.
    • JMS and Message Driven Beans get overused.
    • C# and VB.NET continue to gain ground on Java; the JCP moves to an open source model in response.
    • The J2EE value proposition continues to struggle as people find cheaper and faster ways to build out middle tiers.
    • In-container testing becomes a priority - a tester role is added to the J2EE spec.
    • Regular expressions, nio, assertions, aspects and generics breathe life into Java projects.
    • Java increasingly becomes a target language for little languages.
  2. Web, Web Services:
    • Versioning becomes the other hardest problem in webservices, along with security.
    • People start to become openly frustrated with using SOAP XML documents as message envelopes.
    • REST goes mainstream, under the guise of the 'doc/lit' style- RPC becomes a failed model for integrating over public networks.
    • Reliable messaging becomes a stick to beat HTTP with.
    • The Weblog supplants the homepage as the way everyone provides a web presence. ISPs offer weblogs in their packages.
    • How information flows through weblogs and the graph characteristics of community weblogs receives great attention.
    • People start talking about using something called performatives in conjunction with doc/lit style services.
  3. Semantic Web:
    • Ontologies turn out to be less useful and more difficult to design than first expected; the upfront costs of creation become a concern.
    • Specialized metadata formats continue to be favoured over RDF.
    • RDF is used for systems integration.
    • Machine learning comes to the semantic web.
  4. Agile methods:
    • Agile methods, test first and refactoring will all go mainstream, as clients and corporate in-house teams realize it's cheaper and more effective to have systems built that way...
    • ... and webservices-based integrations and tighter margins from fixed-price contracts coerce software houses and consultancies to move away from traditional project management approaches, as the business models built around them become untenable.
    • US and European software houses declare agile methods a competitive weapon (largely against outsourced Indian and Asian development).
    • Pair programming gets (quietly) de-emphasized in XP - solo coding on the job becomes ok.
  5. Wireless:
    • Mobile games will not be as big as people think...
    • ...although a market will appear to provide multiplayer infrastructure.
    • Wireless and weblogs continue to synthesize in interesting ways.
  6. XML:
    • Users of W3C Schema make all the same mistakes OO users did 10 years ago.
    • XSLT/XPath are forked.
    • W3C Schema is subsetted.
    • Use of entity references for characters becomes an antipattern.
    • Digitally signed XML becomes prevalent.
    • Pipeline processing goes mainstream - performance worries slow adoption.
    • A binary XML note reaches the W3C and a working group is formed.
    • Ant and RSS become far and away the most popular XML languages.
    • DTDs remain relevant.
  7. Finally, Lisp continues to be the Latin of computing.

Opening quote for the year: If you don't like change, you'll like being irrelevant even less.