" /> Bill de hÓra: October 2003 Archives


October 28, 2003

JVM ascendent

Brian McCallister

What's growing? Ruby, Python, Java as platform (not the language). The scripting languages need no explanation. The Java as platform might: the most real innovation happening in Java is not happening in the Java language, but in other languages that run on a JVM.

My thoughts on this: The Java Ceiling. See also: Groovy (but Jython is there already).

Understanding REST

Blogbody: A Bit Harsh

But really, I want to make it clear that I do understand the point of REST -- I just don't see it as a viable alternative as long as there aren't really good tools available.

That's like saying Design Patterns aren't viable as long as there aren't really good tools available ;) Maybe Patrick does understand the point (if there is one) - that REST is an architectural style, not a toolkit. Not that he cares, as he says elsewhere :)

What I'm saying is that for a concept like REST to really take off, there needs to be a large and complex framework available that makes it easy to expose complex resources -- not just simple CRUD operations thinly masked over a database.

The last thing the industry needs is another large and complex framework, that just gets you things like SOAP RPC-encoded. But for what it's worth, maybe this REST stuff isn't going to catch on until we have a Web Antipatterns book to wave at people.

October 27, 2003

I know, I'll use a framework

To paraphrase Zawinski: Some people, when confronted with a web application, think "I know, I'll use a framework." Now they have two problems.

I've spent the afternoon reminding myself how Struts works. Here's what I had to do to spark up a login page that doesn't check anything or log anyone in (a rough sketch of the classes follows the list).

  • write two Java classes
  • write two JSP pages
  • link them all up in a Struts XML config file
  • deploy the lot using a servlet config XML file
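
For the curious, here's roughly what the two classes look like in Struts 1.1 - a sketch with hypothetical names (LoginForm, LoginAction) rather than the actual code, with the struts-config and web.xml wiring left out:

  // LoginForm.java - the form bean carrying the request parameters
  import org.apache.struts.action.ActionForm;

  public class LoginForm extends ActionForm
  {
    private String username;
    private String password;
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
  }

  // LoginAction.java - the action, which does nothing but forward
  import javax.servlet.http.HttpServletRequest;
  import javax.servlet.http.HttpServletResponse;
  import org.apache.struts.action.*;

  public class LoginAction extends Action
  {
    public ActionForward execute(ActionMapping mapping, ActionForm form,
        HttpServletRequest request, HttpServletResponse response)
      throws Exception
    {
      LoginForm login = (LoginForm) form; // there's the downcast
      return mapping.findForward("success");
    }
  }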

Some initial thoughts:

  • This stuff is going to be hard to test
  • Splitting Actions and Forms isn't making sense to me
  • Looks like there's going to be lots of downcasting - bad sign
  • No interfaces
  • If I'm not careful, I'll drown in my own Action hierarchy
  • perform() is invoked but execute() is not (this is 1.1)
  • magic .do extensions are being added to my html - ugh

I chose Struts for a console/admin webapp over WW/Rife/Spring because I'm the only person at work who knows WW/Rife/Spring, I'm liable either to be pulled off that work or forced to delegate it, and we have about half a dozen folks who know Struts well, will get the job done if needed, and don't whinge about frameworks as much as I do. But here I am, already thinking about ways to abstract Struts out of existence or hook in WW/Rife/Spring, in a way that will still let someone else work within Struts. Nonetheless, thank goodness for Ted Husted.

Anyone out there integrating struts with WW/Rife/Spring?

October 26, 2003

The Uniform Interface

[warning: this is a rant, essentially correct, but a rant nonetheless]

REST advocates (myself included) are big on uniform interfaces. Not just REST advocates though: SQL advocates, CVS/SVN advocates, TupleSpace advocates, Browser advocates, RDF advocates, Speech Act advocates, SOA two-pin advocates. I like to think that the lesson of the uniform interface was learned for the last time in the middleware and web industries in the flight from SOAP RPC-encoded to Doc/Lit. If you bet wrong on RPC-encoded a couple of years back, Doc/Lit is like having your cake and eating it - it's something much better and you get to avoid admitting you willfully ignored the RESTful option.

Now what about user interface design? Every software application I own has a bespoke interface. No doubt this has a lot to do with differentiating oneself in tough markets, and I do like variety and novelty, but come on. These apps are needlessly different. Mozilla mail is nothing like The Bat!, which is nothing like Outlook. Yet they are essentially identical tools, in the same way cars are essentially identical tools. The difference between car-makers and software-makers is that if software-makers made cars, they'd think they were doing a bang-up job by putting the brake above your head. Strangely, car-makers have found saner ways to compete with each other and enthrall customers.

And yes, some tools need to reflect highly specialized domains, but even within such a domain, you'll regularly see Really Stupid Differentiation. The dirty secret is that these interfaces are there for behavioural lock-in, a complement to data lock-in, something a vendor does to heighten what Shapiro and Varian call "switching costs". The other dirty secret is that as software technologists, we deliberately reflect how insanely complicated writing software can be in the applications themselves. To you, it's just a damn button, what's all the fuss? To us, it's six months, round the clock. Security is like this too - in a world obsessed with security, all the really good security technologies are lying idle because they're inconvenient for users.

It's far, far easier for the developers to call the users stupid, or for the execs to fall victim to the Innovator's Dilemma. Think about folders and filesystems - it's the 21st century and I'm still having the details of how my computer works forced down my throat every time I open a file or sort my mail. Why am I doing this, instead of the computer? After all, at the end of the day, there are a limited number of actions I perform on my data. I join some things together, read or write something, or maybe play something - often these amount to doing the same thing. While I've heard and read a variety of theories as to why we're stuck with god-awful user interfaces, I think the answer is simple: hardly anyone has the guts or imagination to make a simpler, uniform interface.

I hate learning new applications that do the same job as the old ones - that's such a timewaster. I only sort, sift and organize because I have to. I loathe doing it. I probably have as much information across my computers right now as the Internet did twenty years ago.

I can't really be expected to manage all this data by myself, nor can you. Badly designed software applications are not helping.

October 18, 2003

The Web scalability myth

O'Reilly Network: The PHP Scalability Myth

The ideal multi-server model is a pod architecture, where the router round-robins each of the machines and there is only a minimal session store in the database. Transient user interface information is stored in hidden variables on the web page. This allows for the user to run multiple web sessions against the server simultaneously, and alleviates the "back button issue" in web user interfaces.

Well said. Let's take it a step further. The ideal multi-server model is one where state is managed on the client. It's the one place in a client-server or service oriented network topology where a single point of failure is ok (think about it).
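
To make that concrete, here's a toy servlet sketch of client-held state - the cart travels with the page as a hidden field instead of living in an HttpSession, so any box behind the round-robin can answer the next request (hypothetical parameter names, escaping omitted, not production code):

  import java.io.IOException;
  import java.io.PrintWriter;
  import javax.servlet.http.*;

  public class StatelessCartServlet extends HttpServlet
  {
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
      throws IOException
    {
      // everything we need arrives with the request - no session lookup
      String cart = request.getParameter("cart");
      if (cart == null) cart = "";
      String item = request.getParameter("item");
      if (item != null) cart = cart.length() == 0 ? item : cart + "," + item;

      response.setContentType("text/html");
      PrintWriter out = response.getWriter();
      out.println("<form method='post' action='cart'>");
      // round-trip the state back to the client in a hidden field
      out.println("<input type='hidden' name='cart' value='" + cart + "'/>");
      out.println("<input type='text' name='item'/>");
      out.println("<input type='submit' value='add'/>");
      out.println("</form>");
    }
  }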

Now, this idea, this management of state on the server, is also where the deployed web (HTTP + browsers) breaks with REST architecture. REST advocates (including me) are happy to point out that the web is the epitome of a scalable, flexible system. But this has been helped along by billions of dollars spent scaling sites to manage session state, an invisible web of content delivery networks and geographic caches, some questionable ideas such as DNS round robin, session-based URL rewriting, and cookies, and any number of hacks and compromises in millions and millions of lines of application software, all to cater for primarily one thing: sessions on the server.

If you're running a web site or service, this is costing you a lot of money - very possibly the bulk of your development and running costs are sunk in making sessions scale up and out. It costs you more as you get popular - for a site there is no economy of scale in the current web (which is why web-based business plans derived from the economics of broadcast media often go to the wall). Sometimes we call this the curse of the popular, denial of service, or in the vulgar tongue, the slashdot effect. All those servers you shelled out for are idle almost all the time, yet the day you do get slashdotted, you won't have enough computational horsepower to hand (cue business models for P2P and utility computing).

And as far as I know, REST advocates (including me) have no good answer to change the state of affairs on the deployed web, other than to encourage people to avoid state where it's not needed (it's often not). We get enough grief from WS middleware types as it is, and wouldn't want to goad them by making insane arguments, for example fixing every browser on the planet to store user sessions, so that it became your shopping cart, not Amazon's *.

Honestly, in the long run this problem may only go away as lessons learned from P2P, Grid, telco and utility architectures are absorbed into mainstream web development. I imagine this will happen via SOA projects. There is already considerable interest in Grid and utility computing as complements to the web for SOAs, and P2P can't be far behind.

In the meantime there are web servers like Matt Welsh's SEDA and Zeus that can help alleviate the slashdot effect. We built a blisteringly fast web server at my previous job, architected by Miles Sabin (one of the java.nio architects along with Matt Welsh), but it never made it to market - today it would make a fantastic basis for a SOAP router or XML content firewall.



[*] This issue of the deployed client base also relates to a debate that occurred in Atom - whether to use PUT and DELETE, or just POST. In my mind there is zero technical justification for using only POST. But there is a key practical one, which is brutally simple - the HTML spec, and therefore browsers, don't support form upload with PUT and DELETE, so what's the point of specifying a technology almost no-one can use? My answer is that blogging and RSS represent green fields in web development and don't have to be considered in terms of legacy browsers and bad decisions in web specs, but not everyone agrees with that.

October 17, 2003

80% done

Bob Martin:
"[...] They begin with a date. Let's not kid ourselves, all projects start with a date -- probaly before they have requirements -- probably before they have a name. An endless stream of requirements follows. A project plan is put together, and then reformed and reformed until it meets the date. Then the project is launched, and from a management point of view it goes dark.
Managers ask "How's that project going?" The answer: "Pretty good." If you want a more detailed answer it will be something like this:
  • (10% in) "We're currently building data models. We're about 80% done with them."
  • (20% in) "We're currently building use-case models. We're about 80% done with them."
  • (40% in) "We're currently building class models. We're about 80% done with them."
  • (60% in) "We're currently building sequence design models. We're about 80% done with them."
  • (80% in) "We're implementing the necessary infrastructure and architecture components. We're about 80% done with that."
  • (90% in) "We're starting to implement the main features. They'll be a snap because we've got all the design and architecture built. We're going to make the deadline."
  • (110% in) "We're about 80% done with all the features."
  • (120% in) "We're about 80% done with all the features."
  • (130% in) "We're about 80% done with all the features."
  • Repeat until complete or cancelled.
I realize that this sounds flippant -- and it is -- but it also strikes too close to the truth for a vast number of projects. No real data comes out of the project, and so there is no way to make any management decisions. Projects that produce no data cannot be managed. Period. [...] "

October 16, 2003

What? Unicode?

Ted Leung: There Ain't No Such Thing As Plain Text

I got my introduction to character encodings and Unicode the hard way, when I was working on XML. Joel Spolsky has written a good introduction.

I'm glad Joel Spolsky is blogging about this because now tens of thousands of developers will realize their XML generation code sucks.

If you're not up on this issue, you need to be. If you want more detail, Unicode: A Primer is a good book.

Second the book. This is also worth printing off: A tutorial on character code issues, as is Uche's Proper XML Output in Python. Tim Bray also wrote a fine series on encodings not so long ago.

My take on all this urging is that while it's good to insist people know Unicode, please keep in mind it's somewhat difficult to ThinkUnicode at the start. I mean difficult in the same way event-driven programming or parallelization can be, except with Unicode you have to learn to stop believing your eyes (literally).

[I'd probably give a present to the person who wrote an IntelliJ plugin, something like a hex viewer, to display XML files as Unicode code points.]

And while we're on the subject: in Java, char represents a UTF-16 code unit, not whatever we grew up thinking it was (a character, probably), and that makes String a UTF-16 code unit API - an undocumented one, naturally :)
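
A two-line illustration of the point - the string below is a single character (U+1D49C) written as its UTF-16 surrogate pair:

  public class CodeUnits
  {
    public static void main(String[] args)
    {
      String s = "\uD835\uDC9C"; // one character, two UTF-16 code units
      System.err.println(s.length());        // prints 2 - length() counts code units
      System.err.println((int) s.charAt(0)); // prints 55349 - a high surrogate, not a character
    }
  }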

[bubba sparxxx: bubba talk]

On the job: recommended reading

These are the books I seem to have used the most in my work over the last few years. They're not necessarily my favourites, or ones that I'd consider classics, but they're never far out of reach, and each is hugely useful.

Refactoring
Refactoring is the best book ever written for programmers in the trenches. Seriously, if you don't have it, get a copy. It will help you keep a codebase under control like no other text.

Patterns of Enterprise Application Architecture
There's a good amount of bunk spoken about middleware and enterprise computing. And dogmatic bunk at that. This book cuts to the chase and focuses on software techniques to help get enterprise solutions under control. As an added bonus, Martin Fowler is also one of the best writers in the industry.

Agile Software Development
There's even more bunk spoken about agile. But you can just call this Software Development - it's the sanity check to the reality distortion fields created by every softeng book you ever read in college. Years and years of refined wisdom and experience - if I could make every developer and technical lead out there read just one book, it would be this.

XML in a Nutshell
Very useful book to have lying around, once you've gotten over the initial XML technology curves.

Python Essential Reference
Love this book :)

Mastering Regular Expressions, Second Edition
Not just a classic, not just making a sometimes dull subject a pleasure, but updated to cover regex technologies for most of the languages you're likely to use on the job (Java, C#, Perl, Python, shell).

Algorithm Design Manual
Probably the most useful or even the only book available on applying algorithms to problems, as opposed to problem spaces, although those into algorithms may prefer something else. Like the Owl book, makes a potentially turgid topic interesting.

Structure and Interpretation of Computer Programs
If there's a better book about the art of programming, I'd love to hear about it :) Don't let the Lisp put you off, I never use Lisp at work either, but my copy is falling apart nonetheless. Every class of problem and technique you're likely to come across is in here.

October 13, 2003

RDF, pedantry, and the web

Warning: of interest only to people who care about RDF graphs and web architecture minutiae, are comfortable with jargon like "surface syntax", "model theory" or "entailment", and have heard of a guy named Tarski.

I think that what the semantic web needs is two rather different things, put together in a new way. It needs a content language whose sole function is to express, transmit and store propositions in a form that permits easy use by engines of one kind and another. There is no need to place restrictions or guards on this language, and it should be compact, easy to use, expressive and syntactically simple. The W3C basic standard is RDF, which is a good start, but nowhere near expressive enough. The best starting-point for such a content language is something like a simple version of KIF, though with an XML-style syntax instead of KIF's now archaic (though still elegant) LISP-based format. Subsets of this language can be described which are equivalent to DLs, but there really is no need to place elaborate syntactic boundaries on the language itself to prevent users from saying too much. Almost none of them will, in any case. Pat Hayes

Mark Baker was wondering:

Self-description and namespace mixing If I produce a multi-namespace document, am I automatically importing the entailments of those namespaces? Dan Connolly says yes (at least for RDF Schema), and I disagree with him. But I lack the background in this space to be able to convince Dan (or even myself, for that matter). It's just a hunch at this point, but the issue has very important consequences, especially to REST which requires self-descriptive messages.

Let's get "entailment" straight. Entailment has to do with true sentences (or formulae) in a formal language. If a sentence A, "entails" B, that's to say "when A is true, B is neccessarily true too". Then any interpretation which holds A as being true, neccessarily holds B as being true. Roughly, for our purposes an RDF graph is much like a sentence. Contrariwise, there can be no intepretations where A is true and B is false. Indeed, searching for such "nonsense" interpretations is a technique to determine the internal consistency (or not) of a formal language.

If we produce a multi-namespaced document, we don't import any entailments. Namespaces don't imply entailments. There's no notion of namespaces or QNames in the RDF Model. They're specifically a hack to get URIs into XML, for some definition of hack. Or, we could reasonably say that namespaces in XML are a surface syntax macro without which we couldn't use XML to ship URIs around. In themselves they have no bearing on the RDF graphs being shipped about. And they certainly have no bearing on the RDF Model.

Now suppose we dispense with the namespace macro for a minute and say we produce a multi-URIed document. Strictly, we still don't import any entailments, because URIs don't suggest entailments; sentences (graphs) do. Also, while abstractly URIs are terms, within a document they are simply marks and as such have no semantics.

We imply entailments, not through the use of terms, but by announcing the formal language of discourse. When I say I'm speaking OWL, you may assume the semantics of the OWL language as expressed through the sentences I impart to you (because in turn you assume I wish to communicate clearly). Once we have shared semantics we can begin to agree on things like entailment. But, as a practical matter we might want to use URI terms to do exactly that (importing semantics), if it turns out the mimetype mechanism is unsuitable for describing semantic web languages.

One approach is to say that each semantic web Model Theory ("MT"), a theory about a formal language, gets its own mimetype. In this approach, and with respect to the web, the semantics of something like RDF/XML is defined by fiat - whoever defines a mimetype for RDF/XML gets to say that the RDF MT applies, and it's up to the rest of us to follow that convention or not. Now on the web, we can drop some OWL into an RDF graph, serialize it as RDF/XML, declare the RDF mimetype, and we're set. However, unless the RDF mimetype used has something interesting to say about using the OWL MT, we can't really apply any computations over and above the RDF MT without crapping all over any number of principles that make the Internet work. Well actually, of course we can - after all, who's going to stop me interpreting OWL URIs as OWL? But in terms of the reality of clients and servers, this is a bit like the GET-7 rathole of the consequences of your (and your user-agent's) actions - the publisher of OWL in an entity body who declares it with an RDF mimetype incurs no risk by having it interpreted as OWL. The representation is to be understood as whatever the mimetype says it is. If that happens to be RDF and only RDF, then the consequence of interpreting it as anything else is at the interpreter's cost, not the publisher's. Just as interpreting application/octet-stream as HTML is your problem, so is interpreting application/rdf as OWL.

The problem with this approach is that it doesn't lend itself well to mixing and matching formal languages (as opposed to URIs). Today we only really mix subgraphs of a particular formal language, but it's not going to be long until we start to construct hybrid domain models using a variety of formal languages, each with their own MT and, you would assume, mimetype.

The other option is to drop mimetypes (except for application/rdf+*) and target the URIs themselves for import. In other words, if you use a term unique to a particular formal language you are bound to the theory of that language, even if you didn't know what you were saying.

There are immediate problems with either approach (or any approach using mimetypes). First is the exclusion of hackworthy processing of RDF, such as is common with RSS 1.0, Dublin Core and FOAF today - I doubt more than a fraction of the code processing these vocabularies is compliant with the RDF MT (and why should it be, if what it does is useful?). The second is further away but quite serious - individuals and organizations may not care to be held to the logical entailments of their published graphs. As an industry we don't expect to be held responsible for software defects - will it be any different when new software is data driven in this way? Then again this may work out just like OLAP and Data Warehousing - where we pay a lot of money to figure out what the hell we've actually said across a number of domains, without much concern about where the inferences lead.

Deep down, I have the sense that this might well become as big a mess as the URI name/addressing debacle. While there are only a few ratified semweb languages it's tolerable to use mimetypes. But if the semweb is even remotely successful, and is even remotely like the KR, ontology, and AI fields it borrows heavily from, then we can expect a myriad of formal languages, all keyed off RDF, and we can expect users to mix and match terms from these languages literally without knowing what they're saying.

There are other alternatives, such as negotiation to a language. This is not pie in the sky. There have been real results, and real work done in internet protocols, AI, economics, and multi-agent computing that allows two entities to automatically agree on how to impart information, including utilizing an interpreting entity.

Mark also points to something Dan Connolly said over on rdfig as part of an argument for people accepting the entailments of their sentences:


we need as many model theories (i.e. constraints on terms) as we need terms
neither RDFS nor OWL is special.
they're just like the C standard library.

Only the second sentence is true. We do not need an MT for every term. We need an MT for every formal language. For every term we need an interpretation (I) that maps a meaning to the term - RDFers usually call this "denotation".

And an MT is nothing much like the C standard library, as I understand the analogy (C ~= RDF). OWL is closer to java/javac than time.h, and an OWL vocabulary is more like an EJB domain model than a C program. You can't define the theory of OWL in RDF the way you can define time.h in C. OWL is a distinct, more powerful formal language than RDF and as such has both a distinct theory and a distinct set of formulae. Never mind that the semantics of C and Java are decidedly non-trivial compared to RDF and OWL - so much so that the comparison quickly breaks down. To get an idea of the sense of this breakdown, try running your EJB source through the gcc reasoner and see what happens.

October 09, 2003

How many mailing lists?

8 propylon internal lists
bugtraq
focus-ms
focus-linux
xml-dev
www-tag
www-ws-arch
geronimo-dev
extremeprogramming
junit
refactoring
axis-user
xml-dev (openoffice)
chi-web
chandler-dev
p2p-hackers
agents@cs.umbc
rest-discuss
xml-sig
ietf-announce
atom-syntax
xom-interest
vapours.rdfweb.org.
rdf-interest
emacs-nxml-mode
chat@fipa.org
concurrency-interest

That's about 35. I need to cut back. I'm not just on these lists, I follow them.

[flc: friday night]

On the value of Turing machines

Programmers, on the other hand, love general-case solutions. Algebra is cool. Calculus is cool. Cellular automata are cool. Turing machines, as universal computing devices, are really, really cool. But Turing machines by themselves don't do a damned thing. A raw, unprogrammed Turing machine puts no food on the table, chases off no wolves, and wipes away not a single tear. - William Pietri

[via Carlos]


[the specials: ghost town]

October 07, 2003

Does anyone really think RSS-Data is a good idea?

Danny, being polite:

What we're looking at is application-specific data structures being encoded in XML-RPC and RSS being used as a blind transport.

It all sounds like SOAP RPC-encoded. Why bother reinventing a failed approach?

RSS 2.0 itself is defined essentially as an application for pumping newslike information. Nothing else. You can extend it using namespaces, but there is no standard way of doing so, so every new extension module exists as a parallel pipe.

For some strange definition of extensible. This is like saying org.apache.tools.ant.* is an extension of Java, or that jelly is an extension of a peanut butter sandwich. Really, extensible here means someone can tunnel another vocabulary through RSS 2.0 to you using namespaces, and if you can dispatch on the namespace you're laughing. It does not mean the RSS 2.0 vocabulary itself was extended, or that those two sets of names have anything whatsoever to do with each other by implication. Worse, without RDF, namespaces are 100% overhead. The most useful purpose of XML Namespaces is to map XML elements onto URIs, and the most useful purpose of mapping URIs to XML elements is to ship RDF around. However, this is not what most people are using XML namespaces for. When it comes to namespaces, if there's no RDF, there's no extensibility.
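
To be concrete about what "dispatch on the namespace" amounts to, here's a SAX sketch (with a made-up extension namespace URI) - and this is the whole of the extensibility on offer: route elements by namespace URI and hope you happen to know what the tunnelled vocabulary means:

  import org.xml.sax.Attributes;
  import org.xml.sax.helpers.DefaultHandler;

  public class Rss20ExtensionHandler extends DefaultHandler
  {
    // hypothetical extension namespace
    private static final String EXT_NS = "http://example.org/some-extension";

    public void startElement(String uri, String localName,
        String qName, Attributes attributes)
    {
      if (EXT_NS.equals(uri)) {
        // a tunnelled vocabulary we happen to understand
        System.err.println("extension element: " + localName);
      }
      // anything else is either core RSS 2.0 or someone else's parallel pipe
    }
  }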

Extension, in my book, comes through a shared model of content or processing, the former of which RDF happens to provide in a manner quite similar to the UML or the relational data model. If you provide RDF content inside RSS 1.0, all mapped onto XML+namespaces, I can map that content and the RSS into an RDF graph - there's no tunneling other than the tunneling of RDF. Think of RDF as being a bit more pedantic about how you express the domain model - sometimes that will make sense, sometimes straight XML is the way to go.

Unfortunately, the semantic web AI hoopla has put many people off RDF. I guess if all we ever did with the relational data model was talk about theorem proving, that would put people off too. No-one (almost no-one) calls relational data a solution in search of a problem or pie in the sky AI research - with a good query language (SQL) and largely excellent tools (RDBMSes), of course they don't. Quite the opposite: enterprise architectures today depend way too heavily on SQL and RDBMSes - whole platforms and industries are predicated on their presence.

In the meantime we have to wait for the W3C to standardize the Ontology Query Language and for someone enterprising to invent the RDFMS before RDF becomes an obviously good option.

But beyond very simple stuff, getting RSS-Data from RDF would almost certainly be a waste of time because it's so lossy.

What would be the point of mapping a struct or a string to RDF or vice versa? All the data types are telling you, after all, is the allowable range of a value for some property. And there's not much interesting we want to assert about a struct or a string itself. Although, if we're talking about objects in the domain, that's different. What's interesting is the property and what the property is slotted into. For example, in an OO language we might want to say things about a User, which happens only as an implementation detail of the programming language to be written down as a struct of strings. We might want to associate business logic with that User, or assign certain permissions to her, but ultimately we're not really interested in mapping strings and structs onto domain objects; these are just things we use to keep our compilers and managed runtimes happy.

I'd argue, strongly, that when you break domain objects down into what are essentially programming language primitives and send those across the wire, you're making exactly the same mistake as the SOAP RPC-encoded approach. This approach only ever makes sense if you control both endpoints and can specifically ensure the computing platforms are interoperable - on the web this is a nonsense approach. The right approach is for parties to agree to share some minimal assumptions about a domain structure without fussing unduly over data types, which is where XML shines (especially running over REST or an SOA), or to bite the bullet and use RDF to describe things in the domain, which is not unlike the approach anyone using ER diagrams or the UML takes with a domain model (and will have something of the same limitations). Surely the idea is to protect your data from the vagaries of things like structs and varchars, not force agreement on them?

RELAX NG book in pre-publication

RELAX NG

Eric van der Vlist:

Although the book shouldn't be drastically updated at this point, you are still welcome to submit your feedback using our annotation system.

John Cowan picked up on James Clark's foreword on xml-dev, and it's worth repeating:

XML standardizes only a syntax, but if you constrain XML documents directly in terms of the sequences of characters that represent them, the syntactic noise is deafening. On the other hand, if you use an abstraction that incorporates concepts such as object-orientation that have no basis in the syntax, then you are coupling your XML processing components more tightly than necessary. What then is the right abstraction? The W3C XML Infoset Recommendation provides a menu of abstractions, but the items on the menu are of wildly differing importance.

I would argue that the right abstraction is a very simple one. The abstraction is a labelled tree of elements. Each element has an ordered list of children where each child is a Unicode string or an element. An element is labelled with a two-part name consisting of a URI and local part. Each element also has an unordered collection of attributes where each attribute has a two-part name, distinct from the name of the other attributes in the collection, and a value, which is a Unicode string. That is the complete abstraction. The core ideas of XML are this abstraction, the syntax of XML and how the syntax and abstraction correspond. If you understand this, then you understand XML.

October 04, 2003

Loading resources from the classpath

The suggested idiom in this blog post doesn't always work. The statement underneath:

I have not compiled the above, it is just an example.

is a clue :) It doesn't work because the system classloader can't always locate resources, particularly if they are buried inside a package structure and you have to refer to them using a path. As to why, I'm not sure, but I ran into this problem yesterday at work. I have a solution, but since I don't fully understand it, I'm understandably uncomfortable :)

Some background - I have some code that uses an XML template to generate XML files - this is trivial stuff mind, I'm using MessageFormat to populate element content. I don't want this template to be easily altered, but I don't want it buried in a String either. So I put it in the template generator's package space and had it loaded in via the "Instance.class.getClassLoader()..." idiom. All the junit testing stuff around it worked, so I checked it in. But when a colleague deployed the code to JBoss as a jar, the resource wasn't to be found. I'm not sure why, but neither of these idioms worked:

  Instance.class.getClassLoader()...;
  ClassLoader.getSystemClassLoader()...;

I did some more testing and came to the conclusion that to load a resource off the classpath you need to dynamically load a class packaged with said resource and use that to get the resource. For example, this worked where the above failed:

  Class c = Thread.currentThread()
    .getContextClassLoader().loadClass("...");
  c.getClass().getResourceAsStream(path);

A fuller example (using the flakier Class.forName):

  import java.io.InputStream;

  public class LE1150LoadResourceMain
  {
    public static void main( String[] args )
      throws Exception
    {
      Class c = Class.forName(
        "propylon.deps.generator.LE1150GeneratorImpl");
      InputStream is = c.getClass().getResourceAsStream(
        "/propylon/deps/generator/env-template.xml");
      StringBuffer buf = new StringBuffer();
      // load ....
      System.err.println(buf.toString());
    }
  }

works under a variety of loader environments that came to mind (IDE, JUnit, command line, J2EE containers, servlet containers).

I'd love to know why, if anyone does :) I suspect there's something simple and fundamental I'm not getting.
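
For what it's worth, the variant I'd try next is to go straight at the context classloader (an untested sketch; note that ClassLoader.getResourceAsStream wants the path without the leading slash, unlike Class.getResourceAsStream):

  // sketch: skip the class lookup and ask the context classloader directly
  ClassLoader loader = Thread.currentThread().getContextClassLoader();
  InputStream is = loader.getResourceAsStream(
    "propylon/deps/generator/env-template.xml"); // no leading '/'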

[alabama 3: woke up this morning]

Get Your Messaging On

Standards for services and messaging architectures:

HTTP 1.1
FIPA AA
ebXML 2.0
JXTA
XMLPP
OGSA
BEEP

[u2: elevation]

Heart of darkness: what's wrong with Ant

Patrick:

What I didn't see on the MARC list was a lot of postings by java folks who were very experienced with Ant and have decided that Ant just doesn't cut it anymore because of reason X, Y or Z. So what I'm really looking for are well laid out arguments for why Ant is really so terrible, and why something else is better because it is simpler, more productive and/or more powerful.

Ok...


  • Ant encourages duplication across build files. Duplication is built into Ant insofar as you will always be fighting it if you want to say something in only one place. Make lets you normalize just about everything, because it allows you to include across files. Ant is currently stuck with entity inclusions - though 1.6 is supplying an import task.

  • Ant isn't recursive. A build system that is going to scale with a codebase has to be recursive. Period. When people are going on about dealing with subprojects, or about standard project and directory layouts, really what they are talking about is the need for recursively enumerable build structures. If your subprojects have the same structure as the superprojects you can use the same scripts to build them all, and you can reorganize stuff more easily.

  • As bad as the lack of recursion is, its worst side effect is crucifying: Ant's enforcement of downward dependencies. When you create a subproject, you should not have to declare its subtargets at the top level - the build process should descend and execute against the subtarget automatically, so that the subproject simply inherits the master's targets. The job of the subproject is to declare specific dependencies and parameters for the key targets (this is roughly how a decent make setup works, and maven seems intent on working this way as well). It's important to be able to keep adding targets and projects without changing the master build targets - if anyone suggested that the way to add a subclass to an inheritance hierarchy was to first declare it in the base class, they'd be laughed at. But this is how Ant works today. [JUnit also has this problem with its test suites.]

  • Ant doesn't understand dependencies. I'm well aware of the "it's not an expert system/ dependency checker" arguments - they're absolutely true. But that's moot if what you need to build software is in fact an expert system or a dependency checker. Asking the buildfile writer to explicitly manage inter-project dependencies was never a good idea, unless you intend to only have one project.

Which leads us on to maven. The main objection to maven is that it's a patch around deficiencies in Ant. Every time I use maven it feels like there's something inherently wrong with it. It's building on the wrong kernel for the job, much like early Windows did with DOS, which has the effect of making it something of a stovepipe system. Despite the amount of effort that has gone into maven to help you with Ant (and you should dig into the source to see what I'm talking about), it remains 16-bit Ant at the core, the limitations of which just can't be abstracted away that easily.

In any case, correct me if I'm wrong here, but I'd say the Ant vs Make debate is pretty much solved. Is there anyone out there who has used both Ant and Make enough to make a rational comparison, and has gone back to Make because it works better?

I've seriously considered it, yes. Many times. I think you need to have seen a good makefile setup to appreciate why Ant is not make without make's wrinkles - Ant is nothing like make. That, or to have hacked at an expert system at some point. Unfortunately, good makefiles are rare and most of us don't hack expert systems. The Ant vs make debate may be solved, insofar as there's no point arguing for make anymore. That's not to say that Ant doesn't bring its own issues to the table.

The reason I haven't moved back to make is social not technical. Ant has two characteristics to which I'm highly sympathetic - worse is better, and view source. It's far, far, easier to get folks to first accept, then use, then extend, Ant, than it is to get them to use make - make is freakish by comparison.

I've been pretty hard on a tool I use every day and have encouraged others to use: I am, despite this post, a strong Ant advocate and have no problem blowing Ant's trumpet to colleagues or clients. When it comes to Java, nothing much happens for me without Ant and JUnit. But I don't think Ant, when it was conceived, was ever designed to handle the complex setups it is used for today. It's very much a victim of its own success, which can be attributed to its worse-is-better and view-source qualities - Ant is a user friendly technology. But an important thing in advocacy is not to pretend there are no problems. I think there's a good argument to be had about making Ant a better build tool, but I remain dubious that maven is the way to do it.

[alex reece: acid lab]

October 01, 2003

Junit 4

The fact is, JUnit is pretty much abandoned out there on Sourceforge. Call it a virtue if you want, but don't tell me JUnit 3.8.1 is complete or even "good enough." It can hardly be called an open source project in the same sense as something like Eclipse, Apache Geronimo, Jakarta Tomcat, MySQL, etc. because there are no active committers. And there won't be as long as Kent and Erich keep it locked up for fear of people actually implementing some of the many features that people have asked for. So JUnit is open source, but I wouldn't call it an open source project right now. Scott Sterling

+1. An OS project is only as good as its community. The community around JUnit is strong, but there's no easy way to convert that into better code as long as Kent, Erich, et al are being prolific elsewhere. I don't know that they're locking the code up - I figure they're simply very busy men. But the last thing anyone wants to see is JUnit JSR'd or, worse, fragmented because the code base is moribund.

I think it's time to push for Junit 4. I'd like to see Scott's classloader fix incorporated in that or an interim release.

[the streets: don't mug yourself]

Migrating to Subversion III

One problem I've felt Subversion has is a lack of GUI tools (the plugins for Eclipse and IDEA are pretty nascent and as a result aren't so hot). Because of that, I've been disinclined to push Subversion beyond mentioning that I like it. Developers have enough trouble with version control without making them type weird incantations into a terminal.

Turns out, I'm probably dead wrong and Subversion may never need a decent GUI. I should explain that.

Something strange happened this weekend. I made 60-odd commits to my build manager project in three 4-hour stints. All by alt-tabbing out to the command line and adding a brief comment. Even for a command line / version control wonk like me, that's quite something - ballpark, better than one commit every fifteen minutes. I never ever felt like Subversion was in the way. Quite the opposite - it was letting me capture every green bar in a very natural coding rhythm.

All with a VCS I'm still learning how to use. I put some of this down to the atomic, repository-wide commits I mentioned before. I put much of it down to better usability.

Much as I like CVS after using it for years, I've never felt that level of comfort with it - checkins can be a chore. Ditto for VSS. So you put them off, and let what I call the 'commit area' expand. Doing that can result in you banging into someone else's work and merging. Merging large changes hurts, so you put off the inevitable and end up in a race to the bottom. A common response after being through this pain is to want to lock code. But the truth is, once you're working with a team, you are destined to be integrating code one way or another. At best, locking files is a band-aid; at worst it linearizes code development and slows everyone down (by analogy, if you've coded on a project where all roads lead to a database, you've experienced something of what I'm talking about). The cure is not to try and make merging go away (you can't, even with pessimistic locking) but to commit smaller and smaller chunks of work until merging becomes trivial, and in many cases the VCS will take care of it. Integration is so important to development that we should be doing it as often as possible. To do this successfully you need the VCS to support you, in exactly the same way that refactoring is supported by a good IDE and testing is supported by a good framework.

Another effect of committing this frequently is that the repository version history starts to make sense and becomes a useful record - there are no huge bewildering changes to revisit. It also allows you to back out using the VCS. Kent Beck talks about backing out in Refactoring. Many of us will, when we get into trouble, move only forward until we come out the other end, battered, or we get truly lost. (I think this is in part down to human nature, to search depth-first for a solution.) But sometimes it's better to throw a change away, fall back to a waypoint, and start over. With frequent checkins you'll rarely be in a situation where throwing work away is too painful to consider. Lose 20 minutes' work? That's spilt milk. Much better than wasting half a day hacking your way out of the weeds.

There are tools, and then there are tools that change how you do things in fundamental ways. These are tools like junit, emacs, idea, lisp, grep/find, araxis-merge, moveable-type, propelx. Add svn to that list. For the first time, I can really see an environment where every green bar test is checked into the repository - whoever glues JUnit to automatic checkins to Subversion is going to make a name for themselves.


[alex reece: feel the sunshine]