Bill de hÓra: May 2003 Archives


May 29, 2003

IDEA and Ant annoyance

So, recently, we were building a war file for deployment from an Ant script via IDEA. Everything's fine until we invoke a servlet. Version mismatch error (48, you know the one). Go back to the IDEA settings - yes it's using a 1.3 JDK, yes the environment's pointing at a 1.3 JDK. If we build directly from IDEA and deploy an unwar, it's fine. If we build the war using Ant from cygwin and deploy, it's fine. But not Ant from IDEA.

It turns out that IDEA uses a 1.4 javaw to run ant scripts, and to use a particular JDK you need to specify it (and then put an XML parser on your classpath if you don't have one already). It would be nice if IDEA someday picked that up based on your JDK preferences in the project file.
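For what it's worth, the 48 in that error is the major version number stamped into the header of every class file; 48 is the format that arrived with JDK 1.4, and a 1.3 VM only accepts up to 47. Here is a minimal sketch of reading that number from the first eight bytes of a class file. The function names and the synthetic header are made up for illustration; nothing here comes from IDEA or Ant.

```python
import struct

# Class file major versions and the JDK release that introduced each format.
MAJOR_TO_JDK = {45: "1.1", 46: "1.2", 47: "1.3", 48: "1.4"}

def class_file_version(data):
    """Return (major, minor) from the first 8 bytes of a .class file."""
    # Header layout: 4-byte magic, 2-byte minor, 2-byte major, all big-endian.
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor

def describe(data):
    """Human-readable summary of the class file version."""
    major, minor = class_file_version(data)
    jdk = MAJOR_TO_JDK.get(major, "unknown")
    return "class file version %d.%d (JDK %s)" % (major, minor, jdk)

# Synthetic header standing in for a class written in the 1.4 format.
sample = struct.pack(">IHH", 0xCAFEBABE, 0, 48)
print(describe(sample))
```

To check a real class pulled out of the war, read its first eight bytes in binary mode and pass them to describe().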

Somewhat related: there doesn't seem to be a way to tell it which JDK to use for running unit tests. Presumably it uses the JDK in your project preferences and not javaw (I haven't checked), but that's what we thought it would do with Ant.

IDEs, you can't trust them, even the good ones...

May 25, 2003

I'd rather use a GET

Benefit of human readable protocols

Give this a try

The browser configuration string to do it:

  http://www.google.ie/search?q=absurd+obfuscation&num=10&hl=en&ie=latin1&oe=latin1&safe=false

The Python to do it:

  # Python 3: the httplib module is now http.client
  import http.client
  conn = http.client.HTTPConnection("www.google.com")
  conn.request("GET", "/search?q=absurd+obfuscation&num=10&hl=en&ie=latin1&oe=latin1&safe=false")
  r1 = conn.getresponse()
  print(r1.status, r1.reason)   # status code and reason phrase
  data1 = r1.read()             # response body, as bytes
  conn.close()
  print(data1)

Why readable XML matters, even to RDF

(I'm lifting this out of comments, Danny we both have pingback ;):


Thanks for quoting one of my better statements ;-) For a change of syntax to be justified, there would need to be pretty major improvements (and that's apart from the politics needed to get something through the W3C). Given the graph/tree mismatch I'm not sure this is even possible - personally I thought Tim Bray's RPV syntax was actually uglier than abbreviated syntax RDF/XML. If all you want is human-readable, Notation3's not bad, it's just not XML. I'd also question how important readability is for any data-oriented XML. RDF isn't XML but (early) HTML isn't XML either. Once the generators/parsers or whatever are set up, there shouldn't be much need to go poking around. In the case of RDF/XML, it may not be that simple, but I think there's enough legibility for maintenance purposes. I do think there's a bit of exaggeration going on too - it may not be optimum, but RDF/XML syntax isn't harder to work with than say XHTML(+CSS) or XSLT. Overall I just reckon effort will be better used at this point in time working on getting some good tools together, rather than worrying about the inelegance of RDF/XML syntax. If someone does come up with a neat XML syntax, great, if not, no big deal, adoption of RDF will just be a little slower.

Danny, I've agreed RPV isn't any better. And while N3 isn't XML, it's not RDF either so I'm happy to let it go by the by. I'd like to not be able to ignore n-triples. The rest of this post is where I'm coming from on the syntax thing.

I don't question how important readable XML is, I know it's important. Away from specland and in the trenches, the problems with XML often come down to the same old things - things like Namespaces, the DOM, too much XSLT to manage, memory usage. But truly the combination of hard to read data formats with tools designed to protect the developer from the raw XML data, or just plain hide it, amortizes all the other problems. Not being able to get at or work with the text drives me insane - I do not need to be protected from the stuff, and if I do, I shouldn't be using it.

In a past life, I never appreciated or understood HTTP until I worked with it directly (i.e. not through a CGI or a Servlet). Today it's obvious to me that one of the reasons that protocols like SMTP and HTTP have come to dominate is that they can be read outside a debugger or some vendor's tool. I can often solve HTTP issues on the job by looking at the traffic, something I can't always do through server tools and GUIs (this has actually happened twice this month). That makes HTTP valuable to me professionally in precisely the same way being able to build software outside an IDE with make or Ant is - it helps me to do my job, better and more flexibly, which ultimately is not to use HTTP, but to deliver software that meets a need. [The other characteristics in HTTP articulated by REST are important too over the life of a system, but for me, shipping with, and ongoing maintenance of, are primary characteristics of any software technology, especially in a networked environment]
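To make the point concrete, here's a rough sketch of how little stands between you and the wire. It builds the text of a request by hand and picks apart the status line of a response, with no HTTP library involved. The helper names and the canned response are invented for illustration, and the sketch stays off the network so it runs anywhere.

```python
def build_get(host, path):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",
        "",   # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)

def parse_status_line(raw_response):
    """Split the first line of a raw response into (version, status, reason)."""
    first_line = raw_response.split("\r\n", 1)[0]
    version, status, reason = first_line.split(" ", 2)
    return version, int(status), reason

# A canned response, standing in for text captured off the wire.
captured = "HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"

print(build_get("www.example.com", "/search?q=absurd+obfuscation"))
print(parse_status_line(captured))
```

Everything in both directions is text you could have typed into telnet or read in a packet dump, which is the whole attraction.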

It's very hard to do that with some of the data-focused XML I'm seeing in the last two years. And I don't believe (like some perhaps) that's to do with anything inherent in being data-focused, just that the emphasis on API access and expectations about tool support results in a dissonant and I suspect unintended outcome - write only XML.

The potential cost of dealing with write-only XML on the ground isn't worth it in my experience, and hand on heart I simply can't justify the risk of depending on tools to manipulate it. For these reasons, I'd shy away from recommending RDF in the heart of a system, in precisely the same way I would with XSLT or Perl, or some vendor's proprietary format. That holds even where I think RDF might be a good fit. The claim that RDF/XML is not much worse than XSLT is not in its favour - I've seen projects get into trouble simply because there was a lot of XSLT and it gets difficult to manage and organize. The only people I know that seem to be getting good use from RDF that would be near my current line of work are FourThought - but they wrote their technology to get there; I'm not sure I have the inclination or energy to do that just yet.

So when I'm whining about RDF/XML syntax, it's not just a matter of personal taste or being a curmudgeonly git. It's all very boring and practical really. If some technology is ungainly, it needs to be so for a reason - but RDF/XML is pure overhead - it adds no interesting expressive power, does not fall into the good enough category, and thus to me can't be rationalized by a charter, tools, or the fact that RDF is potentially valuable somewhere down the wire. The single exception might be that everyone else is using it - that's not the case.

RDF: syntax refuted?

As expected, the RDF syntax permathread does the rounds.

Ian Davis:

It's essential that the model is watertight before we can do anything major with the serialization syntax.

Yet the XML was heavily reworked while the Model Theory was invented from scratch. And n-triples was invented at the same time. No such ordering is essential or necessary (but take a look at what Ian's doing with the XML). On the other hand, such Models are never watertight - just consistent.

Danny Ayers:

The web might have got to where it is today through being a collection of syntaxes, but for it to progress any further I think we need to step up a layer of abstraction. So I don't think we need to worry too much about Tim's syntax-oriented criticism.

Yes, RDF is not XML (as usual I put that in red, so no-one will miss it). That's not sufficient to excuse an ugly, hard-to-use serialization. And don't even start me on depending on tools...

Shelley Powers:

We need to stop treating RDF/XML as yet another variation of tags similar to HTML and start looking at it as a form of virtual binary code -- machine generated and consumed, but output in plain text. Until we do, we'll never get to the point of creating that killer app that Tim wants.

Well, following this argument to its conclusion there's no point using a text format in the first place, so whatever way you cut it the XML needs to be sorted out. Text versus binary is a whole other permathread, but in my humble but correct opinion, the 'it's just for machines' argument doesn't fly on the Internet, never has, never will. The syntax matters - it forces you to think.

Dave Beckett:

The title of the document is RDF/XML Syntax Specification (Revised) and Revised is important. It is, hopefully, a better description of an existing syntax, which Tim Bray was involved in designing. Not me.

We removed some things, added others, but didn't rewrite the core language with all its many abbreviations - that would break the contract with the existing deployed users of RDF/XML who have put it in their products.

In my experience, Dave Beckett has the patience of a saint. He also has (or had) the toughest editorial job at the W3C for well over a year. But this is a tired excuse. If we look at the original charter it says the wg will not develop a new RDF syntax or new RDF model. Which turned out to be bunk - the Model Theory was developed from the ground up and it is, without question, a new model. Make no mistake, the RDF MT is a deviation from the charter that took up a large proportion of the wg's time during 2001 and the first half of 2002. This was, or is, hardly ever questioned - but that is so much the worse for the charter since the model needed serious work. On the other hand the suggestion that the XML be developed from the ground up was and is consistently argued against by pointing to the charter, but it needed no less rework. It's important to be consistent about these things. And I'm mystified as to how a new RDF semantics does not break the contract with the existing deployed users of RDF/XML, but a new syntax does; perhaps it's because most people with RDF data aren't operating over it yet at the level implied by the model theory.

I mean come on, how hard can this be? It would be worth knocking up a pretty XML syntax just to prove Tim Bray, Sean McGrath and me wrong :)

The RDF.net challenge

The RDF.net Challenge
Tim Bray:

Eventually sometime in 2002 I ran out of patience and tried to figure out the simplest imaginable way to express (R,P,V) triples in XML, and came up with the unimaginatively-named RPV syntax. [...]

Here's how you say that in RPV:
<RPV xmlns="http://www.rdf.net/rpv/">
 <R id="Dave" pbase="http://www.example.com/terms/">
  <PV p="fullName">Dave Beckett</PV>
  <PV p="homePage" v="http://purl.org/net/dajobe" />
  </R>
 <R r="http://www.w3.org/TR/rdf-syntax-grammar">
  <PV p="http://www.example.com/terms/editor" v="#Dave" />
  <PV p="http://purl.org/dc/elements/1.1/title">
    RDF/XML Syntax Specification (Revised)
    </PV>
  </R>
 </RPV>
And here's how you say it in RDF/XML:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:ex="http://www.example.com/terms/">
  <rdf:Description rdf:about="http://www.w3.org/TR/rdf-syntax-grammar"
                   dc:title="RDF/XML Syntax Specification (Revised)">
    <ex:editor>
      <rdf:Description ex:fullName="Dave Beckett">
        <ex:homePage rdf:resource="http://purl.org/net/dajobe/" />
      </rdf:Description>
    </ex:editor>
  </rdf:Description>
</rdf:RDF> 
What do you think?

It's not enough?

Fwiw, I agree with everything Tim Bray says - but I don't see RPV as much of an improvement over RDF/XML, at least not enough to write handlers for it, or type it into an editor, or get excited about RDF again. I couldn't write RPV down properly either, so it failed that test miserably. RDF for humans seems to work best when you draw a picture of the graph you're after and later write it down as n-triples (more on n-triples below).
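As a rough illustration of the "write it down as n-triples" step, here is the graph from the RDF/XML example above emitted one statement per line. This is only a sketch: the helper functions and the blank node label _:dave are my own choices, and the escaping covers just backslashes and double quotes rather than the full n-triples rules.

```python
EX = "http://www.example.com/terms/"
DC = "http://purl.org/dc/elements/1.1/"

def uri(value):
    """Wrap a URI reference in angle brackets."""
    return "<%s>" % value

def literal(value):
    """Quote a plain literal, escaping backslashes and double quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped

def triple(subject, predicate, obj):
    """Format one statement: subject, predicate, object, terminating dot."""
    return "%s %s %s ." % (subject, predicate, obj)

spec = uri("http://www.w3.org/TR/rdf-syntax-grammar")
dave = "_:dave"  # blank node label, chosen here for illustration

lines = [
    triple(spec, uri(DC + "title"), literal("RDF/XML Syntax Specification (Revised)")),
    triple(spec, uri(EX + "editor"), dave),
    triple(dave, uri(EX + "fullName"), literal("Dave Beckett")),
    triple(dave, uri(EX + "homePage"), uri("http://purl.org/net/dajobe/")),
]
print("\n".join(lines))
```

Four arcs in the picture, four lines of text, and no decisions to make about nesting or abbreviation.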

I joined the RDF working group after its new charter was announced in 2001. Last time I looked I was still down as a wg member, but I haven't been involved in over a year. I was excited because RDF was the future of the web dammit, it was a practical, bug-fixing charter and everyone seemed to know where the bugs were - the XML syntax, reification, literals, anonymous nodes, containers. But especially the syntax. As it turned out the vast majority of energy in the wg went into the formal aspects of RDF - exactly the bits most people don't want to care about - and to this day I don't understand how a Model Theoretic semantics squares with a bug-fixing charter. I do remember that the DARPA/DAML/OWL folks were mighty unhappy with the state of RDF theory - at one point it seemed possible that those efforts would tear away from RDF. Well that's mostly sorted out and today it seems to me the biggest remaining bug with RDF is that not enough people want to use it, especially those in web services, where they're stuck trying to make WSDL do things it isn't capable of.

The syntax subgroup did a good technical job of cleaning up the XML grammar, but it's not so much easier to work with than what was there before. If it was, I don't believe n-triples would ever have been invented. The n-triple was developed to allow the wg to write down test-cases. Which is a great idea: if I ever work on a standards group again, I will insist on test cases. On the other hand n-triples is as bad a case of not eating your dogfood as you are likely to see - a technical group mandating XML for public consumption, while using something totally different to get its job done speaks volumes about the state of the XML. To this day I feel very awkward about that scenario and I can remember asking whether we shouldn't make n-triples normative, but it's not like I pushed. So I suck for that.

Anyway, what Tim Bray said. I'm convinced that RDF is going to languish until the W3C commits to inventing a usable XML syntax for it.

May 23, 2003

Dear Java Programmer

The Fishbowl: Dear XML Programmers...

The interesting question surely being, what value is Java when I can program the thing in Lisp? I could surely write an evaluator for the configuration file in less time than I spent writing it in Java. Ditto for Python.

As for what's coincident between Lisp and XML, it's what's not coincident that matters. Lisp has rules for evaluating expressions. XML has no rules for evaluating expressions.
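A small sketch of what that difference means in practice. The XML below looks like a nested expression, but nothing in XML says what it evaluates to; every rule lives in the evaluate function written alongside it. The element names add, mul and num are invented for this example.

```python
import xml.etree.ElementTree as ET

def evaluate(element):
    """Evaluate an element under rules that exist only in this function."""
    if element.tag == "num":
        return int(element.text)
    # Evaluate the children first, then apply the operator named by the tag.
    values = [evaluate(child) for child in element]
    if element.tag == "add":
        return sum(values)
    if element.tag == "mul":
        result = 1
        for value in values:
            result *= value
        return result
    raise ValueError("no evaluation rule for <%s>" % element.tag)

# Roughly the Lisp expression (add 1 (mul 2 3)).
doc = "<add><num>1</num><mul><num>2</num><num>3</num></mul></add>"
print(evaluate(ET.fromstring(doc)))
```

Hand the same document to a different program and it can mean something else entirely, or nothing at all.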