I've been away for a week and it looks like there's a lot going on in the RSS/RDF world; in grand blog tradition, time to comment on the comments :)
Sam Ruby has hacked together an Atom-to-RDF stylesheet
to get a feel for the 'RDF Tax', which has provoked a number of comments.
Dare Obasanjo wanted to know what the point was:
I'm still waiting for anyone to give a good reason for Atom being RDF compatible besides buzzword compliance. Since you seem interested in making this happen can you point out the concrete benefits of doing this that don't contain the phrase "Semantic Web"?
RDF is a BingoBuzzword now? Anyway, let's try to come up with a few reasons why RDF might be worthwhile. We'll start with a standard device for any technology that is not widely adopted - draw a spurious analogy between it and Lisp.
Lisp is extremely powerful because its evaluation rules allow for meta-circular evaluation (in the history of big ideas in computing, this one is right up there). Lisp is what some call a programmable programming language. This is enabled by its 'weird' syntax, which is very likely an optimal one (although Ruby and Python give us pause). Consider that you can add just about any programming language feature/craze to Lisp without altering the evaluation rules, and consider that many of the features/crazes we're interested in today have been available as Lisp graft-ons for what would be, in Internet years, roughly a century. Compare that to the funky way languages like Java (and in time, C#) extend and adapt - they evolve by applying patches, not grafts. These are not 100 year languages. But Java and C# have familiar syntax and active communities which will continue to develop and apply patches and band-aids, covering a multitude of sins resulting from non-uniformity.
RDF is tangentially like Lisp, only for content instead of programs. The potential power of RDF is in the uniform way it lets you describe content. So, here are three areas where I think RDF's uniform content model can help.
- Vocabulary mixins
Since it's triples all the way down, the process for vocabulary merging in RDF is much the same as that for merging two graphs, with extra constraints thrown in to keep the meaning of the new graph consistent with RDF's rules (as described in the RDF model theory). It's very simple in principle. In practice the tools aren't so hot and the XML is too difficult to work with directly - and ultimately people end up dealing with RDF in XML. Yet it's probably easier to express and manage a dependency graph using raw makefiles or Ant hacks than to express and manage simple content in RDF/XML. That's damning.
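To make the 'triples all the way down' point concrete, here's a minimal sketch in plain Python - invented URIs, plain tuples standing in for a real RDF toolkit (which would add bnode handling, parsing and entailment) - showing that merging two vocabularies is nothing more exotic than a set union:

```python
# Sketch: an RDF graph as a set of (subject, predicate, object) triples.
# The URIs below are made up for illustration.

feed_graph = {
    ("http://example.org/feed", "http://purl.org/dc/elements/1.1/title", "My Weblog"),
    ("http://example.org/feed", "http://purl.org/dc/elements/1.1/creator", "Bill"),
}

# A second "vocabulary" describing the same resource - just more triples.
geo_graph = {
    ("http://example.org/feed", "http://example.org/geo#country", "Ireland"),
}

# Vocabulary merging is graph merging is set union. No schema surgery.
merged = feed_graph | geo_graph
assert len(merged) == 3
```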
- Vocabulary extensibility
In RDF, extensibility is subtly different to the kind of extensibility XML/RSS authors are often concerned about, which is the insertion of new names into sets of namespaces - what we might call shallow extensibility. To a large degree they are concerned about this because that is where XML Namespaces ushers them. The idea of vocabulary extension via the mixing of vocabularies - what might be called deep extensibility - is much less considered, because you simply can't express that idea within the language game presupposed by XML Namespaces. This is why you need a uniform content model to make sense of namespaces, and why XML Namespaces is a technology for firewalling content, not extending it. Though by the time you had a uniform model, I'd be arguing that these namespaces were a poor way to serialize the vocabularies anyway :)
The reason RDF is good at extending and merging vocabularies is because there aren't any.
Let me explain. I think the area that excites people most about RDF is that extensibility is achieved by merging graphs, not vocabularies. Vocabularies for RDF are just sub-graphs - that people happen to find them useful or not is incidental. Vocabulary has the same bearing in RDF as nationality has in genetics - which is none at all. In much the same way my being Irish is irrelevant to my genetic makeup, my having a vocabulary is irrelevant to my RDF. In a very real sense, there are no vocabularies in an RDF graph. That you see them at all is an illusion. (The only fundamental aspect of being Irish with regard to RDF is a love of the number 3 :)
To extend some RDF you add new triples to its graph. That's it. You find out which nodes on the current graph are the same as the nodes on the graph you want to import - those are your join points. You don't figure out how to scrunch two blocks of XML or two databases together. You don't transform one vocabulary to another, or, if you are a systems integrator, look to establish the least general vocabulary that will unify the two (if you are a standards body, you will naturally look to find the most general unifying vocabulary). While there are usability and social benefits in having them around, there's no need to enforce the separation of vocabularies in RDF machinery - that's very much an artefact of XML Namespaces.
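A toy illustration of join points, again with invented identifiers and plain Python sets rather than a real RDF library - the graphs knit together wherever they already mention the same resource:

```python
# Join points: the nodes two triple-graphs have in common.
# Identifiers below (ex:, dc:, foaf: shorthand) are invented for illustration.

def nodes(graph):
    """All subjects and objects appearing in a graph of triples."""
    return {s for s, p, o in graph} | {o for s, p, o in graph}

g1 = {("ex:entry1", "dc:creator", "ex:bill"),
      ("ex:entry1", "dc:title", "On RDF")}
g2 = {("ex:bill", "foaf:name", "Bill"),
      ("ex:bill", "foaf:homepage", "ex:home")}

join_points = nodes(g1) & nodes(g2)
print(join_points)   # {'ex:bill'} - the graphs join at this node

extended = g1 | g2   # extension is just more triples, no transformation step
```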
All this, for me, knocks XML Namespaces' alleged usefulness for managing vocabularies into a cocked hat - they are at best a means to transport RDF graphs around using XML. They're mechanism, not policy, and we need to stop treating them as policy.
- Query
Perhaps the most interesting use for RDF is query. This shouldn't be a surprise - query is simply an application of the very inference that makes so many people suspicious of RDF as the key ingredient of a pie-in-the-sky AI project (the formal relationships between query, inference and theorem proving have been well understood for decades).
There is a growing need to coordinate and unify content - the rate of content generation far exceeds our ability to manipulate and digest it, even with statistical techniques and on-the-fly transformation to known structures. Uniformity is a big idea here - XML provides uniform syntax, but we could still do with a uniform content model for the query engines. As Edgar Codd found out, it's easier to query over a uniform model. And unless one of the existing KR languages sees overnight adoption, the relational model gets a facelift, or there are fundamental breakthroughs in the math behind statistical search, RDF is the only game in town. You can of course choose to patch things together in an analog of the commercially popular languages, Heath Robinson style. But you might end up suffering death from a thousand hacks.
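A sketch of what query over a uniform model buys you - a throwaway pattern matcher over invented triples, nowhere near a real query engine, but enough to show that one matching rule serves any vocabulary you mix in:

```python
# Query over a uniform triple model reduces to pattern matching:
# a query is a triple with variables, answers are the matching triples.
# Data identifiers are invented for illustration.

WILD = None  # stands in for a query variable

def match(graph, pattern):
    s, p, o = pattern
    return [t for t in graph
            if (s is WILD or t[0] == s)
            and (p is WILD or t[1] == p)
            and (o is WILD or t[2] == o)]

g = {("ex:e1", "dc:creator", "ex:bill"),
     ("ex:e2", "dc:creator", "ex:dan"),
     ("ex:e1", "dc:title", "On RDF")}

# "who created what?" - the same rule works however the graph was merged
for s, p, o in match(g, (WILD, "dc:creator", WILD)):
    print(s, "was created by", o)
```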
Jon Udell draws a comparison between RDF uniformity and XML querying:
If the RDF folks have really solved the symbol grounding problem, I'm all ears. I'll never turn down a free lunch! If the claim is, more modestly, that RDF gives us a common processing model for content -- a Content Virtual Machine -- then I will assert a counter-claim. XML is a kind of Content Virtual Machine too, and XPath, XQuery, and SQL/XML are examples of unifying processing models. [...]
This is a very good point, but I'll make a distinction to help see why RDF has something else to offer. The distinction is between inscription and description. XML in this sense is a Syntax Virtual Machine. It's a grammar for grammars, and as such is immensely valuable, now that we've mostly agreed to use it for new data formats. RDF is a grammar for content.
To draw a flaky biological analogy, XML is the raw material you would inscribe nucleic and mitochondrial DNA strings with. RDF is the stuff you would use to describe those strings to the proteins (which, by the way, read DNA in triples ;) that make the molecules blueprinted by the DNA. To be honest, I believe that syntactic building blocks are more fundamental than content blocks. Which is possibly why I part company with RDFers regarding RDF/XML.
(As for symbol grounding, no such luck :)
[...] As we move into the realm of extensible aggregators we'll face the same old issues of platform support and code mobility. Nothing new there. However, as XQuery and SQL/XML move into the mainstream -- as is rapidly occurring -- aggregator developers are going to find themselves in possession of new data-management tools that can combine and query structured payloads. Those tools will not, because they cannot, know a priori what those payloads mean. But they'll provide leverage, and will simplify otherwise more complex chores. I can't see the endgame, but for me this is enough to justify doing the experiment. [RDF uniformity and XML querying]
XML Query has a practical importance in the enterprise. In principle it will let us delay the moment we start putting XML into relational databases, and therefore help us avoid creating huge system bottlenecks and dissonant system layers. My hope is that it will let us scale beyond file system inspections and ad-hoc scripts while avoiding the problem of n-tiered middleware, namely that all roads lead to a database. But neither it nor XPath is a basis for describing the domain-level concepts that businesses care about. Today the best practical approach is an XML document designed by a good modeller that the application programmers can have at, but RDF-backed query could be a very powerful and highly scalable augmentation to good XML document designs.
Note that this is quite distinct from a uniform ontological model or a model of domains - no-one who understands what RDF is about is looking for a theory of everything. That is what you should be suspicious of - anyone offering The One True Ontology, The One True Model, is selling snake oil or needs a crash course in modern philosophy. RDF is just another way to ease the pain of integration (where integration = interoperation + costs).
Human readability is underestimated. If this atrocious RDF syntax is chosen, you are going to scare off relatively non-technical people from producing an Atom feed. You can argue from here 'till eternity that "It's just applying a transformation with XSLT", but remember that many (most) people even have problems understanding a concept as simple as CSS.
Unless the Atom syndication format is going to lie in the same forgotten pool of mud that W3C RSS is, I believe RDF is best forgotten.
Arve is objecting to RDF/XML while naming RDF. But on the whole, RDF itself is a pretty good idea, even if it is meta-models all the way up. RDF/XML undoubtedly remains a problem. It's the main reason I went into the RDF wilderness and don't normally evangelise RDF in my work - in the trenches, syntax matters.
I don't think the 25% increase is anything to worry overmuch about - I'm sure it can be brought down significantly (I notice an rdf:Description in there for a start). Some human readability might have been lost in the process, but a lot of machine-readability has effectively been gained.
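For instance (an invented fragment, not Sam's actual output, with the rdf: and atom: prefixes assumed declared on an enclosing element), RDF/XML's abbreviated typed-node form folds an rdf:Description and its rdf:type into a single element - same triples, fewer bytes:

```xml
<!-- Verbose form: an rdf:Description with an explicit rdf:type -->
<rdf:Description rdf:about="http://example.org/feed">
  <rdf:type rdf:resource="http://example.org/atom#Feed"/>
  <atom:title>My Weblog</atom:title>
</rdf:Description>

<!-- Abbreviated typed-node form: encodes the same graph -->
<atom:Feed rdf:about="http://example.org/feed">
  <atom:title>My Weblog</atom:title>
</atom:Feed>
```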
To butcher Alistair Cockburn's observation on process - machine readability can only ever be a second-order effect; human readability is always first order. Unfortunately RDF/XML is a barrier to entry, whatever the rationales offered by RDFers. The current excitement about RDF pales into insignificance compared to what might happen with a usable, hackable syntax - it could have been game, set and match two years ago instead of the daily uphill battle of selling RDF.
Elsewhere, Danny has said that RDF/XML is not the only fruit and points to Uche Ogbuji's observation that for RDF, there is no syntax. This is only partially true. Practically there must be a syntax and consequent software codecs, or we can't communicate (if someone has a communications model without the presence of syntax, feel free to follow up - and by all means do so without recourse to syntax ;).
That RDF is syntax-independent is often touted as an unqualified benefit, but it's not always one. It is a benefit in that we can reason about the rules for merging RDF graphs without recourse to syntax - this is no different to being able to reason about a theorem without recourse to a notation; often the notation can get in the way. Likewise it's good to be able to talk about a graph without having to worry about Java objects or C pointers. But in practice there is always a notation, a syntax, and in software development, notation is frightfully important. It's what you will be working with day in, day out, after all.
After four years of head scratching, I genuinely believe RDFers have a double blind spot with the XML syntax. First, they see so much value in RDF that the benefits outweigh the costs of the syntax - or any syntax. Second, saying there is no syntax seems like the ultimate get-out clause - don't like the syntax? It's OK, the syntax doesn't matter, and you can write a new one if you want. The first blind spot I have sympathy for. The second I have none, not any more. Interoperation doesn't start with models, it starts with syntax. Shared syntax is the prerequisite for interop. We have a couple of decades' experience to back that observation up. Interoperable models may indeed follow and piggyback on syntax, but looking to models first is a mistake. And if the first reaction to the syntax is to reject it, pushing the model around is going to be tough work.