" /> Bill de hÓra: May 2006 Archives

« April 2006 | Main | June 2006 »

May 31, 2006

links for 2006-05-31

May 29, 2006

Mob 2.0

I think I'm gonna run out and buy a few O'Reilly books. To protest.

links for 2006-05-29

May 27, 2006

links for 2006-05-27

May 26, 2006

Web Based

Web based - works for me.

Using IM for grid computing

Tim Bray:

"My problem is that I’ve been a Unix guy for twenty years and a Web guy for ten. My feeling is that if something says it’s a service, the right way to talk to it is to get a hostname/port-number combination, call it up, send a request, and wait for a response. My feeling is that a good way to have processes work together is for them to exchange streams of text using standard input and standard output. My feeling is that if I’m going to be stressing out the performance of an app on a grid, I don’t want too many layers of abstraction between me and the messages. My feeling is that server failures are going to have to be handled in an application-specific way, not pushed down into the abstraction layer."

XMPP. Especially the Erlang* server, ejabberd, as it can be managed with ejabberdctl the same way you manage Apache with apachectl. And it's fast. Want to create a compute/result farm for all things trivially parallelisable? Use a chatroom to coordinate the workers. Want to listen to interesting arbitrary activity on heterogeneous networks? Use PubSub to subscribe to RT data feeds. When I look at the OGSA stack I'm fairly sure XMPP is to Grid and JINI as HTTP was to WS and CORBA. Give it a few years - instant messaging is where real commodity grid action will take place.

MapReduce. Tim mentions hadoop. I suspect hadoop got spun out of Nutch because it was where all the action was at, and the basic spidering - index - respider - reindex cycle was being given short shrift. But mapreduce sure looks like the Linda in() and out() operations, as you see them in the implementation Gelernter and Carriero outline in How To Write Parallel Programs.
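The Linda resemblance is easy to see in miniature. Here's a toy Python sketch (all names are mine, not from the book): a tuple space where out() deposits tuples and in() blocks until it can withdraw a tuple matching a template, used to scatter work and gather word counts mapreduce-style.

```python
import threading

class TupleSpace:
    """A toy Linda-style tuple space. out() deposits a tuple; in_()
    blocks until a tuple matching the template can be withdrawn.
    None in a template acts as a wildcard."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template, tup):
        return (len(template) == len(tup) and
                all(t is None or t == v for t, v in zip(template, tup)))

    def in_(self, template):  # 'in' is a Python keyword
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._match(template, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()

# A crude map-reduce flavoured exchange: the master out()s work
# tuples, a worker in()s them and out()s per-word results.
space = TupleSpace()
for doc in ["a b", "b c"]:
    space.out(("work", doc))

def worker():
    for _ in range(2):
        _, doc = space.in_(("work", None))
        for word in doc.split():
            space.out(("result", word, 1))

t = threading.Thread(target=worker)
t.start()
t.join()

counts = {}
for _ in range(4):
    _, word, n = space.in_(("result", None, None))
    counts[word] = counts.get(word, 0) + n
print(counts)  # {'a': 1, 'b': 2, 'c': 1}
```

One worker thread here, but nothing changes structurally with fifty - that's the appeal of the model.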


* as in, yeah, we do SMP.

May 25, 2006

Architecture of Participation

Tom Raftery:

"One of these events - the upcoming Web 2.0 half-day conference is the target of a cease and desist letter (below) from the legal team of O'Reilly publishers. Basically O'Reilly are claiming to have applied for a trademark for the term 'Web 2.0' and therefore IT@Cork can't use the term for its conference. Apparantly use of the term 'Web 2.0' is a 'flagrant violation' of their trademark"

It'll be interesting to see how this plays out, given Tim O'Reilly's OSS, copyright and patents leanings. The letter appears to have come from CMP and not ORA, although it mentions O'Reilly in passing. Raftery is one of Ireland's most popular tech bloggers - every web developer in the country will know about this by Friday lunchtime.

links for 2006-05-25

May 24, 2006

Is The Desktop UI Metaphor Dead?

Don Norman:

"Oh," people rush to object, "the Google search page is so spare, clean, elegant, not crowded with other stuff." True, but that's because you can only do one thing from their home page: search.

This should be no surprise, given it's a search interface. Norman has more:

Why are Yahoo! and MSN such complex-looking places? Because their systems are easier to use. Not because they are complex, but because they simplify the life of their users by letting them see their choices on the home page: news, alternative searches, other items of interest. Yahoo! even has an excellent personalization page, so you can choose what you wish to see on that first page.
It's surprising to hear this argument from a man who championed the Information Appliance - "where the computer disappears into the tool and becomes invisible". As with building architecture, perhaps most people's needs are too unsophisticated to warrant the attentions of usability experts and information architects.

Popular thinking is that the big portals are complicated noisy monoliths, and not simple, easy to use applications.

The issue with web portals is that you only tend to care about doing one thing at a time - everything else at that point is a distraction. Portals by their nature tend to ensure it's all noise, all the time. Web portals are an extension of the desktop idiom to the web. They are trying to cram many items, activities and chunks of information into an area smaller than a tabloid newspaper's frontpage. Portals then are like a cluttered desktop where the entire surface is overtaken with papers, diaries, reports, post-its and gadgets being charged.

Historically portals have not existed to benefit end users - they're a relic of AOL and Compuserve walled gardens, going back to the day when well paid people had meetings about "stickiness". Maybe portals and homepages are less necessary when you can try to guess what you're after with a search. Nonetheless Google does have a portal, which is here: http://www.google.com/ig.

What is overlooked is that Google's search engine rose to dominance when search engines had all but turned into portals with varying levels of irritation. With that 'simplistic' interface they have cornered search, the most important Internet activity after email (and with GMail they have made significant strides in "searchifying" the classic email interface). The users clearly voted with their mice. But that detail would not suit Norman's argument, which is essentially that the Google search UI draws users despite being a bad design. You can feel the "does design matter?" debate looming. If experts think Google has bad design, then the answer is surely either "no" or a qualified "do you mean design as in pretty?".

Is Norman out of touch? No, but it's telling that he focuses on the search page, but not the results page. There's arguably no need to bother with a dedicated page for maps when you can start the search with the word "map" or search for address locations and have the map option appear alongside the results.

Perhaps the hunt and peck approach of searching (along with gaming) is becoming the dominant computing metaphor, replacing nearly 3 decades of user interfaces based on the metaphor of an office desktop (ironically the metaphor itself being pushed into irrelevancy by desktop computing). If so, usability experts will need to reconsider what they deem to be best and appropriate.

Using TinyMCE in Django's admin

After a few failed attempts, I managed to get TinyMCE running inside Django's Admin. This is nice for editing html fragments and suchlike. Here's what to do.

First, download TinyMCE and unpack it into the media folder in Django. For example, if the path to your Django Admin is:

   /usr/local/python24/site-packages/django/contrib/admin/media

then put the TinyMCE js folder (in the distro this is tinymce/jscripts/tiny_mce) here:

   /usr/local/python24/site-packages/django/contrib/admin/media/tiny_mce

Here's a grab from Windows showing what you should end up with:

tinymce-in-django-admin

Configure the TinyMCE pane. In the django/contrib/admin/media/js folder, add a textareas.js file that will initialize and configure TinyMCE. Here's another grab from Windows showing what you should have:

tinymce-in-django-admin-config


I used the AddWYSIWYGEditor configuration from the Django Wiki for this. It's a minimal setup that grabs all textareas on the page and applies TinyMCE to them. You can download the configuration from here as well: textareas.js.
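For reference, the heart of that wiki config is just a tinyMCE.init call. A minimal textareas.js along these lines is enough to attach the editor to every textarea - the options shown here are illustrative, the AddWYSIWYGEditor version sets more of them:

```javascript
// textareas.js - loaded by admin pages that list it in their js media.
// Minimal sketch; the wiki's AddWYSIWYGEditor config carries more options.
tinyMCE.init({
    mode : "textareas",   // attach TinyMCE to all <textarea> elements
    theme : "advanced"    // toolbar theme; "simple" also works
});
```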

That's Django Admin set up.

Now let's configure a Model class to use a rich edit area. Here's a model called StaffBio that isn't configured to use TinyMCE:

    class StaffBio(models.Model):
        name = models.CharField("Name", maxlength=128, db_index=True)
        position = models.CharField("Position", maxlength=128, db_index=True)
        bio = models.TextField("Biography")
        fote = models.ImageField("Photo", null=True,
               blank=True, upload_to="E:/home/dev/storage/StaffBio/fote")
        pageorder = models.IntegerField("Appearance on bio page",
                  choices=SITEORDER_CHOICES, db_index=True)
        jobcat = models.CharField(maxlength=1,
                  choices=JOBCAT_CHOICES, db_index=True, default='P')

        def __repr__(self):
            return self.name

        class Meta:
            ordering = ['name', 'jobcat']
            verbose_name = "Staff Bio"

        class Admin:
            fields = (
                ('Content', {'fields': ('name', 'position', 'bio', 'fote', 'jobcat', 'pageorder')}),
                )
            list_display = ('name', 'jobcat')
            list_filter = ['jobcat']
            search_fields = ['name']

In Django's regular admin setup it will look like this:

staffbio-poor

To configure it, we need to add one line to the Admin class - a 'js' array field - like this:

    class StaffBio(models.Model):
        name = models.CharField("Name", maxlength=128, db_index=True)
        position = models.CharField("Position", maxlength=128, db_index=True)
        bio = models.TextField("Biography")
        fote = models.ImageField("Photo", null=True,
               blank=True, upload_to="E:/home/dev/storage/StaffBio/fote")
        pageorder = models.IntegerField("Appearance on bio page",
                  choices=SITEORDER_CHOICES, db_index=True)
        jobcat = models.CharField(maxlength=1,
                  choices=JOBCAT_CHOICES, db_index=True, default='P')

        def __repr__(self):
            return self.name

        class Meta:
            ordering = ['name', 'jobcat']
            verbose_name = "Staff Bio"

        class Admin:
            fields = (
                ('Content', {'fields': ('name', 'position', 'bio', 'fote', 'jobcat', 'pageorder')}),
                )
            list_display = ('name', 'jobcat')
            list_filter = ['jobcat']
            search_fields = ['name']
            js = ['tiny_mce/tiny_mce.js', 'js/textareas.js']

The js field allows you to set javascript for the admin screens. The array itself contains links to the textareas.js file and to the base tiny_mce.js script. One thing to note here is that each link is relative - as of May 2006, Django admin will automatically prepend these links with whatever the value of ADMIN_MEDIA_PREFIX is in your settings.py file (this tripped me up initially).
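To make the prepending concrete, here's a rough Python sketch of the behaviour described - not Django's actual code, and the prefix value is an assumption for illustration:

```python
# Illustrative only: relative 'js' entries get ADMIN_MEDIA_PREFIX
# prepended; absolute URLs and root-relative paths pass through as-is.
ADMIN_MEDIA_PREFIX = '/media/'   # assumed value from settings.py

js = ['tiny_mce/tiny_mce.js', 'js/textareas.js']

def resolve(path, prefix=ADMIN_MEDIA_PREFIX):
    if path.startswith(('http://', '/')):
        return path
    return prefix + path

urls = [resolve(p) for p in js]
print(urls)  # ['/media/tiny_mce/tiny_mce.js', '/media/js/textareas.js']
```

So if the scripts 404, the first thing to check is what ADMIN_MEDIA_PREFIX is actually set to.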

And that's it. When you go back to your admin and refresh, you'll have a rich text area courtesy of TinyMCE.

staffbio-rich

No doubt the admin is workable with Kupu or FCKEditor if that's your preferred html editor. For me, TinyMCE made sense as it was easy to set up and there was an existing configuration on the Wiki.

The next thing I want to do is install TinyMCE as the default Django Flatpage editor; that will require patching the existing admin module, and probably adding a flag to settings.py to enable/disable it. It would be nice too if the setting was global instead of having to mark up each model's Admin class. The only way I've seen to do that to date is to create a new model field called RichTextField that brings in the js editor by default. But that mixes up content authoring concerns with modelling concerns by subclassing forms.TextField, which is why I decided not to go that route. I think hacking the Admin module to be wysiwyg-configurable is cleaner; the problem is deciding what js tool to bundle. In the long run I imagine admin will ship with Dojo editor since Dojo is what the Django guys have settled on (and Dojo editor can deal with the back button as well, which is *very* cool).

links for 2006-05-24

May 23, 2006

QOTD

Sean McGrath

One would think that putting a custom logo in the top left hand corner would be sufficient for most folks. After all, this is largely what we do with word processor documents, presentation slides and so on. We do not try to hide the fact that we are using standard templates/applications for these things.

Not so for web sites it seems.

links for 2006-05-23

  • "it automatically resolves browser dependencies which is good. Java programmers are used to writing their code to certifiably Java-compliant runtime"- Not quite convinced...

May 22, 2006

No Free Lunch

Tim Bray is disappointed about the reaction to Java's imminent open sourcing. The depth of antipathy held towards Java by some of the OSS crowd is... well, that Blackdown thing sure left a scar.

And if you've ever had the pleasure of typing in a JSR API from the spec, because of Sun's bizarre licences, well, what are you supposed to say to that? Except be thankful those days are coming to an end.

The ASF likes Java, despite typing in a fair share of APIs in their time. I went through the ApacheCon agenda today, and there's tonnes of it going on in the Burlo next month. It's wall to wall OSGi, MyFaces, Portlets and any old Java CMS*, with the odd PHP and httpd.conf bit tucked in. And Maven too.

I guess the main thing is to mush on before the LAMP crowd accelerate away from the JVM entirely.


* The ASF have a good few CMS type things written in Java, none of them seem complete or obviously "in front". What's up with that?

And does anyone else think JSR170 is the Java API for ZODB?

May 20, 2006

links for 2006-05-20

May 19, 2006

links for 2006-05-19

May 18, 2006

JournoFormats

Adrian is so close to microformats with his tag set for online journalism, it's scary. Here's his original tag set from the article:

    <time gmt="HH:MM:SS">
    <expire when="YYYY-MM-DD">
    <currency date="YYYY-MM-DD" units="USD">
    <city>
    <profanity level="X">
    <date real="YYYY-MM-DD">

Here's the microformat:

    <div class="time whenYYMMDD gmt">
    <div class="expire whenYYMMDD gmt">
    <div class="currency usd">
    <div class="city">
    <div class="profanity Lx">
    <div class="realdate whenYYMMDD gmt">

In Django using a uF should mean the files can stay stored as flatpages. A case of having your blob and eating it too.

QOTD

Cote':

Instead of focusing on keeping the cost of the whole relationship with the customer at a minimum, most development methodologies focus on making the cost of that first copy as low as possible.

Happy Shiny Metadata

In the software-candy-mountain, I have this lovely notion of metadata that is intrinsic to content and metadata that is not.

The former are universals about the content; which is to say they're true statements for all systems the content might go through - ever. The latter metadata is contingent on systems and people processing the content (workflow state comes to mind as does stuff like "nofollow"). I think it's really quite elegant and would include two personal favourites - a notion of control directives and workflow as being contingent and belonging squarely inside a system boundary; and support for external parties producing statements about content that they are not the source or owner of.

Meanwhile, in the desert of the real, it's impossible to split metadata consistently this way. There are very few (any?) non-contingent 'facts' about content. Universal truths are for mathematicians and priests.

links for 2006-05-18

May 16, 2006

Get Some

Graeme Rocher:

"Every major open source project needs backing from the big players in the industry and Oracle have made a commitment (Thanks tug!) to get behind Grails which is fantastic news."

"(e) has major commercial backing"

May 15, 2006

links for 2006-05-15

May 14, 2006

links for 2006-05-14

May 13, 2006

Dead Planet

I think I'd subscribe to planet.journals.ie if it added some value on top of aggregating Irish weblogs. Which is to say, I don't see the point of an Irish Planet, compared to an Apache or XML one*, whereas I'd probably find an Irish Technorati useful.


* For those being aggregated. Anyone running a Planet has ad revenue to generate value.

Frameworking

Eric Meyer is flummoxed by frameworks:

"But I just don't get all these new-fangled programming frameworks. Is something wrong with me? Seriously. I have this grumpy, churlish feeling that I suspect is rather similar to the way SGML experts felt when they saw HTML becoming so popular, and that scares me."

I just don't know. I'm trying to remember a time when there weren't frameworks - back when CGI supported "rm -rf" maybe. Still there are many sites made out of flat files, hand edited.

There is a point when the framework will either get in the way or show its true value. I think it's the point when you need to produce some new behaviour with code as opposed to rearrange things on the screen. Sometimes you need to build a porch, not paint the wall or add some new flatpack thingie to the middle room.

[ I'm finding that with Movable Type these days, I wish it was a framework instead of a product - there are things I want to do, but altering it seems tricky. It doesn't strike me as "for alteration". It'd also mean I have to learn passable Perl - not MT's fault, but as Perl has always gone whoosh right past me, I'm not hopeful about staying on MT.]

Eric's problem is that if you want total control, you will have to do a lot of work. In the web/bloggy space alone - i18n (maybe), tagging, commenting, archiving, slugging, login, preview, managing, spam filtering, templates, rpc, feed generation, scheduling jobs, who knows what else. You'll also have to make sure that things can be added later in a sane way. True, there is nothing worse than a bad framework, except maybe a bad programming language, but you'll be left in the dust otherwise, either reinventing what others can assume will "just" be there, or writing out all your content again in a way that can be processed by software. Total control implies total effort.

Speaking of content rewriting. Recently, I've been porting a website of flat files (spread over half a decade or so). I've been doing this to test a web framework - frameworks that only function in greenfield or closed situations are not especially interesting. The really interesting tools are often ones that introduce simplicity and order in already complex and highly entropic systems, as opposed to avoiding them and declaring victory. Also (more importantly) the goal is to make the content flexible and available for future needs. Now, the flat file thing itself is great - 100% scalable, precomputed, self-archiving, and so on. But that's not the issue - it's that these files have been written out by hand, where each was cut and pasted from an older one.

The most interesting thing about this site is how the flat files have changed over time. The older pages are different from the recent ones, you can see the copy errors, mutations and evolution take place, but any two side by side in time are almost indistinguishable, exhibiting very subtle alterations.

From a certain viewpoint the site more or less looks like a grand canyon, each internet year producing a new seam.

It's remarkable that something like a website can have a geology, layer after layer of frozen accidents. Integrators will be used to seeing that in non-web systems. But the difficulty is you can't terraform this kind of site - reskinning means editing every single page. Despite the fact that each html file can be read in, they're different enough as a collection to mean that they can't easily be uniformly processed. To understand what it would take to process them uniformly means analysing all of them, which is to say, if we could spec the code in advance they'd be a priori uniform.

Part of what I've been doing is normalising the files and reverse engineering the latent templates. Now, this kind of work would drive many people insane, but because I deal a good bit in legacy systems (as in I'll admit they exist and need to be linked up), it feels like a workout, a free chance to learn something about how systems old and new need to work. There's no magic shortcuts here, but it's not all eyeballing either - there are also tricks and techniques you can use to determine how variant the collection is - sort of data mining for common structure.
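One such trick, sketched in Python (the pages here are invented): reduce each file to the skeleton of its opening tag names and count the distinct skeletons. Pages cut and pasted from the same latent template tend to collapse into a handful of families, and the outliers are the mutations worth eyeballing.

```python
import re
from collections import Counter

def skeleton(html):
    """Reduce a page to the sequence of its opening tag names -
    pages generated from the same latent template usually share one."""
    return tuple(re.findall(r'<\s*([a-zA-Z][a-zA-Z0-9]*)', html))

pages = [
    "<html><body><h1>A</h1><p>one</p></body></html>",
    "<html><body><h1>B</h1><p>two</p></body></html>",
    "<html><body><h1>C</h1><table><tr><td>x</td></tr></table></body></html>",
]

families = Counter(skeleton(p) for p in pages)
# two pages share a skeleton; one mutant stands alone
print(len(families))  # 2
```

Crude, but it turns "analyse all of them" into a sorting job: start normalising the biggest family first.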

This kind of work can also drive the framework insane, which is exactly why it's valuable. A good framework will deal gracefully with stress resulting from variation, which means dealing with the structural, syntactic and semantic exceptions, and not insisting the data is just so. The green field is wafer thin. In the trenches you need to know if and when the tool will fail you, or whether it will be a force multiplier.

Ok, so we were talking about frameworks. Here's the thing: if you want to be able to terraform, to alter the presentation of content after the point of creation, you need a web framework. Now that framework can be as simple as a few scripts to inject some text into a one up, one down, three across html layout, or as complex as a high-end CMS, but it's still a framework. What the web framework does is provide a rendezvous point between some code and some content. As soon as you want to do something like reskin, output html and rss side by side, or associate a comment with a post, you're in framework territory. Which is to say to capture behaviour over data in repeatable form, you need code and a place to run it*. Frameworking is programming.
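The "few scripts" end of that spectrum fits in a handful of lines of Python (layout and page names invented for illustration): the framework is simply the place where code and content meet, so reskinning means changing the layout once rather than editing every page.

```python
# A framework at its most minimal: a rendezvous point between
# some code (render) and some content (page). Swap LAYOUT to
# reskin; the content never changes.
LAYOUT = "<html><body><h1>{title}</h1><div>{body}</div></body></html>"

def render(layout, page):
    return layout.format(**page)

page = {"title": "Frameworking", "body": "Frameworking is programming."}
html = render(LAYOUT, page)
print(html)
```

Everything a "real" framework adds - templates, URL routing, feeds - is elaboration on that rendezvous.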

Some things can be made simpler via declaring the relation between some code and some data instead of writing out the code directly (CSS is a good example of this), but not all things. Contrariwise, for uniform processing you need to write out the content in as uniform a manner as possible, which places pressure on content writers and designers to conform to the code. Historically, that pressure has been deemed excessive, which is perhaps why Eric Meyer sees frameworks as straitjackets.

On the bright side for the content focused - authors, designers, folks who are more like Eric Meyer or Jeff Croft, and less like me - "modern" web frameworks seem much more interested in catering for people who aren't programmers and have broader use cases than automating drudge coding work or making it easy to bind to a database. That's clearly the case with frameworks like Django and Plone and seems to be where the various web developer communities are headed. And that's a good thing.


* I had something in here originally about code then being an existential quantifier over a data structure (or an operator over a type), but I suspect it would have ruined the post :)

links for 2006-05-13

May 12, 2006

links for 2006-05-12

May 10, 2006

Laoise de hÓra

Born 9th May 2006, 3:58am. 8lbs 4oz.

laoise de hora


Hello, world.

links for 2006-05-10

May 08, 2006

links for 2006-05-08

May 06, 2006

links for 2006-05-06

May 05, 2006

links for 2006-05-05

May 04, 2006

links for 2006-05-04

May 03, 2006

MADD

MySQL, Apache, Django, Debian.

It sucks. I know. But if Tim Bray has one, I want one too!

Hey, over here! Shiny!

From the funniest ex-hacker: The Markup Wars.

alt-tab

Bruce Eckel on Python IDEs: "As far as Python goes, the argument for an IDE is not so compelling. Most people I know just use regular editors. I think the reason is that Python is less verbose. The example I often give is to read each line from a file, which I can do in Python without thinking about it. In Java, it's a research project to open a file."

Ruby people speak much the same about TextMate. Yes, it's true the brevity of the language means a text editor will get you further, but that's not a good reason to pass on a powerful IDE.

A good while back I gave up on one-true-ring theories for editing source and haven't looked back. Typically there's 3-5 "IDEs" open on my machine. Right now, in my taskbar, there's ultraedit, emacs, wingide and eclipse. Today for source code, I used textpad, emacs, wingide, idea, pydev, eclipse, vim, ultraedit. I'm betting lots of developers work with multiple editors. I don't like vim much, but it's more or less guaranteed to be there on servers, so it makes sense to use it, whereas emacs is hit and miss.

If there's an issue with IDEs today, it's this obsession with debuggers. I need a better debugger in an IDE about as much as I need a better paperclip guy. Profilers and good support for running tests would be great. I've used the debugger in IDEA once in 4 years. That one time was because a colleague couldn't believe I'd never used the debugger in IDEA so we stepped through a session together. Go figure.

The Library of Imaginary Machines

Both JRuby and Jython have experienced stop start development. Why is that?

Maybe language development isn't sexy anymore. For a lot of developers now, I think programming languages just aren't all that interesting in terms of a target abstraction, or for playing around with. My generation (graduating in the late nineties) might be the last that was actually interested in programming languages as an end in themselves, and even then platforms like J2EE, CORBA, and the Web were much more interesting to many. Yes, we argue a lot today about which language is better, but that's mostly trash talk about The Man and what we use to get our work done. I'm talking about what we play with outside of work for fun - it tends to be frameworks and platforms, not programming languages, which are now a means to an end, something you put up with. Time was I would play around with Antlr, or use RelaxNG to verify other languages, or hack a http subset. For fun. Now I fool around with Rails and Django and Web Protocols, which are different classes of machine to a language machine.

Perhaps the programming language's decline in relative importance is because developing programming languages is difficult, something we might classify these days under systems programming. It's more difficult than building a framework or a library. There's less tolerance for inconsistency in a language than a framework. Internal illogic will be found out. Yes, someone can put something together quickly and it can be useful and fun, but really finishing it out, making it performant - the last 20% - is a big ask.

In software there's always some machine you're dealing with that is made from another machine. There are Turing Machines, Von Neumann Machines, Register Machines, State machines, Automatons, Virtual Machines, Pools, Caches, Timers, Schedulers - even languages are machines. Sun has landgrabbed the term Virtual Machine for Java, but really there's always been virtual machines.

As a consequence using the current machine or language to implement a more relevant machine or language is a powerful technique. Since new machines can take time to figure out and are hard to estimate they make line of business managers nervous - they smack of over-design, over-engineering, and over budget. As Terence Parr put it "Why program by hand in five days what you can spend five years of your life automating?" But when you have one, it's like going to battle with an AK-47 instead of a musket.

The gun analogy is crude, but apt. There is no force multiplier in software development like a new machine. Once you start writing software this way, in terms of better machines, it's very seductive. New machines are silver bullets. Power to get things done is what gets people fired up about Rails as a DSL, or MDA, or Code Generation. I suspect current heady interest in Domain Specific Languages (DSLs) is driven by the "power tool" notion.

When Joel Spolsky talks about Leaky Abstractions, he's talking about the situation where an underlying machine, one which you should ideally pay no attention to, pops through your present machine into your coding reality, Lovecraftian style. The word we use to describe this hiding of machines is "transparency", which is one of those technical terms that is correct, but bone-headed. Most people see the word "transparent" as allowing you to see through to something, not hide it. What we really mean here is "opacity", the ability of one machine to hide the fact that it is implemented in terms of another machine. It's when the underlying machine (or machines) seeps through that there *is* transparency. That's when the leaky abstraction occurs and problems begin.

I like the "machine" term better for this kind of discussion - it has less psychological baggage and consequent politicking than "language". For starters you don't have to buy into Sapir-Whorf theory to appreciate the words of people like Steve Yegge or Paul Graham or Sean McGrath on this subject. It also keeps the discussion focused on operational and utilitarian matters. "My machine is better than yours" has a fighting chance of a useful outcome. "My language is better than yours" is a Godwin attractor.

Recently a Rails request ran on the JVM. That's the kind of thing that can drive interest. JRuby becomes the cost of doing business for that VM; an underlying machine to support the Rails machine/DSL. JRuby itself is of limited interest - whereas a clone of Rails on Java turns heads. Hence JRuby gets done.

Support for dynamic languages on the JVM will probably work itself out. Instead it might be concurrent programming that results in migration away from the JVM. "Nonsense", you say - "the Java threading model clearly rocks". Well, yes and no. Some have claimed that Java, with its shared memory concurrency model, does threading right. I'd have to agree - of the set of credible commercial languages Java is the winner if you are going to do threads. But it's a stretch to say its shared memory model is the right approach to concurrent programming altogether. Languages like Erlang and Mozart are arguably better, albeit currently 'impractical for commercial use'. The only Erlang book I can find is from the mid-nineties and is currently £47 - I've been trying not to buy it for a few months.

Multi-core is the default production configuration now for CPUs. I trialed a tiny laptop a few weeks ago and the most surprising thing was that it had a dual core. I'm no metal head, but given that they're turning up in machines optimized for road warriors and office apps, and if people are right about the end of Moore's Law (bye-law?), then the next major machine/language argument could be over concurrency models, with shared memory playing the incumbent, just as static typing and OO do today in language design. Productivity and expressive power might be recast in terms of your ability to work with the concurrent architectures being imposed on you.
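For contrast, here's a small Python sketch of the Erlang-ish alternative to shared memory (threads stand in for Erlang processes; all names invented): a process owns its state outright and is driven purely by messages, so no locks are needed anywhere.

```python
# Erlang-flavoured concurrency in Python: 'processes' share nothing
# and communicate only by sending messages through mailboxes.
import threading
from queue import Queue

def counter_process(mailbox, replies):
    """Owns its count; other processes can only send it messages."""
    count = 0
    while True:
        msg = mailbox.get()
        if msg == "incr":
            count += 1
        elif msg == "get":
            replies.put(count)
        elif msg == "stop":
            break

mailbox, replies = Queue(), Queue()
t = threading.Thread(target=counter_process, args=(mailbox, replies))
t.start()

for _ in range(3):
    mailbox.put("incr")   # no locks: the state is private to the process
mailbox.put("get")
result = replies.get()
mailbox.put("stop")
t.join()
print(result)  # 3
```

The interesting part is what's absent: no synchronized blocks, no volatile, no lock ordering to reason about - the mailbox is the only point of contact.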

Bubbles! Shiny!

After a pointer from Coté, I set up an account on Last.fm just over a fortnight ago.

Thanks to Last.fm's recommendations and neighbour features I'm building up a list of music to buy. Neither iTunes nor any music store I've seen does *anything* like this in terms of driving demand (and my colleagues may have noticed increased use of The Headphones recently :). Someone is going to end up paying serious wonga for this property.

The title is a bit snarky. One thing that marks out Web2.0 mania from DotCom mania (apart from functional javascript ;), is that so many of the Web2.0 sites are actually useful.

There's no such thing as neutral content

Rick Jelliffe on the "The Self-Defeatingness of XML's Media Independence":

"On a project today one of XML's paradoxes struck me: we adopt XML for publishing often because we want to re-target our documents to different publications and media; however we then find it useful if the information is organized or formatted similarly on different media and applications, in order to reduce gratuitous differences, ease processing and to increase branding. So our books end up getting PDA-isms such as small sections, or HTML-isms such as page focus, or RSS/Atom-isms such as chapter and section summaries (It was worth writing this blog for the pleasure of saying 'RSS/atom-ism'.) And our HTML pages get book-isms, such as the familiar TOC and next/previous/up buttons. The initial movement for media and publication independence is met by a counter movement for cross-media and cross-publication homogeneity."

Rick nails it. There is no such thing as channel neutral content, or device independent authoring.

I should qualify that. You can and should go some way towards device and channel neutrality. XML is good for that, decoupling channel output from content storage is good for that, and neutrality is an admirable goal. But after a point you realise you're aiming for a goal that isn't totally achievable. Compromises are necessary. As a goal it's in the "world peace" category. Push things too far and the risk is that by trying to be all things to all people and all media you end up with a cacophony of styling issues, unwieldy content models, and jumbled content. Worst case, you end up trying to compute context from excerpts and syntactic markup, essentially trying to solve an artificial intelligence problem.

As writers and publishers, we've internalised "know your audience" as an unquestioned maxim. But we're much more reluctant to take the other thing onboard - "know your medium" isn't even a maxim, never mind an unquestioned one.

For example, let's look at how I quoted Rick. I started with his name and post title, followed by the quote. That style of quoting - "someone at this link said this: -" - is quite deliberate. I call it "Ruby Quoting", since I picked it up from reading Sam Ruby's weblog. I use it because this post will be pumped through any number of aggregators, including ones that strip out the structure and indentation that make a quote visually distinctive in traditional media like print, or on HTML web pages. At a quick glance or topline scan (typical reading modes for feeds), the quote can be mistaken for your own words. The lead-in helps prevent that.

Here's the more traditional quoting style, as might be advocated by academia or print publishing guidelines:

On a project today one of XML's paradoxes struck me: we adopt XML for publishing often because we want to re-target our documents to different publications and media; however we then find it useful if the information is organized or formatted similarly on different media and applications, in order to reduce gratuitous differences, ease processing and to increase branding. So our books end up getting PDA-isms such as small sections, or HTML-isms such as page focus, or RSS/Atom-isms such as chapter and section summaries (It was worth writing this blog for the pleasure of saying 'RSS/atom-ism'.) And our HTML pages get book-isms, such as the familiar TOC and next/previous/up buttons. The initial movement for media and publication independence is met by a counter movement for cross-media and cross-publication homogeneity. - Rick Jelliffe

That style doesn't work as well for RSS syndication, especially when the quote is a lead-in to the rest of the post. People tend to think it's you writing and might misattribute the statement, even when you surround it with quotation marks and mark it up with <blockquote>. Second, use of quotes for lead-ins is an idiomatic form for weblogs - posts are often reacting to what someone else said (just like this post is). It also affects traditional essay and chapter techniques, like the use of pithy opening quotes and aphorisms - again, those techniques don't work out that well over RSS - I think someone complimented me once for something Dr. Johnson said. After being misquoted (or even flamed) a few times you learn to adapt the content to the medium by using a different authoring technique. That's what media does to content. It's unavoidable.
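To make the misattribution problem concrete, here's a toy sketch (mine, not the post's) of what a markup-stripping aggregator does to a quote whose attribution is carried only by `<blockquote>` structure:

```python
import re

# A post in the traditional style: attribution carried only by markup.
post = (
    "<p>One of XML's paradoxes struck me today.</p>"
    "<blockquote>We adopt XML for publishing because we want to "
    "re-target our documents. - Rick Jelliffe</blockquote>"
    "<p>He nails it.</p>"
)

# A crude aggregator that flattens markup to plain text, as some feed
# readers and email gateways do.
plain = re.sub(r"<[^>]+>", " ", post)
plain = re.sub(r"\s+", " ", plain).strip()

print(plain)
# The quote now runs straight into the surrounding prose; nothing but
# the trailing name hints that the middle sentence isn't the author's.
```

With a "Ruby Quoting" lead-in ("Rick Jelliffe on X:"), the attribution survives the flattening because it lives in the text itself, not in the markup.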

I remember being at WWW9 in Amsterdam in 2000 and hearing someone from the BBC explain that massive volumes of content had to be repurposed more or less by hand for the web. And then it had to be repurposed again for mobile devices. I think they were going as far as keeping multiple copies of the content for broad classes of devices and channels (someone from the Beeb is welcome to confirm/deny that). As it turns out, mobile devices have characteristics not a million miles away from link or excerpt based feeds - which may have something to do with why the BBC seem to understand web syndication better than any other broadcaster. That kind of medium/device might well represent a lowest common denominator for organising digital content.

The autumn after WWW9, I got to review a position paper written by Miles Sabin, which stated bluntly that device independent authoring was a pipe-dream. Here's an excerpt outlining his position:

"The proliferation of new delivery media for web content has brought an old problem to the fore in a new context: how to produce content which is suited to diverse media consistently with constrained cost and time budgets. The range of new devices is very broad, from traditional PC browsers, through mobile, embedded and consumer devices, to speech only devices and devices specialized for accessibility needs. This breadth, combined with comparative novelty, might make it appear that we have a radically new problem to solve, and that a search for radically new solutions might be worthwhile. We believe that this appearance of novelty is deceptive and that, regrettably, the search for novel solutions will be fruitless."

The key idea that paper tried to debunk was that of "primordial content" - a necessary assumption if single source* authoring solutions are to be achievable. I think the position holds up well, and would go as far as saying the last five years have borne it out. Today I would extend that idea to any notion of "primordial markup" - a single metadata set which you can syntactically transform to arbitrary devices. I didn't go to the workshop, but what we heard back from Miles, who went to Bristol to present it, was that people agreed with his take (a surprise: we thought it might be a controversial view), but were pretty glum about the situation.

Update: Miles sent me a link to some slides - http://www.w3.org/2000/10/DIAWorkshop/sabin/slide1.html. He also gently refreshed my faulty memory as to who did the bulk of the work - certainly not I! Miles is now properly attributed.

It's not a new issue. A "news story delivered via radio is not the same as the story delivered by television but without pictures", or via a news site with a graphic and excerpt. Consider PowerPoint - it has fundamentally altered how business people communicate, in particular how they buy and sell services. No-one expects to be able to autogenerate powerpoints from written documents - they're always crafted by hand from existing content; at best some old deck slides get reused. Email has had a similar impact, although it gets less flak than PowerPoint. IM and SMS will go as far as changing how we speak.

The medium really is the message.


Colophon: I should say that none of this is meant to be an engineer throwing up his hands and saying "Impossible!". Some amount of channel independence is both possible and desirable, but there's a limit to what can be realistically achieved without having a human author re-appropriate the content. I think this slide, http://www.w3.org/2000/10/DIAWorkshop/sabin/slide13.html, indicates some of the limitations.

* Thanks to Scott Mark for reminding me of the term "single source".


May 02, 2006

links for 2006-05-02

May 01, 2006

Nagware

I just installed a "recommended" Acrobat Reader upgrade, as in I get a pop-up every time I start Acrobat Reader telling me to "upgrade". I think we need a name for software that does this - nagware, maybe (update: looks like "nagware" is already taken). Moreover I had to reboot the thinkpad *twice*. I wonder what it's doing that it needs to restart a computer two times - it doesn't strike me as sandboxed, and given how many PDFs are downloaded these days... fret, fret, fret.

4Gbs of RAM on a laptop running a VMware image - that's the future.

links for 2006-05-01