Bill de hÓra: December 2005 Archives

December 31, 2005

Difficult v Hard as opposed to Java v C

People seem shocked, just shocked, about Joel Spolsky's post "The Perils of JavaSchools" because Java is deemed too easy. Spolsky's being quite specific tho', insofar as the post is not a criticism of Java at all, but a concern about the content of CS curricula, where he compares C and Java as teaching languages and ends up favouring C.

"Instead what I'd like to claim is that Java is not, generally, a hard enough programming language that it can be used to discriminate between great programmers and mediocre programmers."

Which makes me wonder if people read it clearly. Joel calls out two aspects that might discriminate good and bad programmers, recursion and pointers.

"If I may be so brash, it has been my humble experience that there are two things traditionally taught in universities as a part of a computer science curriculum which many people just never really fully comprehend: pointers and recursion." - Joel Spolsky

As I recall there's a pit of despair when you first hit pointers. My C lecturer, Dimitris, even drew the learning curve for us at the beginning of the semester - the middle third looked like a market crash. Some books can help with the concepts, but pointers are... alien. Recursion is another matter. Understanding recursion seems to come down to how it's taught and explained rather than talent.

Joel goes on to say:

"There's nothing hard enough about Java to really weed out the programmers without the part of the brain that does pointers or recursion."

So what's more interesting here, in terms of education, than either language wars or the relative merits of languages in the industry is whether Joel is picking the right challenges. I wonder if pointers and recursion are the right hard stuff to be teaching programmers with - I have my doubts. Consider concurrent programming - that's hard, it involves coordinating actors as well as the usual data structures and behaviour issues. I'm fairly sure it's hard in Java*, but Java through its memory and threading model allows you to focus on concurrency with a minimum of incidental noise (such as pointers). Distributed programming is hard (and with it come the truly hard matters of cache invalidation and naming). Organising medium to large scale software is hard. Yet Java, the easy language by comparison it seems, gives you plenty of teaching options on these fronts. Joel has a crack at teaching OO but OO is useful for teaching one thing - state - that is, how programs can function with respect to time. The problem with OO, these days, is that if you're going to be working anywhere near a network (and lots of us are), it's maybe teaching you the wrong lessons about how to manage said state.

Anyway. Yes, pointers might be hard to fathom except for a tiny fraction of the general population and a small fraction of programmers, but they're arguably an irritation rather than anything fundamentally challenging. So given the educational context, perhaps such annoyances are worth forgoing so you can move students towards hard, as opposed to difficult, matters. Joel mentions the SICP material for MIT, which is telling. One of the reasons SICP teaches with Scheme is that in an education setting Scheme has precious little by way of distractions, allowing students and teachers to focus on what's actually being taught. Indeed Scheme seems to be sufficiently clear that you can use it to teach classical mechanics and dynamical systems as well as programming.

* Maybe 10 years from now, someone will be complaining about Erlang not being any good for weeding out mediocre concurrent programmers the way Java was.

December 22, 2005

Watch that space

Radovan Janecek on triple stores:

"there is no way to integrate disparate registries/repositories together than something like that"

At least I think he's talking about triple stores. The reason to call this out is that Radovan really, really knows his Enterprise, WS and Web technologies. And Domain Specific (DS-*) is a powerful meme right now. If he's concluding a domain neutral approach to integrating datasets is the way to go, that's grounded in a lot of real-world experience, not just theory.

Watch that space.

depth favicon

I really like Matthew Gertner's favicon. Very striking, and unusual to see someone publish a 3 dimensional one.

Then you win

"Wikipedia is not an authoritative encyclopedia, and it should stop trying to be one." - Nicholas Carr

Argumentum ad verecundiam. No mention nor rebuttal by Nicholas Carr of the bug equivalency metric. I'll take Carr's post as a recognition that the Britannica and Wikipedia approaches are viable. Somehow, I'm reminded of scripting vs static typing programming language arguments a few years back - incumbents slowly forced into acknowledgment. And I'm in no doubt, that while the metric may lack meaning, the numbers alone have surprised people.

Joda-Time 1.2 ships

Joda-Time 1.2 is released.

It's nice to have an alternative to the JDK date classes.

Wikipedia v Britannica: putting it in perspective

Niall Kelly on the Wikipedia v Britannica thing:

"Show me where in Britannica you can read up on all past and present members of the X-Men (in preparation for the new movie of course) and I'll start paying more attention."

Charles Miller:

"Take a walk, for example, through Wikipedia’s incredibly detailed coverage of Pokémon, professional wrestling, or fan fiction. No aspect of the miscellany or trivia of their subject-matter is left uncovered."

December 20, 2005

Prole Art Threat

"Wikipedia is about as good a source of accurate information as Britannica, the venerable standard-bearer of facts about the world around us, according to a study published this week in the journal Nature." - CNET

It'll be interesting to read Nicholas Carr's take on this, given past criticisms of the wiki:

"The rift comes at a time when the quality of the encyclopedia, which has long been held up as an example of the Internet's ability to harness "collective intelligence," is under debate (a debate set off by a critical post of mine earlier this month)." - Nicholas Carr

There's no word from him yet. But it's interesting to wonder if what is essentially a statistical approach (Wikipedia) can compete with the structured semantic one (Britannica). Maybe it can for relatively large numbers of contributors.

Let's leave the last word to Samuel Johnson circa 1753:

"I saw that one enquiry only gave occasion to another, that book referred to book, that to search was not always to find, and to find was not always to be informed"

Feeds vs Attention, or Data vs Behaviour?

"While one benefit of a del.icio.us feed is more granularity the problem is that he'd be spoonfeeding us instead of teaching us what to pay attention to. On the other hand, by sharing an OPML based Reading List Piaras would be providing an 'attention lense' which could be applied to many services going forward." - EirePreneur

Being a software person, I may well be missing the key thing that makes OPML a more valuable format to a user than, say, an RSS/Atom feed, XBEL, or XOXO. For me, they're just formats; they don't do very much on their own. What counts are the behaviour and features the software can weave around them, and how easy it is to mix and move the data between applications. That's aside from some issues around the formats that only software types could (and probably should) care about - for example Robert Scoble ain't all that interested. But if you're an app developer or, more to the point, someone who wants to recombine apps, how you get your data and how it's formed can matter - good data formats enable good software.

update: James Corbett sent through a comment that's worth lifting up:

"OPML is a collection of RSS feeds and as such is an aggregate, overall indication of attention as opposed to a single thread of that attention. People have numerous behavioural characteristics which make up their overall personality and likewise a dynamic OPML collection of feeds (Reading List) is a much better descriptor of personality than any single feed (describing only one characteristic/interest).

And because OPML is already 'out there', supported in one form or another by every aggregator, it is ideally placed as an initial standard for Attention data IMHO.

As a non-programmer I can't argue the technical merits of one format versus another but if there are alternatives that are as well placed as OPML, and not just 'in the lab' then I'd like to see them. I've tried to follow the argument about XOXO but my question is always this - are there OPMLmanager.com, OPMLsearch and OPML editor equivalents out there which are as easy for an 'end user' to use to build something like the Open Irish Directory (OpenEir.org)?"
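To make the "collection of feeds" point concrete from the software side, here's a minimal sketch that pulls the feed URLs out of an OPML reading list using Python's standard library. The sample document, the example.org URLs and the `feed_urls` helper are all made up for illustration; it assumes the common convention of `outline` elements carrying an `xmlUrl` attribute, and real-world OPML files are looser than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical reading list; titles and URLs are placeholders.
READING_LIST = """<opml version="1.1">
  <head><title>Example reading list</title></head>
  <body>
    <outline text="Weblogs">
      <outline text="Example blog" type="rss" xmlUrl="http://example.org/blog/index.xml"/>
      <outline text="Another blog" type="rss" xmlUrl="http://example.org/other/atom.xml"/>
    </outline>
    <outline text="News feed" type="rss" xmlUrl="http://example.org/news/rss.xml"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return every feed URL in the outline tree, in document order."""
    root = ET.fromstring(opml_text)
    # Folders are outlines without xmlUrl, so they are skipped.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]


urls = feed_urls(READING_LIST)
for url in urls:
    print(url)
```

The format itself is little more than a nested list of pointers to feeds; whatever "attention" value there is comes from what an application does with that list afterwards.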

December 15, 2005


Bye bye, UDDI (spotted by Mark)

Elsewhere I see posts about stuff like using Indigo for REST based apps. More and more grumbling about XSD. That sort of thing. What's irritating about all this recent insight around WS is that it's not like we didn't know WS wasn't going to meet the hype. It may turn out to be a surprise for the industry, but I think the people in the trenches saw it coming a long time ago.

December 14, 2005

RDF - schema versioning and data typing

One of the advantages of storing an RDF representation in an RDBMS is that you'll never (hardly ever?) need to make a schema change in the RDBMS - because the domain is not represented using tables - tables are solely used for storage of RDF triples.

Your Mileage May Vary

Using RDF storage provides flexibility at the domain level. Altering tables isn't needed because RDF, being graph based, is naturally additive. Instead you keep adding new rows, where every row represents a link between two nodes in the graph. The downside is that the number of rows you'll have to manage will explode; depending on the size of the datasets you're working with this might not matter. My (somewhat anecdotal) experience with RDF is that datasets in the order of 10^6 triples and greater aren't uncommon, and that you should budget for an order of magnitude increase in the number of rows required for the domain storage compared to an entity relational approach.
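A minimal sketch of the idea, assuming SQLite via Python's `sqlite3` module and a made-up three-column `triples` table. The subjects, predicates and the `add`/`describe` helpers are illustrative only and aren't taken from any particular RDF store; real stores typically add indexes, datatypes and named graphs on top of this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One generic table holds every statement; the domain never appears in the schema.
conn.execute(
    "CREATE TABLE triples ("
    " subject TEXT NOT NULL,"
    " predicate TEXT NOT NULL,"
    " object TEXT NOT NULL,"
    " PRIMARY KEY (subject, predicate, object))"
)


def add(subject, predicate, obj):
    """Store one link in the graph as one row."""
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )


def describe(subject):
    """Return all (predicate, object) pairs known for a subject."""
    cursor = conn.execute(
        "SELECT predicate, object FROM triples"
        " WHERE subject = ? ORDER BY predicate",
        (subject,),
    )
    return cursor.fetchall()


# An entity table would hold these two facts in a single row.
add("user:1", "name", "Alice")
add("user:1", "email", "alice@example.org")

# A property nobody planned for arrives later: a new row, no ALTER TABLE.
add("user:1", "homepage", "http://example.org/alice")

row_count = conn.execute("SELECT COUNT(*) FROM triples").fetchone()[0]
print(row_count, describe("user:1"))
```

The row count is where the cost shows up: one entity with three properties takes three rows here, where an entity table would need one.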


It's an interesting question whether using RDBMSes to store RDF counts as some form of abuse, or bad engineering. RDBMSes were after all designed to support relational algebra, not RDF's model theoretic semantics (when you do the math, you find the math are different). That said, a number of relational experts point out that RDBMSes don't implement relational theory properly anyway. The mismatch between RDBMSes and RDF is similar to the mismatch between RDBMSes and OO (collections of objects being graphs as well). This doesn't bode well - ORM, yow. However most of this mismatch occurs when the graph data is shredded across the domain's tables and roundtripped in and out of the database server. If the RDF store is using an RDBMS primarily as a storage and indexing mechanism for graph structures rather than mapping onto domain specific entity tables (Users, Cars, that sort of thing), the dissonance is lessened, and you're left with a straight-up engineering matter (getting the RDBMS to perform CRUD efficiently) rather than a domain modelling/mapping one.

Complexity Conservation

One last thing to consider is that where you gain in structural flexibility you might lose in developer convenience. Consider Ruby on Rails and Django. One reason cited for the immense productivity of these stacks is the dynamic and flexible nature of the underlying languages (Ruby and Python). Part of the productivity boost is also coming from leveraging the 'static' types of database tables (or put another way, when you take away the backing databases these frameworks have less to offer). When RDF is stored abstractly on an RDBMS, the type information that could be derived from entity tables is lost. There's an argument to be had that not having this table metadata around will make the automation of things like forms generation/capture and validation trickier (and perhaps intractable). With the exception of some RDF/XForms related work by the folks at Copia and maybe Danny Ayers, I don't know if the RDF community has looked at this; much of the focus lately has been on query support through SPARQL.
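Here's a rough sketch of what gets lost, again assuming SQLite and Python's `sqlite3` module. The `users` table, its columns and the `form_fields` helper are hypothetical; the helper stands in for the kind of introspection a framework might do, and is not how Rails or Django actually work internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A conventional entity table: the column types say something about the domain.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " age INTEGER,"
    " joined DATE)"
)

# A generic triple table: the column types say nothing about the domain.
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")


def form_fields(table):
    """Derive (field name, declared type, required?) from table metadata.

    The table name is interpolated directly, so only pass trusted names.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        (name, col_type, bool(notnull))
        for _cid, name, col_type, notnull, _default, _pk in info
    ]


users_fields = form_fields("users")
triples_fields = form_fields("triples")
print(users_fields)
print(triples_fields)
```

From `users` you can guess at a form: a required text box, an optional number, a date picker. From `triples` all you learn is that everything is text, so any typing or validation rules have to come from somewhere else, such as an RDF schema or ontology kept alongside the data.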

December 13, 2005


"Where's the domain specific language for the domain of software programming?"

UI clunker #1

Not the best UI prompt I've seen recently

December 09, 2005

My Eyes!

I've been having problems with my eyes in the last week - to keep it short, I burst some blood vessels in my sockets - the whites get covered in a film of red. It's not sore or damaging at all, but it meant they looked a bit creepy for a while - not quite 28 Days Later creepy, but some creepy.

Richard in work has kindly attempted to recreate the condition with a fote - ahem.

December 06, 2005

Atom Format: RFC4287

Atom gets an RFC number: RFC4287. Its status within the IETF is a "Proposed Standard Protocol", the format is stable, so you can build on this right now.

Whatever about RSS2.0 and RSS1.0, RFC4287 is your upgrade path from Atom 0.3.
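As a small illustration of building on the format, here's a sketch that reads a minimal Atom 1.0 feed with Python's standard library. The sample feed, its URLs and ids, and the `read_feed` helper are invented for the example; the only thing taken from the spec is the namespace URI and the element names.

```python
import xml.etree.ElementTree as ET

# The Atom 1.0 namespace defined by RFC 4287.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed; every value below is a placeholder.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <id>urn:example:feed</id>
  <updated>2005-12-06T00:00:00Z</updated>
  <author><name>Example Author</name></author>
  <entry>
    <title>Atom gets an RFC number</title>
    <id>urn:example:entry-1</id>
    <updated>2005-12-06T00:00:00Z</updated>
    <link href="http://example.org/2005/12/atom-rfc"/>
  </entry>
</feed>"""


def read_feed(xml_text):
    """Return the feed title and a list of (entry title, updated) pairs."""
    root = ET.fromstring(xml_text)
    title = root.findtext("atom:title", namespaces=ATOM_NS)
    entries = [
        (
            entry.findtext("atom:title", namespaces=ATOM_NS),
            entry.findtext("atom:updated", namespaces=ATOM_NS),
        )
        for entry in root.findall("atom:entry", ATOM_NS)
    ]
    return title, entries


feed_title, feed_entries = read_feed(SAMPLE_FEED)
print(feed_title, feed_entries)
```

Atom 0.3 feeds used a different namespace, so code written this way won't quietly accept them; that's one place the upgrade shows up in practice.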