" /> Bill de hÓra: November 2006 Archives


November 23, 2006

Django Generic View: archive_monthdays

Django's generic views have an archive_month view, which defaults to listing all the objects found in that month. If you have a high volume of things to show, you might not want to render them all on a single page. So instead it might be handy to break the month down by days, in the same way archive_year breaks down by month. Here's a generic view that does that: http://dehora.net/code/django/archive_monthdays.py

Essentially it's a cloned version of the archive_month view, with the following changes (a rough sketch of the view's shape follows the list):

  • make_object_list: the same argument as found on archive_year; if True, make a list of the objects published in the given month. Defaults to False
  • "year": passed as a str to the template context
  • "days": passed as a list of dates to the template context

and here's a simple archive_monthdays template:


    {% extends "base_site.html" %}

    {% block content %}
	<h2>{{ month|date:"F" }} {{ year }} Archive</h2>
	
    <ul class="linklist">
	{% for date in days %}
		<li>
          <a href="{{ date|date:"d"|lower }}/">
             {{ date|date:"D" }} {{ date|date:"d" }}</a>
        </li>
	{% endfor %}
    </ul>
    {% endblock %}	

November 20, 2006

Semantic Review

Revyu: review anything you can put a URI to. It's alpha alpha alpha alpha alpha alpha alpha alpha alpha alpha alpha alpha alpha

Cool. Uses RDF to tag URIs with review data, and exports the reviews as RDF, which are, in turn, reviewable. They should put a stylesheet on the RDF tho', and they might need a cache layer. But I guess we can start putting "revyu it!" buttons on things, once there's an API.

Prediction: if it gets traction, Yahoo! will acquire.

via: Danny


November 19, 2006

links for 2006-11-19

November 18, 2006

What American Accent Do You Have

Judging by how you talk you are probably from north Jersey, New York City, Connecticut or Rhode Island.

Circa 1865 maybe.

links for 2006-11-18

November 17, 2006

The War On Error

Last March: REST wins, no one goes home.

Well, it looks like we're done. Which is worse, that everyone gets it now and we'll have REST startups in Q207, or that it took half a decade?

It's tempting to be scathing. But never mind, The Grid's next.

Think about it

Sean nails it.

November 16, 2006

Currently Reading

Murano Magic

Leo Simons:

"We convert from RDF to different specialized XML formats and back again. We convert from RDF to excel spreadsheets and back again (ugh). We have our jira instance hooked up to our RDF store. We convert RDF to other kinds of RDF. We have custom RDF visualization tools. We have custom RDF store crawlers that do efficient validation. We have RDF schemas that control the behavior of other distributed systems by adding intelligence to the core schema. We do triple timestamping. We do intelligent schema-driven indexing. We have custom libraries to make doing wicket-based, RDF-based web application development easier. Oh, we do RDF-based web applications. In short, we do more RDF than you can shake a stick at. So not a day goes by without some of our developers swearing about "RDF" or "metadata", since in many ways RDF still isn't exactly mature technology. But we'll fix the warts, and contribute those fixes back to the open source community."

via Danny - cool; finding TV was my final year project (machine learning over graphs pulled out of program descriptions). I can't see this being done over broadcast networks - being able to tag TV would be awesome, but being able to *rate* TV would kill the advertising industry, which in turn kills TV unless you disable ratings for ads. So I guess this runs over IP networks. And if TVP is any good, Google might buy them, if only to manage the innovation.

QOTD

"A gentle merging of REST and EDA."

That's such a soothing turn of phrase.

November 15, 2006

links for 2006-11-15

IM Grid

Ian Foster: "When I first met Web fundamentalists, I found them irritating, because they would not debate on technical grounds. However, they have ultimately proved to be entertaining."

One of the curious things about REST design is that it's *not* absolutist. It clearly demarcates a problem space. It identifies characteristics (aka design decisions) that result in what Fielding and others believe to be an optimal design. If there is a general purpose lesson to be learned from the REST style, it's this - understand the problem space, and then apply a principled design to it.

Much more interesting is why the Grid has cycled its core specifications in the last few years - first OGSI, now WSRF. Here's an alternative view: 5 years from now, Grid services that are not already on the Web will be based on instant messaging, and XMPP/Atom/RDF will be the key Grid transfer protocols and formats.

November 13, 2006

links for 2006-11-13

Antialiased Emacs on Ubuntu

Two weeks ago: "I think, but am not sure, that my anti-aliased Emacs crashes intermittently. As in poof, utterly gone. I really want to be imagining this one."

That turned out to be non-imaginary. Emacs would shut down with no warning, no messages. Just gone. Since I use Emacs to hold temp files and bits and pieces in buffers over the course of the day, it needs to be stable. If necessary I'd go back to a regular install.

The first time around I had problems building the antialiased version, so I used a premade .deb, and this is the one that shuts down. The chances are the .deb was fine but my local configuration wasn't. In trying to build from scratch again, it turned out the problems were down to not having libxt-dev and libncurses5-dev installed as build prerequisites. That meant ./configure would not find X headers. So make might fail, and if it didn't, the compiled Emacs would run in terminal mode (in the latter case, if you don't have libncurses5-dev you'll get a termcap error when starting Emacs).

Anyway, the following incantation worked for me on 6.06. Before I started I removed everything Emacs-related from Ubuntu via Synaptic (by searching for emacs and taking the uninstall option).

sudo apt-get install libncurses5-dev
sudo apt-get install libxt-dev

sudo apt-get -f install emacs-snapshot-gtk
cd /opt
sudo cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/cvsroot/emacs co emacs
cd emacs
sudo cvs up -Pd -r XFT_JHD_BRANCH
sudo ./configure --with-x-toolkit=gtk --with-xft=yes --prefix=/usr
sudo make bootstrap && make
sudo rm /etc/alternatives/emacs
sudo ln -s /opt/emacs/src/emacs /etc/alternatives/emacs

The difference between this and another incantation going around is slight - including the two libraries at the beginning and scrubbing anything Emacs-related from your OS.

November 12, 2006

Factories like they oughta be

E = _E()

Over there, maybe

Danny responds quickly on "where do we go from here"

On Databases and efficiency: "I suspect the practical upper limits of scale are well above what'll be needed in practice."

That's a 640kb argument if I ever heard one :) I must say I really, really don't believe it. I'm thinking billions and billions of triples within a decade, or even less. Ok, I'm exaggerating, and it's easy to just add another order of magnitude to score points - but having to interactively process 10s or 100s of millions of triples isn't far fetched.

Update: Danny left a great comment; I'm lifting the entire thing:

I guess I should have qualified that first sentence : "I suspect the practical upper limits of scale for_a_single_store are well above what'll be needed in practice.".

I generally agree with what you're saying here, but would emphasize spreading the latency question out - say you've got 1000 triples in each of a 1000 independent, remote stores, how quickly can you match a particular pattern?

I'm not sure how far the notion of response time in search engines generalises. How's this sound:

Customer: "McTodger and chips, please" [400mS]
(plastic tray appears)

- processing time 400mS, response time 400mS,

Customer: "McTodger, please"
Spotty Youth: "You want fries with that?" [100mS]
Customer: "yes" [500mS]
Spotty Youth: "You want a McCupOfTea with that?" [100mS]
Customer: "no" [500mS]
Spotty Youth: "anything McElse?" [100mS]
Customer: "no" [500mS]
(plastic tray appears)

- processing time 1800mS, *apparent* response time 100mS

Whatever, the fact that Google can do what it does is some cause for optimism. As is Elias Torres playing with Map/Reduce code.

Update: following some links, I found a paper on scaling Ingenta's storage. Leigh Dodds works for Ingenta and they use Jena+Postgres; he's been looking at the RDBMS scaling side for some time. I also found this claim by Michael Bergman: "It is truly (yes, TRULY), not uncommon to see ten-fold storage increases with semantically-aware document sets.". That more or less matches my experience. So, maybe we need an order of magnitude?

It's not so much a question of how much data - it's a question of how efficient triples can be compared to, say, db tables operating over domain models, or a text store operating over inverted indices (technologies that have had thousands of man-years and billions of dollars invested in making them efficient). Without that, the only way to justify a massive performance hit is a corresponding increase in functionality - one place where the semweb community needs to explain itself better.

This goes back to integration as well - where's the compelling story about how RDF can augment existing domain models? I've seen enough to say it's entirely a good idea, but I wouldn't bet a system design on it just yet. Maybe I need to catch up on the semweb engineering state of the art; I'm easily two years behind.

On scale: "One of the features of the Semantic Web is that it's distributed (just like the web), so there's no need to keep everything in one place."

That much I know, but see the point I made about it being an engineering necessity, as opposed to a feature. One word counts here - latency. If you believe the research Google has conducted recently, response speed matters to users more than anything. And I'm betting most of the time Google spends searching is due to data center latency. Of course Google, along with all the other major search engines, is heavily invested in centralised storage. Then again, I've heard this "search speed is king" argument anecdotally from time to time over the years.

I guess if anyone can pull it off, they'll have an instantly disruptive technology for searching, one that would fit naturally with the interaction models of Mobile and IM technology, which are nothing like the Web's.

On this issue of time on the wire, I did some back-of-the-napkin stuff a few years back for a project - iirc RDF/XML was the most efficient way to represent triples; that's probably due to XML namespaces acting as a compression algorithm for URIs. I remember thinking Turtle plus namespace abbreviations would be the way to go; you get a second boost since it can be parsed faster than markup.
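
As a rough illustration of the namespaces-as-compression point, here's a sketch using present-day rdflib (which postdates this post, so the API isn't what was around at the time); the URIs are invented for the example. It serialises the same small graph as RDF/XML and as Turtle so the two wire sizes can be compared.

    from rdflib import Graph, Literal, Namespace, URIRef

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")
    g = Graph()
    g.bind("foaf", FOAF)

    me = URIRef("http://example.org/people/bill")
    g.add((me, FOAF.name, Literal("Bill")))
    g.add((me, FOAF.weblog, URIRef("http://example.org/journal/")))

    # serialise the same triples two ways and compare sizes
    rdfxml = g.serialize(format="xml")
    turtle = g.serialize(format="turtle")
    print(len(rdfxml), len(turtle))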

The other option to reduce time on the wire is intelligent routing built on a network of notifications. That would supply a means for query services to expose what domains they can answer on, allowing you to route queries to them. The big assumption is that queries can be analysed - tough when most people only type in one or two words. But it might be useful for vertical client applications, such as music players (arguably Amarok already does this with musicbrainz).

Where to go from here

Danny Ayers left me a challenge: "I don't see an answer to 'so where do we go from here?'."

Well ok then - let's look at areas where the semweb community could induce adoption by eliminating barriers. Please bear in mind that I'm using "RDF" loosely here; what I'm saying applies equally well in my mind to extended models, like OWL, or to vocabularies like FOAF and SKOS.

Databases, and efficiency. As this current kerfuffle began with relational databases, some work on the semweb side showing how to store RDF data relationally, in a time and space efficient manner, wouldn't hurt. Or how RDF extension triples can be integrated into domain models. Storing RDF in an RDBMS is very inefficient relative to using domain/entity models. If it can't be done without making trade-offs, that is also useful information - nobody pretends storing trees in DBs is a slam dunk. In particular it's important to understand whether there are upper limits on how many triples can reasonably be handled by an RDBMS (my sense is 5-8M records). It's one thing to speak of a web of data; it's another to say a web of data is going to be an engineering necessity.
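
To make the triples-versus-domain-model contrast concrete, here's a toy sketch using sqlite3 from Python's standard library; the table and column names are made up for the example, and it's only meant to show why a generic triple table pushes work into self-joins.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Generic triple store: every fact is a row; queries become self-joins.
    cur.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    cur.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("urn:entry:1", "dc:title", "Metasoup"),
        ("urn:entry:1", "dc:date", "2006-11-07"),
    ])

    # Domain model: one row per entity, one column per property.
    cur.execute("CREATE TABLE entry (id TEXT PRIMARY KEY, title TEXT, published TEXT)")
    cur.execute("INSERT INTO entry VALUES (?, ?, ?)",
                ("urn:entry:1", "Metasoup", "2006-11-07"))

    # Fetching title and date: one row lookup for the domain table,
    # a self-join (or two lookups) for the triple table.
    cur.execute("""SELECT t1.object, t2.object FROM triples t1
                   JOIN triples t2 ON t1.subject = t2.subject
                   WHERE t1.predicate = 'dc:title' AND t2.predicate = 'dc:date'""")
    print(cur.fetchone())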

ORM and widget mappers. How can RDF serialized data be mapped in and out of HTML forms and databases? All modern db-backed web frameworks do this via their ORM and widget mapping systems - it's one of the reasons they're insanely productive. Indeed you can argue that improvements in ORMs and automated form mapping are the most important advances in web technology in the last half-decade - not AJAX, not REST awareness, not compliant CSS engines, not even syndication. But you can't do any of that with RDF because, unlike HTML forms and SQL tables, it doesn't come with a useful type system. Try to map arbitrary RDF in and out of HTML forms if you don't believe me. Even constrained formats like DOAP can be impractical to work with, and tend to result in point solutions. By comparison, stacks like RoR and Django are ridiculously easy to work with for arbitrary entities. Plone's content model, Archetypes, is even more sophisticated - every content object in a Plone system gets view and edit handling for free due to its Schema and Widget designs. None of these are using RDF, and as far as I can tell they couldn't without a massive loss of functionality.
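
A toy illustration of the type-system point: with a typed schema you can pick form widgets mechanically, whereas a bag of arbitrary triples only gives you predicate URIs and opaque values. The schema, field types and widget strings below are all invented for the example.

    # crude mapping from field types to HTML widgets
    FIELD_WIDGETS = {
        "string": '<input type="text" name="{name}">',
        "integer": '<input type="number" name="{name}">',
        "date": '<input type="date" name="{name}">',
    }

    entry_schema = {"title": "string", "rating": "integer", "published": "date"}

    def form_for(schema):
        """Render a crude HTML form from a typed schema."""
        return "\n".join(FIELD_WIDGETS[t].format(name=n) for n, t in schema.items())

    print(form_for(entry_schema))

    # With arbitrary RDF you get pairs like
    # ("http://purl.org/dc/elements/1.1/date", "2006-11-20") and no
    # reliable way to decide whether the widget should be a date picker,
    # a text box, or something else entirely.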

Tagging. How RDF can add value over and above Web 2.0 tagging schemes. Indeed, can it? If only I had a cent for every time I heard a semwebber point out how RDF is much better than blog categorization, atom:category, tag clouds. Yet no-one, statistically speaking, is using RDF for these things. Danny said recently: "A handful of metadata fields attached to a blob? We can do a *lot* better than that." Blobs with property values, in the form of Atom and ID3 and EXIF, are creating more value on a daily basis than RDF has done in an entire decade. A huge amount of the value in social software and mashups is driven by blobs with metadata - also known as tagging. Blobs with property values are *insanely* useful. Just explain how RDF makes blobs better (clue: relating blobs to each other).

Integration: above all, there needs to be a story on how RDF/semweb can integrate with existing commercial technology. Phased deployment with RDF is too difficult. A key reason for this is that the official format, RDF/XML, does not round trip due to the allowed variation in its syntax, which makes it inaccessible to other tool-chains, unless they become RDF/XML parsers. Most commercial work with data is fundamentally based on processing tool-chains. People shunt data from system to system, back and forth, and change the formatting as they go along. This is absolutely fundamental in the industry sectors I work in, and on the web itself. The cost of making all the various stages RDF/XML-aware is highly unlikely to make economic sense, and neither is deploying RDF toolsets end to end. Hence RDF/XML remains largely undeployed where it could in theory be valuable. If you're going to have to deploy RDF/XML in toto, well, a lot of people won't see the investment value no matter how well the ROI case is made - just use a homegrown XML vocab that is syntactically static and can be transformed, or a standard something that is relatively consistent and can have the semantics layered on via scripting, like Atom.
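
For a concrete taste of the variation problem, here's a sketch (checked with present-day rdflib, using example.org URIs) of two RDF/XML documents that describe the same graph but have different element structure; a pipeline keyed to one shape - an XSLT, say - breaks on the other, even though an RDF parser sees identical triples.

    from rdflib import Graph
    from rdflib.compare import isomorphic

    # property as a child element
    doc_a = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/post/1">
        <dc:title>Metasoup</dc:title>
      </rdf:Description>
    </rdf:RDF>"""

    # the same property as an XML attribute
    doc_b = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/post/1" dc:title="Metasoup"/>
    </rdf:RDF>"""

    a = Graph().parse(data=doc_a, format="xml")
    b = Graph().parse(data=doc_b, format="xml")
    print(isomorphic(a, b))  # True: same triples, different XML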

Anything else is boiling the ocean. And while you can argue, as the semantic web community frequently does, that there are overall cost benefits to be had, anyone with an iota of experience delivering production systems will know that expanding the scope of a project in that manner makes the project more likely to fail. If the technology can't be deployed organically, that's the technology's problem, not the ecosystem's.

Syntactic stability: in the last 18 months, I've become convinced that RDF is almost ideal as a backup format for semi-structured content. Well, not the content itself but the content metadata, and specifically relationships between content. Once your software system's internals are instrumented to identify each content item using a URL, associations like parent-child, translations, labelling, permissions statements, almost any kind of index, are prime candidates for RDF serialisation. The path-based notations of the JCR or Zope aren't adequate as identifiers, nor are the database primary key identifiers used in blogging apps and Web/CMS frameworks. XML, while good for raw content, doesn't intrinsically support relations, and the kinds of guarantees you get from RSS/Atom or a custom/private format are weak sauce at best. Now, assuming you can instrument the data with URIs, there's a big, big opportunity to use RDF in an operationally critical part of a system. You can also in principle 'publish out the back' - by giving people your backups for syndication or warehousing.
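
As a sketch of the kind of relationship metadata I have in mind - again with present-day rdflib, and with predicate URIs invented for the example rather than drawn from any real vocabulary - content items identified by URI can carry parent-child and translation-of relations as plain triples.

    from rdflib import Graph, Namespace, URIRef

    REL = Namespace("http://example.org/relations#")  # invented vocabulary
    g = Graph()
    g.bind("rel", REL)

    en = URIRef("http://example.org/site/about")
    ga = URIRef("http://example.org/site/ga/about")
    parent = URIRef("http://example.org/site/")

    g.add((en, REL.parent, parent))          # parent-child association
    g.add((ga, REL.translationOf, en))       # translation relationship

    print(g.serialize(format="turtle"))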

Again, the problem here is that the XML syntax doesn't roundtrip in and out of other RDF systems; it also means you can't safely merge backups from multiple systems without being fully committed to an RDF/XML toolchain, or there is a good chance either your marshallers or your incoming parsing layer will break. This, along with toolchain integration, is a prime argument to revisit the XML syntax.

November 11, 2006

links for 2006-11-11

November 10, 2006

Still done seeking

Tim Bray compares Frameworks. The man has a gift for breaking things down to their essence. I left a comment saying more or less that Java isn't a framework, and Rails hasn't been around long enough to say one way or another whether it's maintainable, not by Tim's timelines (I help maintain systems that are older than Rails, so I'm not entirely clueless on this one). Still, it's good to see *someone* out there thinking about maintenance. On maintenance, I'll add one more criterion - if you're going to do heavy metaprogramming or runtime monkey patching, be sure to ship with tests.

For what it's worth, I still stand by this: web frameworks reloaded.

links for 2006-11-10

moleskines and child-rearing

Baby teething at 2.00 AM solved - give them a Moleskine notebook to chew on. It must be the texture.

By the way - does anyone know where to pick up Moleskines in Dublin? Especially Cahiers. Amazon.co.uk's supply seems to have dried up, and moleskine.co.uk are pricey, at best.

November 09, 2006

Switch Blocks

A while back, I said,

"Put it this way - if I can't get down to the Burlo to hang out in the bar with Steve Loughran, I don't have time to change OSes. "

Well, I finally got around to it, and spottily documented how it went in a post. That post got picked up by Digg; as a result there are nearly 40 comments, many of which deserve a response. I'll do that as another post real soon now.

Anyway, back then, I asked what Ubuntu has that covers these off:

  1. Feeddemon
  2. Visio (as clumsy as visio is, dia isn't at the races)
  3. Word screen split (this is what stops me from using Oo all the time)
  4. Copernic

The answers turned out to be:

  1. feeddemon: Bloglines
  2. Visio: Dual boot into windows, be accepting of reality
  3. Word split screen: Write shorter better organised documents, have the discipline to write exec summaries and rollups at the end
  4. Copernic: Beagle

Eclipse is now my Python IDE of choice on Ubuntu thanks to PyDev. Using a Java app platform to write Python apps makes me quite the wit ;). Although a few colleagues had been telling me to get onto PyDev for a while. Now I wonder if XULRunner could be packaged as an OSGi plugin for Eclipse - that would be interesting.

Finally - what's up with Debian stable, shipping with Subversion 1.1.4? 1.1.4 is about 18 months old; in the meantime Ubuntu is running 1.3.x. I wanted to use viewvc in work but couldn't as it requires a higher Subversion rev than Debian stable allows for.

November 08, 2006

links for 2006-11-08

November 07, 2006

Metasoup

The problem: take XHTML fragments, parse out all the "a" tags, and check to see if their linked resources are of a certain type. If they are, dereference that content and inline it into the fragment, leaving non-matching a tags alone. That ignores a raft of environmental details, like permissions, link type checking, link availability, testing on an app server, skinning the dereferenced content, speed, and so on. The difficulty: the markup fragment might not be well-formed.

My first reaction was to use regexes, which meant I had two problems. I would have had to split the content into regex groups around the links, process the links, keep a memo of which links are up for expansion and which are not, dereference the content for the expandables, inline that content, stitch it all back together and send on the output. It looked, at best, complicated. My second reaction was a stream parse and intercept of the a tags, writing out embedded content where the links matched the inlinable types. I couldn't find tools in Python that will handle dodgy markup in streaming mode and write the content back out cleanly (as TagSoup does for Java).

Why not insist that the content come in well-formed? That would open up the toolchain. But it would also hurt the users, as they want to be able to preview in mid-flight; being fascistic about well-formedness there will just make the application frustrating to use. Well-formed markup is the end, not the means.

Soup

I wound up restating the problem - accept that the fragments would be a mess - now what?

I ended up using a library called BeautifulSoup. BeautifulSoup is Python code that will parse junk markup and give you a tree. Really it's quite something, it'll take on any old nonsense and create an HTML tree in memory. It also goes a very long way to getting your content into a decent state for Unicode.
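
A minimal sketch of the approach, written against today's BeautifulSoup 4 API rather than the version available at the time; is_inlinable and fetch_inlinable are placeholders for the application-specific parts (link type checking, permissions, dereferencing, skinning).

    from bs4 import BeautifulSoup

    def is_inlinable(href):
        # placeholder: check the link type / availability / permissions
        return href is not None and href.endswith(".frag.html")

    def fetch_inlinable(href):
        # placeholder: dereference and skin the linked content
        return "<div class='inlined'>content for %s</div>" % href

    def expand_links(fragment):
        soup = BeautifulSoup(fragment, "html.parser")  # tolerant of bad markup
        for a in soup.find_all("a"):
            href = a.get("href")
            if is_inlinable(href):
                replacement = BeautifulSoup(fetch_inlinable(href), "html.parser")
                a.replace_with(replacement)   # non-matching a tags are left alone
        return str(soup)

    # note the unclosed <b>: the fragment is not well-formed, and that's fine
    print(expand_links('<p>see <a href="/notes/x.frag.html">this</a> &amp; <b>that'))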

It worked. I was eventually able to get inlined content to come out as a microformat. The lesson I (re)learned was that using BeautifulSoup, and in the past Universal Feed Parser and Tidy, makes it clear there's some economic value to be had in giving up on well-formedness in a judicious fashion.

[By the way, Effbot has announced an ElementSoup wrapper for BeautifulSoup.]

Tolerance

Engineers have a concept called tolerance. A tolerance specifies the variance in dimensions under which a part or component can be built and still be acceptable for production use. There are all kinds of ways to state a tolerance, but perfect tolerances are neither physically possible nor desirable; they are too expensive. There is a diminishing-returns curve for manufacturing cost as you tighten a tolerance. Engineers (real ones, not programmers) use tolerances to actively manage cost and risk.

Every major commercial project I have worked on, every one, has had the issue of "data tolerances" being off, where two or more systems did not line up properly. The result invariably is to fix one end, both ends, or insert a compensating layer - what mechanics call a 'shim' and what programmers call "middleware". Software projects unfortunately don't have notions of tolerance. In software we lean more toward binary and highly discrete positions on the data - "wellformed" v "illformed", "valid" v "invalid", "pass" v "fail", "your fault" v "my fault". This doesn't just happen before go-live - interoperation is subject to entropy and decay - systems will drift apart over time unless they are tended to. Reality is Corrosive.

There's a political dimension to consider. If you accept you might get junk every now and then, and introduce permissible levels of error, you get to mitigate the interminable and inevitable blame-slinging over who should pick up the tab because two systems' data do not line up as predicted. I've seen schedules put at risk over such arguments, when the costs could just as easily have been shared.

We don't have the tools or metrics just yet for defining data tolerances as accepted practice, but it might happen if enough of these kinds of parse-anything libraries come online, that we can come to put a dollar cost on what is involved in insisting on perfect markup flying about end to end versus judiciously giving up on syntactic precision.

Metasoup

The code for BeautifulSoup is worth a read, along with Tidy, TagSoup, and Universal Feed Parser. Overall, they read like a bunch of error-correcting codes strangling a parser.

If we assume or allow that most data on the web is syntactic junk and will always be syntactic junk - and in truth there's no reason to assume otherwise - then there is a good argument that says we'll need a layer of converters whose purpose is to parse content no matter what. My takeaway is that the Semantic Web, or anything less grandiose but essentially similar in aims, such as structured blogging, microformats, or enterprise CMSes and Wikis, can embrace code like BeautifulSoup, TagSoup and Universal Feed Parser as necessary evils.

Update via James: Ian Hickson is defining how parsers should deal with invalid HTML.

In the semantic web case, I think tag soup parsers are a fundamental layer of that architecture - syntactic converters that work just like analog-to-digital converters. They set you up for making sense of the data by actually allowing you to load it instead of dropping it on the floor and failing. Without that layer, tools like GRDDL (a way of extracting RDF out of XML/XHTML) don't get to execute at all. [By the way, there's plenty of prior art in robotics and physical agent systems for building these kinds of layered or hybrid architectures.]

Now, some people will find simply entertaining the idea of junk content a deplorable state of affairs, one that will inevitably lead to some kind of syntactic event horizon, where the Web collapses under the weight of its own ill-formedness. On the other hand, if you allow for some garbage in and try to do something with it, you get to ship something useful today, and perhaps build something more valuable on top tomorrow. Plus, we're already in a deplorable state of affairs. I find myself conflicted.

Last word to Anne Zelenka, speaking about the feed parser:

"I wouldn't call it a necessary evil, just necessary. Life is messy :)"

links for 2006-11-07

November 06, 2006

Momentum

Chapter 3 of the Django Book, "The Basics of Generating Web Pages", is available from here. It's looking good so far; if the book is late it'll be down to processing all those comments (at this rate they'll end up with thousands). But check out the comments feature - nice.

November 05, 2006

XP to Ubuntu

Windows XP

I reduced the c:\ drive to 20Gb, and defragmented it. Visio and MS Project are a part of life, so I'll need the Windows partition for dual booting. Eventually I'll switch to a VMWare image, but a dual boot is the path of least resistance for now. I stripped the e:\ drive.

Backing up was as follows:

  • Anything in E:\home\dehora that is not already in subversion (I always keep my ~home in subversion)
  • E:\home\My Documents
  • Anything in E:\home\work that is not already in subversion
  • E:\home\thunderbird (anyone out there tried putting 6Gb of mail into Subversion yet?)
  • C:\Documents and Settings\dehora
  • ....

... and various scattered dot folders that have configurations and data (such as Gaim and Feeddemon). This is one of the reasons I'm off Windows. The filesystem is too unstructured and that has apps throwing data all over the place.

Reinstall Windows from the backup drive (this is an IBM TP; it has a partition dedicated to reinstallation). That works, and after about 5 reboots, Windows is Really Fast again. And that's another reason to leave Windows: the longer you use it the slower it gets - reinstalling Windows every 8 months isn't really on anymore.

Kubuntu

I have traditionally liked KDE more than Gnome. And blue is a nice color. Install Kubuntu via the live CD.

I set things up like this:

  • hda2/ - 35 Gb
  • hda5/ 10Gb '/'
  • hda6/ 20Gb '/home'
  • hda7/ 2 Gb swap
  • hda8/ 3 GB fat32 '/media/osshare'

It didn't like my partitions; it only saw hda5 as 35Gb ext3. Hda5 is actually 10Gb and is the first logical partition on hda2, an extended primary, which is 35Gb. Bizarre. After about an hour of thrashing about with the disk configurations, it turns out rebooting fixes all that, and Kubuntu can now allocate the partitions.

Installed. It Just Works (tm). Brilliant. Start installing some apps; don't bring over the data yet.

Oops. Adept, the Kubuntu package manager, crashes hard. The Internet says Adept is crashy but tends to come back with some work. However, removing lock files, killing processes, reconfiguring, hand-cleaning the database - none of that will get apt running again. After about 3 hours, and although I've broken 2 or 3 apt databases in the last year, I'm wondering about a distribution that will so casually break apt, which is reputedly solid software. The solution I found was to burn the Ubuntu iso with k3b, give up on Kubuntu, and start over.

Ubuntu

Out of curiosity I tried to confuse its partition manager as per Kubuntu, but it didn't bite. No problems during installation.

It Just Works (tm). Ubuntu is just like Kubuntu except it's sepia, not blue, and doesn't have Adept, but does have Gnome.

Over the course of the first week, I installed a lot of packages, such as easyubuntu, subversion, meld, MyPasswordSafe, Thunderbird, Firefox ("I Can't Believe It's Not OSS"), Eclipse, IDEA, KMyMoney, Gaim, Skype, gFTP, gtkpod, mysql, and lots more. Installing some things means letting the package manager talk to server things called "verses" (universe, multiverse, geddit?).

Ubuntu in use

Ubuntu/Gnome/GTK/Linux is a fine environment, probably an ideal development environment. I didn't boot back into Windows for two weeks (I had to work with some Visio files), which I think says something. The way Ubuntu lays out the screen, with a thin task bar at the top for shortcuts, menus and devices/status, and a bottom context bar for open apps and the trashcan, is a very good use of screen real estate. And it never slows down or degrades in performance over the course of a day, which is important for anyone's work, but especially so for developers.

The other thing that's great about being on Linux is a very simple thing - symlinks. I haven't had a chance to look at Vista yet, and I hear the Monad shell is a huge improvement, but Windows really, really needs to support symlinks. Clearly it can be done (witness Junction Link Magic).

The real upside however is that as desktop environment, Ubuntu is more or less complete. Lots of things in it and Linux just work - such as:

  • Printing.
  • Wireless (install the networking tools).
  • Dual monitor support (some config needed, but it's well-documented).
  • Automounting USB drives.
  • Samba and file sharing with windows.

My kids *love* the screensavers. My daughter asked "what's that" (Amarok), and wanted to know what games came with Ubuntu. It's easier to switch kids if they think Ubuntu is Cool, as opposed to Good.

Notable apps

The post-it notes are cool, and useful.

Eclipse/PyDev/Subclipse

In the switchover, I'm surprised about one thing above all else. In a span of less than two weeks, Eclipse+PyDev became my preferred Python environment. I hadn't used PyDev for a while. It's now a very impressive IDE. Previously I had been using Wing, but I've never quite gotten used to its project and file management idioms. Part of the appeal of PyDev is actually in the Subclipse plugin, which has also come on in leaps and bounds in the last few years. Eclipse/SWT itself looks well on Linux (I'd heard otherwise), and seems to be very stable.

Amarok

My colleagues rave incessantly about Amarok. I can see why; it's sweet. It stores your prefs in a database, and integrates really well with the web. I mean *really* well - better than anything I've seen that isn't a browser or a feedreader. I love that it has Wikipedia, last.fm and radio support built in; and who'd have thought musicbrainz could be so useful? Also it supports multiple folder sources for your music, and behaves gracefully, as it should, when you are disconnected from your shares or USB drive. The only snafu I had was that you need a particular legacy version of libxine-main1 to play flac files with Amarok 1.4.3, and that takes some fiddling to set up. An awesome application; this is how rich client apps should be. And I hear there's .rb files in there.

Emacs

It takes ages, but you can get an anti-aliased Emacs for Ubuntu. I've been using NTEmacs for years, and not having decent typeface support would have me crawling up the walls. Most of the extensions in my ~/emacs folder worked, except for some very weird behavior with jde (I think it included my .svn folder in its configured makefile or something). Removing the jde folders and reinstalling without .svn subfolders fixed that.

Annoyances

Some minor annoyances:

  • Cut and paste doesn't work properly. I don't know at what level it's failing, but I regularly lose the last few characters of my selection. Plus sometimes paste is right-click, sometimes it's middle-click, sometimes it's Ctrl-V. I assume I'll get used to this eventually.
  • Cursor jumping. The cursor jumps up one or two columns now and then. I don't know why this is, maybe some focus follows mouse thing. Happens inside most apps, but not in Emacs.
  • Hibernate/suspend doesn't work with the Thinkpad. I now have to organise myself properly by saving my work state and turning my laptop off before going home. Others might now be impressed with my new-found professionalism and structured work methods, but it's annoying to have to serialise my work state every day (I used to go up to a fortnight without rebooting Windows, not a good idea maybe). A combination of Post-it notes and Emacs buffers is saving me each day.
  • I think, but am not sure, that my anti-aliased Emacs crashes intermittently. As in poof, utterly gone. I really want to be imagining this one.
  • Nautilus: doesn't handle large numbers of files too well (I have some folders of XML and data files with between 10,000 and 50,000 items; Windows will just about function; Nautilus crashes). But it's an extreme need, browsing 10,000 files. You get very used to having the folder view on the left hand side in Windows Explorer; not all Nautilus modes have this, and the folder view seems to go away depending on what you're doing. Overall it seems to prefer a browser-style model where you drill up and down and rely on the breadcrumb bar for context. Not sure about that; I guess I'll get used to it.
  • Gedit occasionally stops shutdown (similar to how a Windows app can stop that OS shutting down). More than once I thought I had shut down, closed the lid, gotten home and found a very hot laptop in my bag. I've been told this isn't possible, but I managed to show it to a colleague a few weeks ago.
  • Installers and icons: for a number of apps I installed, I had to manually create a .desktop file, put it in the right folder, and mock up a 16x16 image file for the icon. It's not a big deal to put one of these together, but as a developer I'm not sure I fit into Ubuntu's notion of a "human being". Human beings don't install software and expect to edit config files to see a desktop icon (imagine asking people to edit .ini files in Windows). In fairness, this is probably more to do with the app developers than Ubuntu itself.

Conclusion

Most of this was written back in September. After about 6 weeks of heavy use, there's nothing that has me wanting to move off Ubuntu. It's remarkably solid and well-designed, and maybe no more than 2 years away from being something anyone could use. Definitely a keeper.

The only real downside is the handful of applications I truly miss from Windows - Feeddemon, Copernic and TortoiseSVN - which have been supplanted by Bloglines, Beagle and Subclipse for now. Beagle and Subclipse are fine, but not having a really good client-side aggregator is a pain.

November 04, 2006

Programming languages as deployed

Tiobe is a widely cited index of the popularity of programming languages. I thought it would be interesting to record what I actually used in production work this year, as opposed to what I think is cool or interesting or has potential. So, here are the languages I've used in production so far this year:

  • Python
  • Java
  • Javascript
  • SQL
  • Perl5 Regex
  • XSLT
  • Ant
  • TAL/Zpt
  • CSS
  • Relax NG
  • UML/MDA
  • HTTP
  • Clisp*
  • RDF*

Are all these really programming languages? While I didn't write a list like this last year I sense the number of things I'm using that aren't normally considered "programming languages" is on the rise.

I guess some people are looking at things like HTTP and Ant and CSS and wondering whether they are really programming languages. Still, I figure that they can be included, on the basis they are either replacing or reducing the raw coding I used to do. I'm in no doubt whatsoever that protocols like HTTP are languages - or at least, if you start thinking about HTTP as a language, the job of using it becomes far easier. Curiously, the Ant I wrote this year was for building a Plone-based system, not for Java code. Using Ant was less effort than maintaining a hairball of a shell script or starting from scratch with Python.

This year was the first year I've used UML/MDA for generating production code, something I'll confess to being a tad ambivalent about, though in this case there was no need to roundtrip the code, which simplifies things. Some people reading this will know I have a background in RDF and am openly critical about certain aspects of it; but I'm seeing more and more value in it as an interlingua/interop technology - not for representing content but for relationships between content. For example RDF is ideal for saying that something is a translation of something else.

Surprising omissions and inclusions. On reflection, I was surprised to see no Jython. Spending most of this year with Plone/Zope probably has a lot to do with that. And this year I wrote more Javascript than I have for years, perhaps more than at any time in the past, even though I would not consider myself a JS programmer if asked. What I can say is that JS is easier to work with today than I can remember. XSLT is similar. XSLT must be Perl for XML people. I don't consider myself an XSLT person, but I seem to use controlled doses of it year in and year out.

Outside of work, my time was taken up with - well, work mostly - this year has been hectic, to say the least. But other than that, Atom Protocol** and Django occupied most of my interest, with very small smatterings of Erlang, (J)Ruby and Eclipse/RCP.

Ongoing. It will certainly be interesting to see what the list looks like a year from now, and I think it might be useful to do this over a 5 year period, not just looking for trends, but looking to see how many of the things one looks at for fun, self-development or pure evaluation ever make their way into production. It wouldn't be a surprise to find dissonance, or a gap between what is initially appealing or interesting, and what gets deployed in the field. One thing I will say is that I doubt the list will become shorter. When Joe Gregorio says "My prediction is that we'll see the same thing tomorrow as we see today: a frothing stew of languages competing for niches on an ever growing software landscape", what I actually use seems to bear that out.


* When I get stuck, I tend to drop down to RDF and Lisp to figure out data modelling and programming problems respectively. Lisp and RDF are executable whiteboards. The RDF went into production in various guises, and the CLisp stayed behind as scaffolding - maybe next year ;)

** As an aside, I'm seeing increased interest in Atom/RSS in commercial work, and most of it is coming from the business side, not technologists.

November 03, 2006

Lump it

Bug 338621 - Feed View overrides XSLT stylesheet defined in XML document

No Firefox 2.0 for me then.