" /> Bill de hÓra: June 2007 Archives

« May 2007 | Main | July 2007 »

June 29, 2007

NSAI taking comments on OOXML

The NSAI are accepting advisory submissions for their vote on Office Open XML (ISO/IEC DIS 29500).

Submissions are invited, which, to ensure that they can be considered as input to the advisory report to establish NSAI's position, should be transmitted no later than 11 July 2007 for the attention of Dr. Ian Cowan at NSAI, Glasnevin, Dublin 9 or by email at ian.cowan@nsai.ie.

The deadline is 11 July. Antoin O Lachtnain is willing to take comments onboard.

[via Justin Mason]

June 28, 2007

links for 2007-06-28

June 26, 2007

links for 2007-06-26

June 23, 2007

Charity Link Meme

I've been tagged by John (via Paul); the idea is to improve the page rank of charities in Ireland.

By the way, there's a table quiz for To Russia with Love on Monday 2nd July in the Coachhouse, Ballinteer, Dublin at 8.30pm. If you're interested contact Orla Davis (086-351-1938) for details.

Tagged:


links for 2007-06-23

June 22, 2007

links for 2007-06-22

June 19, 2007

links for 2007-06-19

June 16, 2007

links for 2007-06-16

June 15, 2007

Domain Specific Modeling

Joe Gregorio: "I frequently hear that REST can't be applied to complex situations. I also want to use the example as motivation for talking about some of the idioms that are available to handle more extensive requirements.". That's what I'm talking about.

Artificial Stupidity

Richard P. Feynman: "I realized something: he doesn't know numbers. With the abacus, you don't have to memorize a lot of arithmetic combinations; all you have to do is to learn to push the little beads up and down. You don't have to memorize 9+7=16; you just know that when you add 9, you push a ten's bead up and pull a one's bead down. So we're slower at basic arithmetic, but we know numbers."

I wonder what Feynman would say about a Feynman in a Chinese Room, solving cube roots.

links for 2007-06-15

June 14, 2007

links for 2007-06-14

June 13, 2007

Essential Reading

Mark McKeown recently posted a superb canned history of 2PC and consensus protocols on rest-discuss. A few of us asked him if he would make a weblog post out of it. And here it is: "A brief history of Consensus, 2PC and Transaction Commit." Here's a slice:

"By this time "consensus" was the name given to the problem of getting a bunch of processors to agree a value. In an asynchronous system (where processors run at arbitrary speeds and messages can take an arbitrarily long time to travel between processors) with a perfect network (all messages are delivered, messages arrive in order and can not be duplicated) distributed consensus is impossible with just one faulty process (even just a fail-stop). The kernel of the problem is that you cannot tell the difference between a process that has stopped and one that is running very slowly, making dealing with faults in an asynchronous system almost impossible."

links for 2007-06-13

June 12, 2007

links for 2007-06-12

June 11, 2007

links for 2007-06-11

June 10, 2007

JRuby 1.0

Congrats to the JRuby team on releasing JRuby 1.0; they've put in a *ton* of work.

Comparative Architecture

Dion Almaer: "This is why, for Twitter to scale nicely, it probably makes sense to use a message bus that can scale out nicely."

David Pollak: "There's a ton of problems that are really hard to solve with an RDBMS that get solved really easily with message passing. ".

The latter comes with 884 lines of Scala code and is definitely worth a look; it's probably not the kind of message bus you have in mind.

In my part of the world, Scala doesn't make the front page of Google results. Give it a month.

Limitations

Patrick Logan: "Bonjour is zero config. Have you seen the Jini configuration mechanism?"

I remember whining about Jini setup a few years ago. "No wonder Jini doesn't get used", I said. I caught a lot of stick for that. It's improved since then - "Jini up in non-geological time", I said. Granted, post titles like that don't win hearts and minds.

Bonjour/zeroconf seems to be a limited discovery protocol in the same way bittorrent is a limited p2p protocol. Jini clearly does more. Maybe limited is a good thing.


Social networks, web publishing and strategy tax

On Failure

Stefan Tilkov paraphrased my responses to Dare's post on the Atom Protocol as:

"Bill de hÓra acknowledges that the third is indeed missing from APP, considers the second problem a general issue with PUT, and disagrees about the first one; but he adds two more problems: update resumption and batch/multi-part uploads."

To recap, the issues Dare raised are:

  • Mismatch with data models that aren't microcontent
  • Lack of support for granular updates to fields of an item
  • Poor support for hierarchy

Stefan is a connector across a number of communities, so I'd like to qualify his reduction as follows:

  1. Atom, as Joe points out, is more than an envelope; it's content. And as I pointed out, valuable formats - ones with media types, and not just the usual blogging suspects - are properly supported in APP. Lolcats won't be a problem.
  2. Use PATCH. More on this below.
  3. I do not think Atom is a good format for hierarchical data, but it's not clear to me that's a problem (certainly it's not a protocol level problem). You probably want to start with a placeless model as APP/Atom does and declare hierarchies and maps out of band. There are all kinds of options for this that will work within the APP constraints.
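A minimal sketch of that out-of-band approach, assuming nothing about APP itself: entries stay in a flat, placeless collection, and the hierarchy lives in a separate child-to-parent mapping. The entry IDs are invented for illustration.

```python
# Flat collection of Atom entry IDs, with the hierarchy declared out
# of band as a child -> parent mapping rather than by nesting entries.
entries = {
    "tag:example.org,2007:1",
    "tag:example.org,2007:2",
    "tag:example.org,2007:3",
}
parent = {
    "tag:example.org,2007:2": "tag:example.org,2007:1",
    "tag:example.org,2007:3": "tag:example.org,2007:1",
}

def children(entry_id):
    # Derive the tree view on demand; the entries themselves stay flat.
    return sorted(c for c, p in parent.items() if p == entry_id)

print(children("tag:example.org,2007:1"))
```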

Perhaps the title of my post was misleading (that's what you get for being clever). The point wasn't to criticize some detailed observations, or suggest APP has serious problems, but rather to criticize the dual conclusions that 1) the APP has failed for some definition of "general purpose" publishing, and 2) it's necessary to roll your own publishing protocol for the reasons given. Feedback on the protocol is a good thing, but I couldn't get to those conclusions following the arguments given. It didn't take long for some people to provide workable options, and I presented some other issues to chew on (batch updates and resuming uploads).

On PATCH

I mentioned using PATCH as an option for dealing with partial updates. Matthias Ernst questioned the need for a different method:

"I don't see that need. PUT with the If-Match: header is just enough to do the work on the client side using optimistic concurrency control."

Stefan also questioned the need for PATCH:

"I’m not at all sure I like the PATCH approach, too — I’m not really keen on having to tunnel even more verbs through POST because they’re not widely supported"

update: Stefan explained to me that his concern is adding another method rather than tunneling; a valid concern. I probably wasn't clear enough on where I was going with this. First of all, PATCH is defined in RFC 2068 19.6.1.1 (sort of) and arguably part of HTTP; it's not a POST tunnel (thanks to Julian for the reference). Second, what Matthias says is true for the case of multiple editors (and APP mentions how to deal with lost updates using If-Match and friends), but this is a different problem to sending deltas - ie, you don't need partial updates to have lost updates.
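Matthias's If-Match point can be sketched as a toy in-memory store - this is not real APP server code, just the optimistic concurrency check in miniature:

```python
import hashlib

class Store:
    """A single resource whose ETag is a hash of its current body."""

    def __init__(self, body=b""):
        self.body = body

    @property
    def etag(self):
        return hashlib.md5(self.body).hexdigest()

    def put(self, new_body, if_match):
        # The client must present the ETag it last saw; a stale ETag
        # means someone else updated the resource in the meantime.
        if if_match != self.etag:
            return 412            # Precondition Failed: lost update averted
        self.body = new_body
        return 200

store = Store(b"v1")
tag = store.etag
store.body = b"v2"                # another client edits meanwhile
print(store.put(b"my edit", tag)) # -> 412
```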

The design value in using a new method to deal with delta updates is twofold.

First, no matter what the format is, or the optimal algorithm/policy for merging data in the format, the PATCH method is explicit in its intent - the server is getting a change delta from the client as a function of the representation sent down to the client. With PUT you have to infer outside the method whether the server is receiving a delta or a full update. You can deal with this format by format using PUT, and APP has specifications in place to avoid the problem altogether (the atom-syntax working group felt that sending partials was overloading PUT). Joe points to the following in section 9.3:

"To avoid unintentional loss of data when editing Member Entries or Media Link Entries, Atom Protocol clients SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup as defined in Section 6 of [RFC4287]."

But "general purpose" diff/patch is another matter, especially if people want to work at a higher level than bytes. I see no reason to disallow it in the future; the best way to do that is not redefine or muddy PUT now (or later on), but allow the protocol room to use PATCH.

Second, the broader guideline I had in mind was this - whenever you have two operations that resemble each other superficially but are semantically different and have different expected outcomes, you should consider separate and explicit definitions to avoid interop issues. It's not just about finding efficient techniques for important approaches to readers and writers, like optimistic concurrency - it's about providing a uniform means of expression in the protocol design.
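To make the distinction concrete, here's a minimal sketch of the two server-side handlers - the field names are invented, not real Atom processing. PATCH merges a delta against the stored state; PUT replaces it wholesale:

```python
def handle_put(resource, body):
    # PUT: the body is the complete new representation.
    return dict(body)

def handle_patch(resource, delta):
    # PATCH: the body is a delta; merge it into the stored state.
    merged = dict(resource)
    merged.update(delta)
    return merged

entry = {"title": "Old", "summary": "Keep me", "author": "bill"}
print(handle_patch(entry, {"title": "New"}))  # summary and author survive
print(handle_put(entry, {"title": "New"}))    # everything else is gone
```

With PUT, nothing in the request tells the server whether the missing fields were deliberately dropped or simply not sent; with PATCH the intent is carried by the method itself.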

On Strategy Tax

Broadening things beyond direct issues with Atom Protocol for a minute, it should be clear that defining your own publishing and data access protocol means building your own tools and platform infrastructure from top to bottom. The amount of work to do this, again for some definition of "general purpose", shouldn't be underestimated. It's much more likely in a high-pressure commercial environment to produce a protocol that is highly limited and works for one platform - yours. That is, you end up with less capability and yet another silo. This is analogous at the protocol level to Facebook choosing to create a markup format for users - one that says more about Facebook's current capabilities than the actual users - instead of rolling with something like FOAF. Arguably controlling data portability is largely the point, but the overall costs of doing so shouldn't be underestimated. Going custom will up the overall design and engineering dollars spent 'below the waterline'. Companies, even big ones, are resource bound, so each engineering dollar spent on publishing infrastructure is a dollar not spent on a cool feature a user might care about. You want to be sure it's the right thing to do. If you're integrating against such a provider, you probably want to keep custom formats/protocols at the edge and convert them to open models for internal use.

This reluctance to roll out on an open protocol is a good example of a strategy tax, where creating barriers to data allows companies building social network platforms to maximize a return on that data and, all-importantly, monetize the graph of social relations. This balance around open data and platform franchises is a difficult problem for social network providers, who are especially subject to modish swings in interest or perceived coolness. They don't yet seem to have the stable revenue streams that Google has from AdSense or that eBay and Amazon have from providing marketplaces. It's surely tempting then to reduce the fluidity of user data while figuring out how to become an 800lb gorilla. However, web history suggests betting on a user silo will be a short-lived tactical advantage, not a strategic play a la desktop operating systems. Perhaps there are other models of lock-in - people have been pointing out for years that Google has precious little lock-in on the search page and it's trivial to use a different search engine - yet somehow they manage to get by.

links for 2007-06-10

June 09, 2007

APP on the Web has failed: miserably, utterly, and completely

In his post "Why GData/APP Fails as a General Purpose Editing Protocol for the Web" Dare Obasanjo says

"I thought it would be useful to describe the limitations we saw in the Atom Publishing Protocol which made it unsuitable as the data access protocol for a large class of online services. "

and provides 3 issues with Atom Protocol's data model.

  1. Mismatch with data models that aren't microcontent
  2. Lack of support for granular updates to fields of an item
  3. Poor support for hierarchy

The post is a good read, and informative, but the title and the above quotation have something of Chicken Little about them. Let's go through Dare's 3 problems, provide some options for dealing with them, and then state 2 further problems with APP that are indeed worth thinking about.

  1. Mismatch with data models that aren't microcontent

    "I guess we could keep the existing XML format used by the Facebook REST API and treat the user documents as media resources. But in that case, we aren't really using the Atom Publishing Protocol, instead we've reinvented WebDAV. Poorly."

    Actually do treat it as a media entry; it'll work fine.

    Here's some speculation about formats. First, an awful lot of needless custom markup formats are going to be replaced by Atom entries; a good example is anything that looks like an event. Yes, some fields become pointless (atom:summary being an example I keep running into), but I'd say the problem of carrying around some junk DNA fields is outweighed by not starting over, plus you are easily integrated with the planet's syndication technology, for some definition of "free". Second, anything that looks like a bag of descriptive metadata (and Facebook markup about users is exactly that) should be starting at RDF and working back to custom only based on real needs. The problem here is that the markup is describing more than the user; what it's describing reflects what Facebook's feature set can do. Facebook then risks having to rev the data as part of the platform*. Whereas something like FOAF would ameliorate much of that and allow people to concentrate on work that's actually valuable.

    The acid test here is whether Facebook's custom format is worthy of a media type. If it is, it probably has a reason to exist.

    [Incidentally, APP + non-Atom content strikes me as nothing like WebDAV; I'd like to hear more about that.]

  2. Lack of support for granular updates to fields of an item

    "Thus each client is responsible for ensuring that it doesn't lose any XML that was in the original atom:entry element it downloaded. The second problem is more serious and should be of concern to anyone who's read Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout. The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes."

    The solution at the protocol level is PATCH. In other words, this is not just a data problem. Using PUT to send deltas mucks about with PUT semantics in too subtle a way. The correct choice in that case is to choose a new method, not overload an existing one that has "nearby" semantics. Assuming it's really needed, it might take a few years to see proper support for PATCH - no doubt we'll see some ropey ideas rolled out in the meantime such as diff annotations in formats or method override headers.

    At the data level, Atom presents challenges; there's a minimum set of elements you need to be valid, but the truth is general purpose deltafication support across formats is a hard problem - just deltifying XML infosets alone is a hard problem. If you want to do this above the byte level, with data elements rather than offsets, again I'd say to look at RDF. Every RDF statement and collection of statements is a graph, and all its operations are closed under graphs. RDF is thus ideal for granular updates, including sending incomplete data sets in the first place.
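A sketch of why that closure property matters for granular updates: model statements as (subject, predicate, object) triples and a graph as a set of them, and a delta is just set subtraction and union - the result is always another graph. The URIs and property names here are invented for illustration.

```python
# An RDF-ish graph: a set of (subject, predicate, object) triples.
graph = {
    ("http://example.org/u/1", "foaf:name", "Bill"),
    ("http://example.org/u/1", "foaf:nick", "dehora"),
}

# A granular update arrives as two tiny graphs: statements to retract
# and statements to assert. No need to ship the whole resource.
delta_remove = {("http://example.org/u/1", "foaf:nick", "dehora")}
delta_add = {("http://example.org/u/1", "foaf:nick", "bdh")}

graph = (graph - delta_remove) | delta_add   # still a graph
print(sorted(graph))
```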

    That said, once the client is sending a PATCH request, the intent is explicit regardless of the format in play; that includes servers being able to say they do/don't support it instead of trashing content.

    In fact this came up this year in the atom-syntax working group as a design issue. I feel the atom working group made the right choice not trying to standardize it yet. Frankly, part of me sees this concern as somewhat Enterprisey; the kind of requirement only a WS-* standards group could care about. But if it's a real problem, I suspect it can be dealt with without running off and defining half-baked custom protocols.


  3. Poor support for hierarchy

    "The Atom data model is that it doesn't directly support nesting or hierarchies. You can have a collection of media resources or entry resources but the entry resources cannot themselves contain entry resources."

    I wouldn't say "poor" so much as non-existent. So I agree, and have banged my head against representing hierarchical data with Atom (or any RSS) in the past.

    It turns out the solution is provided by Microformats - send an XOXO map file in the body of an Entry (or directly as a Media Entry). You can choose to inline all the data in the XOXO, provide basic metadata in description lists, or just links. There's not much point trying to force Atom Entries and Feeds to represent something they're not designed for.
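For illustration, here's a minimal XOXO outline built with the Python standard library, of the sort that could be carried in an entry body - the labels and structure are invented, and the hrefs are placeholders:

```python
import xml.etree.ElementTree as ET

def xoxo(tree, parent=None):
    """Build a nested XOXO ordered list from (label, children) pairs."""
    if parent is None:
        ol = ET.Element("ol", {"class": "xoxo"})
    else:
        ol = ET.SubElement(parent, "ol")
    for label, kids in tree:
        li = ET.SubElement(ol, "li")
        a = ET.SubElement(li, "a", href="#")   # placeholder link
        a.text = label
        if kids:
            xoxo(kids, li)                     # nested ol inside the li
    return ol

outline = [("Folder", [("Item A", []), ("Item B", [])])]
print(ET.tostring(xoxo(outline), encoding="unicode"))
```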


All that said, I'm very happy to see real implementors provide some pushback on the Atom Protocol for their needs. However, going on to claim GData/APP has failed is a random enough conclusion, especially for the problems mentioned, which in one case is a deliberate design exclusion (for now). If these are the most serious problems encountered inside MSFT, it strikes me that APP's overall design is in good shape. Given the level of thought and discussion that seems to have gone on inside MSFT, I'm surprised Dare didn't mention these two issues, which strike me as much more substantial:

  1. Update resumption: some clients need the ability to upload data in segments. Aside from a poor user experience and general bandwidth costs, this is important for certain billing models; otherwise consumers have to pay on every failed attempt to upload a photo. APP doesn't state support for this at all; it might be doable using HTTP more generally, but to get decent client support you'd want it documented in an RFC at least.
  2. Batch and multi-part uploads: this feature was considered and let go by the atom-syntax working group. The reason was that processing around batching (aka "boxcarring") can get surprisingly complicated. That is, it's deceptively simple to just say "send a bunch of entries". Still, it would be good to look at this at some point in the future.
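On the first point, a hedged sketch of what segmented upload could look like at the HTTP level - this assumes Content-Range on requests, which, as noted, nothing in APP actually specifies:

```python
def segments(body, chunk_size):
    """Split a body into ranged chunks; a failed transfer can resume
    at the last acknowledged offset instead of re-sending everything."""
    total = len(body)
    for start in range(0, total, chunk_size):
        chunk = body[start:start + chunk_size]
        header = "Content-Range: bytes %d-%d/%d" % (
            start, start + len(chunk) - 1, total)
        yield header, chunk

for header, chunk in segments(b"0123456789", 4):
    print(header, chunk)
```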


* I'd like to think inventing a custom format to describe a user that is lockstepped to a platform was part of a platform play, or even technical resistance to using databases for storing arbitrary graphs - anything really. More likely it was lack of knowledge/research and/or FUD about RDF. Oh well, at least we know what's down that road.

The title is taken from Mark Pilgrim's article "XML on the web has failed"

Got Game

Mateia Andrei:


In some of the military software projects, what we see is a predominance of the career and corporate-enhancing infinite games. It is quite clear that delivery of the software is a secondary concern, and growing the company, growing personal influence, or growing the career is what is on many people's minds. The logic of the funny contractor behavior doesn't make sense until you realize they are playing a different game, in which different moves are called for. Then it suddenly all makes sense - even if you don't like it.

Subscribed. Explicit game selection would make for a fascinating kick-off or strategy planning meeting - "here's the kind of games we're going to play".

links for 2007-06-09

June 08, 2007

IPC

About Erlang:

"1. Processes have 'share nothing' semantics. This is obvious since they are imagined to run on physically separated machines.
2. Message passing is the only way to pass data between processes. Again since nothing is shared this is the only means possible to exchange data.
3. Isolation implies that message passing is asynchronous. If process communication is synchronous then a software error in the receiver of a message could indefinitely block the sender of the message destroying the property of isolation.
4. Since nothing is shared, everything necessary to perform a distributed computation must be copied. Since nothing is shared, and the only way to communicate between processes is by message passing, then we will never know if our messages arrive (remember we said that message passing is inherently unreliable.) The only way to know if a message has been correctly sent is to send a confirmation message back"

Given recent interest in anything (anything!) that can cope with multicore, it seemed appropriate to re-quote that gem from Joe Armstrong.
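Those four properties can be sketched with OS processes in Python: share-nothing, message passing only, and an explicit confirmation message, since the sender can't otherwise know the message arrived. A rough analogy only - Python's queues are reliable where Erlang's distribution assumes unreliable messaging, so this just illustrates the shape of the protocol.

```python
from multiprocessing import Process, Queue

def receiver(inbox, outbox):
    msg = inbox.get()           # data arrives only by message; nothing shared
    outbox.put(("ack", msg))    # explicit confirmation back to the sender

inbox, outbox = Queue(), Queue()
p = Process(target=receiver, args=(inbox, outbox))
p.start()
inbox.put("hello")
print(outbox.get())             # the sender only *knows* via the ack
p.join()
```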

June 07, 2007

links for 2007-06-07

June 06, 2007

links for 2007-06-06

June 05, 2007

Mobility

Starting today I'm working at Newbay. The idea of building out systems that reach many millions of people, on phones, is as compelling as the way Newbay are going about it. It's an exciting and vibrant space; not just in the way it allows people to connect, but also the engineering challenges involved in providing the platforms that allow the scaling of systems to achieve robust performance and availability.

To former colleagues at Propylon; many thanks, I'm proud to have contributed there for going on the last five years and I wish you every success.

QOTD

Brian McCallister: "To set the record straight, we own Apache."

June 04, 2007

Is Google using Zope/Plone?

Limi: "The answer is yes, but I can't tell you where or for what - yet."

RDFa and QNames in content

From Evan Prodromou

"I think that the smart money is on microformats, but having RDFa become part of HTML 2 makes microformats.org's future seem kind of like a cul-de-sac."

Here's another take - RDFa becoming part of XHTML2 makes XHTML2 less likely to see wide deployment. Here's some RDFa from Evan:

<div class="vcard" xmlns:v="http://www.w3.org/2001/vcard-rdf/3.0#"
         about="http://evan.prodromou.name/">
      <img src="http://evan.prodromou.name/images/Evan48.jpg" alt="photo" property="v:photo" />
      <a property="v:FN" href="http://evan.prodromou.name/">Evan Prodromou</a>

See those values, like "v:photo", and "v:FN"? They're known as "QNames in content" and technically they have no merit. Zip. From the W3C TAG Finding:

"In so far as the identification mechanism of the Web is the URI and QNames are not URIs, it is a mistake to use a QName for identification when a URI would serve."

Given the history of QNames in content at the W3C, to see them designed in like this is quite something. Given it's an obviously bad idea and violates a TAG directive, it's difficult to see this making it to Recommendation.
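The TAG's point in miniature: a QName only means something relative to an in-scope prefix mapping, and it's the expanded URI that actually identifies anything. A toy expansion, using the vcard namespace from Evan's snippet:

```python
# Prefix mapping as it appears in the snippet's xmlns:v declaration.
ns = {"v": "http://www.w3.org/2001/vcard-rdf/3.0#"}

def expand(qname, mapping):
    """Turn a QName like 'v:FN' into the URI it stands for."""
    prefix, local = qname.split(":", 1)
    return mapping[prefix] + local

print(expand("v:FN", ns))   # -> http://www.w3.org/2001/vcard-rdf/3.0#FN
```

Strip the mapping away and "v:FN" identifies nothing; the URI is what the web can dereference and compare.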

Python Pain Points

"What Are Your Python Pain Points, Really?"

Not that many:

  • str v unicode
  • HTTP libraries - the best one isn't in the stdlib.
  • Method signatures - if optional types ever landed, I'd use them in method signatures.
  • self - more of an irritant than actually painful.
  • No multiline lambdas - cue standard whining about how Ruby has closures.
  • No SMP/core support - this is not the same thing as wanting threads.
  • Jython being behind Python - you have to be careful about targeting versions.

A few years ago I would have put IDEs up there, but now there's PyDev.
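The multiline lambda point in practice: a lambda body must be a single expression, so anything longer falls back to a named def, which closes over its environment perfectly well - it's only the anonymous inline form that's missing:

```python
def make_logger(prefix):
    # A closure via def: multiple statements are fine here, where a
    # lambda would be limited to a single expression.
    def log(msg):
        line = "%s: %s" % (prefix, msg)
        return line
    return log

log = make_logger("app")
print(log("started"))   # -> app: started
```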

Surface Tension

"This is NOT REST!".

Unsubscribed.


June 01, 2007

Threads plus shared mutable state

Last year, I blathered:

"For the set of credible commercial languages Java is the winner if you are going to do threads. But it's a stretch to say its shared memory model is the right approach to concurrent programming altogether. Languges like Erlang and Mozart, are arguably better"

Insights gleaned when reading CTM figured into that assertion. So who better than one of the authors, Peter Van Roy, to explain things while commenting on the assertion that "for concurrent programming to become mainstream, we must discard threads as a programming model":

"I assume that by "threads" the authors mean "shared-state concurrency". Their statement is right if we assume that a mainstream language must be stateful. But if not, then there is another way to solve the problem. The real problem is not threads as such; it is threads plus shared mutable state. To solve this problem, it's not necessary to throw away threads. It is sufficient to disallow mutable state shared between threads (mutable state local to one thread is still allowed). "

In Java, shared mutable state would be the memory heap. It seems that the coordination mechanism is the issue rather than the number of actors.
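Van Roy's prescription sketched in Python: keep the threads, but make the mutable counter local to one thread and let everything else talk to it only through a queue:

```python
import queue
import threading

def counter(inbox, result):
    total = 0                 # mutable state, local to this one thread
    while True:
        n = inbox.get()
        if n is None:         # sentinel: report the total and stop
            result.put(total)
            return
        total += n

inbox, result = queue.Queue(), queue.Queue()
t = threading.Thread(target=counter, args=(inbox, result))
t.start()
for _ in range(10):
    inbox.put(1)              # other threads send messages, never touch total
inbox.put(None)
t.join()
print(result.get())           # -> 10, with no locks and no shared counter
```

No lock is needed because no state is shared; the queue is the only coordination point, which is the "threads without shared mutable state" discipline in miniature.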