Extensions v Envelopes

Here's a sample activity from the Open Social REST protocol (v0_9):

<entry xmlns="http://www.w3.org/2005/Atom">
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title>some activity</title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
        href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <content type="application/xml">
       <activity xmlns="http://ns.opensocial.org/2008/opensocial">
           <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
           <title type="html"><a href=\"foo\">some activity</a></title>
           <updated>2008-02-20T23:35:37.266Z</updated>
           <body>Some details for some activity</body>
           <bodyId>383777272</bodyId>
           <url>http://api.example.org/activity/feeds/.../af3778</url>
           <userId>example.org:34KJDCSKJN2HHF0DW20394</userId>
       </activity>

    </content>
</entry>


It's 1.1 kilobytes. I'll call that style "enveloping". Here's an alternative that doesn't embed the activity in the content and instead use the Atom Entry directly, which I'll call "extending":

<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:os="http://ns.opensocial.org/2008/opensocial>
   <id>http://example.org/activities/example.org:87ead8dead6beef/self/af3778</id>
   <title
type="html"><a href=\"foo\">some activity</a></title>
   <updated>2008-02-20T23:35:37.266Z</updated>
   <author>
      <uri>urn:guid:example.org:34KJDCSKJN2HHF0DW20394</uri>
      <name>John Smith</name>
   </author>
   <link rel="self" type="application/atom+xml"
        href="http://api.example.org/activity/feeds/.../af3778" />
   <link rel="alternate" type="application/json"
       href="http://example.org/activities/example.org:87ead8dead6beef/self/af3778" />
   <os:bodyId>383777272</os:bodyId>
   <content>Some details for some activity</content>
</entry>


It's 686 bytes (the activity XML by itself is 460 bytes). As far as I can tell there's no loss of meaning between the two. 545 bytes might not seem worth worrying about, but all that data adds up (very roughly 5.5Kb for every 10 activities, or 1/2 a Meg for every 1000), especially for mobile systems, and especially for activity data. I have a long standing belief that social activity traffic will dwarf what we've seen with blogging and eventually, email. If you're a real performance nut the latter should be faster to parse as well since the tree is flatter. The latter approach is akin to the way microformats or RDF inline into HTML, whereas the former is akin to how people use SOAP.

Ok, so that's bytes, and you might not care about the overhead. The bigger problem with using Atom as an envelope is that information gets repeated. Atom has its own required elements and is not a pure envelope format like SOAP. OpenSocial's "os:title", "os:updated", "os:id", "os:url", "os:body", "os:userId" all have corresponding Atom elements (atom:title, atom:id, atom:link, atom:content, atom:url). Actually what's really interesting is that only one new element was needed using the extension style, the "os:bodyId" (we can have an argument about os:userId, I mapped it to atom:url because the example does as well by making it a urn).  This repetition is an easy source of bugs and dissonance. The cognitive dissonance comes from having to know which "id" or "updated" to look at, but duplicated data also means fragility. What if the updated timestamps are different? Which id/updated pair should I use for sync? Which title? I'm not picking on Open Social here by the way, it's a general problem with leveraging Atom.

I suspect one reason extensions get designed like this is because the format designers have their own XML (or JSON) vocabs, and their own models, and want to preserve them. Designs are more cohesive that way. As far as I can tell, you can pluck the os:activity element right out of atom:content and discard the Atom entry with no information loss, but this begs the question - why bother using Atom at all? There are a couple of reasons. One is that Atom has in the last 4 years become a platform technology as well as a format. Syndication markup now has massive global deployment, probably second only to HTML. Trying to get your pet XML format distributed today without piggybacking on syndication is nigh on impossible. OpenSocial, OpenSearch, Activity Streams, PSHB, Atom Threading, Feed History, Salmon Protocol, OCCI, OData, GData, all use Atom as a platform as much as a format. So Atom provides reach. Another is that Atom syndicates and aggregates data. "Well, duh it's a syndication format!", you say. But if you take all the custom XML formats and mash them up all you get is syntactic meltdown. By giving up on domain specificity, aggregation gives a better approach to data distribution. This I think is why Activity Streams, OpenSearch and Open Social beat custom social netwoking formats, none of which have become a de-facto standard the way say, S3 has for storage - neither Twitter's or Facebook's API is de-facto  (although StatusNet does emulate Twitter). RDF by being syntax neutral is even better for data aggregation but that's another topic and a bit further out into the future.

So. Would it be better to extend the Atom Entry directly? We've had a few years to watch and learn from social platforms and formats being built out on Atom, and I think that direct extension, not enveloping, is the way to go. Which is to say, I'll take a DRY specification over a cohesive domain model and syntax. It does means having to explain the mapping rules and buying into Atom's (loose) domain model, but this only has to be done once in the extension specification, and it avoids all these "hosting" rules and armies of developers pulling the same data from different fields, which is begging for interop and semantic problems down the line.

I think in hindsight, some of Atom's required elements act against people mapping into Atom, namely atom:author and atom:title. Those two really show the blogging heritage of Atom rather than the design goal of a well-formed log entry. Even though author is a "Person" construct in Atom, author is a fairly specific role that might not work semantically for people (what does it mean to "author" an activity?). As for atom:title, increasingly important data like tweets, sms, events, notifications and activities just don't have titles, which means padding the atom:title with some text. The other required elements - atom:id, atom:updated are generic constructs that I see as unqualified goodness being adopted in custom formats (which is great). The atom:link too is generically useful, with one snag, it can only carry one value in the rel attribute (unlike HTML). So these are problems, but not enough to make me want to use an enveloping pattern.

Tags:

4 Comments


    Bill - Agreed with your points. I pushed DRY for the OpenSocial format but wasn't successful in promoting the idea of requiring mapping to/from Atom standard elements; it was felt to be too much work for people who are writing OpenSocial specific code anyway.

    Activity Streams is at a similar point right now; in fact there's a discussion about whether it can use the <author> element to mean the actor(s) or initiator(s) of an activity. At the moment, the spec calls for there to be a separate set of "actor" elements. Partly this seems to be because the Atom non-normative grammar suggests that things like atom:link aren't allowed inside <author>... would be good to get this clarified.

    (And, required atom:id is great except when you're doing creation operations :).


    Thanks John, I'll confess I've gone back and forth on the two options for a long time now.

    And you've remined me I have an action to get clarification around atom-on-atom extensions.


    Gone back and forth on this too. Some points in defense of the "envelope" approach (I hate this name though!):

    atom:summary -- On reading the spec, the description of atom:summary seems to be a license to plop your XML into atom:content and then use atom:summary to provide a readable description for use in feed readers and the like -- also a good rebuttal to the "your feed should look good in a feed reader" argument you often get from the extension camp.

    mime types -- you can use a mime type (something more specific than application/xml) on atom:content to describe your data format. You can also use your mime type in app:accept to declare that the feed accepts entries containing your format (best when using Joe's multipart/related extension).

    CRUD -- atom:feed, atom:entry, app:collection and rel=edit communicate that the service implements RESTful CRUD. That's an important statement, especially when combined with the mime type point above.

    Content Management System -- this use of Atom is analogous to an HTML-based content management system hosting proprietary data files (e.g. word docs). It would make sense in this sort of system for the HTML to provide meta data (author, title, category) that is repeated in the content itself as well as describe relationships between the managed documents. Why is this ok for HTML and not for Atom?

    It's a tough choice though -- neither mechanism is perfect.


    Great post -- I agree completely (including reservations/trouble spots). All things considered, very hard to beat Atom using your "extending" pattern. I agree w/ John Panzer that required atom:id is, in fact, a problem. I've had MANY situations in which I need what is effectively a "get last insert id" operation. For that, I want to simply POST "<entry/>" and get back a small atom entry with it's server-generated atom:id.