Bill de hÓra: October 2004 Archives

« September 2004 | Main | November 2004 »

October 30, 2004

Execute this: managing configuration in programming languages

Sean homes in on an aspect of GMail's use of Javascript:

Web clients carry around a basic, low level programming language called Javascript. The real beauty of Javascript is that it is dynamic - you can blurr the distinction between code and data. You can hoist the level of abstraction you work with in your app by layering domain specific concepts on top of it in the form of functions and data structures. You can sling across data structures already teed up for use on the other end with the aid of the magic of "eval". You can implement complex behaviour by sending across a program to be run rather than trying to explain what you want done declaratively to the other side.
Now, in such a world - would you send XML data to and fro? Developers with a static typing programming language background might be inclined to say yes but I suspect javascriptophiles, lispers, pythoneers and rubyites are more likely to say no. Reason being, it is so much more natural to exchange lumps of code - mere text strings remember - that can be eval'ed to re-create the data structure you have in the XML. - Sean McGrath
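Sean's point sketches easily in Python, which blurs the same distinction. The payload below is illustrative; `ast.literal_eval` stands in as the safe variant of `eval` for data-only payloads:

```python
import ast

# A data structure becomes a lump of text, ready to go over the wire...
payload = repr({"user": "dehora", "tags": ["xml", "http"], "count": 3})

# ...and the receiving end evaluates the text to re-create the structure.
# ast.literal_eval accepts only literals; bare eval() carries the security
# concerns discussed later in the entry.
data = ast.literal_eval(payload)
print(data["count"])   # 3
```

No schema, no parser: the text string is the serialization format.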

There are more mundane, but not to be underestimated, uses for such an ability. Configuration is one of them. In Python/Jython, you can define your configuration files using plain code and import the configuration file for use. For example:

   minute=60
   session_timeout=minute*30

is a perfectly good and clear way of doing things - the interpreter will assign to session_timeout without the need for the programmer to write any tedious setup code (try doing that in a Java properties file). And given that more and more developer time is spent in configuration and declaration (be it setting things up or writing tools to read and write things to set up), this is a good thing. Being able to define configuration in terms of functions and inbuilt data structures such as lists and dictionaries rather than string literals is not only useful and powerful, it also lowers the burden on developers by reducing the number of config languages in the system.
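To make that concrete, here is a minimal, self-contained sketch. The appconfig module name and its settings are made up, and writing the file to a temp directory just stands in for having it on the path:

```python
import importlib
import pathlib
import sys
import tempfile

# The configuration "file" is plain Python - no parser to write.
CONFIG = """\
minute = 60
session_timeout = minute * 30                  # the interpreter does the sums
servers = ["a.example.com", "b.example.com"]   # real lists...
limits = {"upload": 10 * 1024 * 1024}          # ...and dictionaries
"""

tmpdir = tempfile.mkdtemp()
pathlib.Path(tmpdir, "appconfig.py").write_text(CONFIG)
sys.path.insert(0, tmpdir)

# Importing the file *is* the configuration-loading code.
config = importlib.import_module("appconfig")
print(config.session_timeout)   # 1800
```

The import gives you computed values, lists and dictionaries directly, where a .properties file would give you strings to convert by hand.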

Of course you could do this in Java with a configuration object. But hardly anyone does this - most would consider it hard coding of data, the wrong thing to do. Perhaps this is because there is sufficient effort in compiling and packaging Java that we have always preferred to use XML or .properties files as being less work. That's worth some emphasis - it's considered less work to write XML configuration files for Java than to use Java itself for configuration. Languages like Javascript, Ruby and Python have managed to avoid placing that level of obstacle in front of developers. Some might consider that a wakeup call, given how despised "configuration hell" is by Java developers. I imagine some of this argument holds for C# assemblies as well.

Another example comes from trying to distribute and manage system configurations - HP have been developing such a system, or 'fabric', called SmartFrog, under an OSS licence. They considered the options, such as XML, but the language they came up with looks something like a scripting one.

A significant argument against moving code over the wire and running it is security - eval() will no doubt make some people nervous. Consider this from the SmartFrog FAQ:

Q: How secure is SmartFrog?
A: Without security, SmartFrog would be a near-perfect virus transmission engine! Fortunately, we have taken security seriously, and the system protects itself from malicious use using a public key infrastructure (PKI) system. Each node that participates in a SmartFrog runtime system is supplied with a certificate. Furthermore, all software components and system descriptions are signed with a certificate. The certificates are used to permit only validated nodes to participate in the runtime system, and those nodes will only manipulate components and descriptions that have been appropriately signed. Additionally, all network communication takes place using an encrypted transport. Currently there is a single-level security model, where nodes and components are either fully trusted or not trusted at all. Finer grained models of security are currently being researched.

Update: Steve Loughran pointed out in comments that SmartFrog is not a scripting language:

SmartFrog is not a scripting language.
It is a declarative language that is even less scripty than ant, because it evaluates all references in a resolution phase that doesnt depend on the order of execution of things (unlike ant's property resolution which depends upon execution order)
So what you are really doing is uploading a configuration over the wire, just one that can choreograph stuff, including shell scripts (which you can insert inline into the runsh component)

...which dents my argument a bit. Oh well :)

Others will point out that having a static type system allows for secure runtimes in ways that a dynamic type system does not. Thankfully there has been significant research, practice and experience to draw from (ranging from Perl's taint mode, to the Java sandbox, to web browsers running code securely, even agent programming arcana such as Telescript). It's not as though data for execution is not being sent around already today. It is, but it's not normally talked about as such. Consider that a huge amount of forms-based traffic going over the web is ultimately obfuscated SQL, or that you are probably downloading Javascript every day onto your computer for execution - GMail is taking this a few notches higher. Security concerns notwithstanding, there is a clear need to move behaviour over the wire in ways that are not as well supported as we might like.

The point seems to be this: it is beneficial to lower the distinction between declarative data and code. Lisp hackers have known the value of blurring the distinction for decades; only now is it becoming imperative to consider the idea in mainstream distributed programming.

Having discussed Java, it should be mentioned that Java's designers are aware of this benefit. In the past, Bill Joy has argued about the limits of XML, claiming that passing around static data is not enough; you always end up needing to pass behaviour around, even if it's only a stylesheet. Joy was involved in Java's Jini framework, which features code mobility, and later in JXTA. Java itself has always allowed for mobile code, going back as far as Applets and the JVM sandbox model. Ironically, it may turn out that Joy was right all along, but that Java lost out on flexibility due to its compilation and packaging mechanisms.

October 28, 2004

RFC 3930

One of my favourite networking papers has been updated. Donald Eastlake has issued a new RFC for "The Protocol versus Document Points of View in Computer Protocols". Here's an excerpt:

   DOCUM: What is important are complete (digital) documents, analogous
      to pieces of paper, viewed by people.  A major concern is to be
      able to present such documents as directly as possible to a court
      or other third party.  Because what is presented to the person is
      all that is important, anything that can effect this, such as a
      "style sheet", MUST be considered part of the document.
      Sometimes it is forgotten that the "document" originates in a
      computer, may travel over, be processed in, and be stored in
      computer systems, and is viewed on a computer, and that such
      operations may involve transcoding, enveloping, or data
      reconstruction.
   PROTO: What is important are bits on the wire generated and consumed
      by well-defined computer protocol processes.  No person ever sees
      the full messages as such; it is only viewed as a whole by geeks
      when debugging, and even then they only see some translated
      visible form.  If one actually ever has to demonstrate something
      about such a message in a court or to a third party, there isn't
      any way to avoid having computer experts interpret it.  Sometimes
      it is forgotten that pieces of such messages may end up being
      included in or influencing data displayed to a person.

The rest is here: RFC 3930. Enjoy.

Progress

It seems that there is finally some consensus building on xml-dev that XML Namespaces is a problematic technology.

Here's what happened - Len Bullard asked what problems people felt XML had, and namespaces came out top of the list. If you're like me, and have had a bone to pick with XML Namespaces, this is a very good thing. Perhaps this will result in a better spec down the line.

October 23, 2004

ABOUT / HTTP/1.2

The discussion on the intersection between Web Services and REST continues, but a passing aside may hint at an equally important issue for Web architecture.

Vorsprung durch Einfachheit

Don Box:

Had we started with a simpler basis (perhaps Relax NG + some SOAP-specific extensions), my guess is we'd be having different discussions right now.
- Don Box, Correcting MNot

The purist's rebuttal to this is that there was enough programming and networking art at the time to have known that starting with a simpler basis was the right engineering option. The paranoiac's rebuttal is that sufficient complexity was deliberately chosen to control the rate of decay of an industry's already dying business models. Neither is true. IT evolution is slower and more gradual than most in, and observing, the industry would care to acknowledge. True technical disruptions are in fact rare, and most are noted as such after they have happened, not during. While it may be obvious at a very high level that things can and should be simple (the uniform interface is an idea of long standing in computing and networking circles), in the cut and thrust of systems building it's easy to lose track of this against more immediate concerns. And while there is unquestionably a revolution underway in IT business models, it beggars belief that sworn competitors preside in smoke-filled rooms, collectively making arrangements to fleece the world's customers through web services standards.

Objects and uniformity

Mark Baker observes that part of Don Box's sample object interface isn't required:

P.S. java.lang.Object already has Get() - it's called toString().

Perhaps, but perhaps not. Granted, arguing against the completeness of uniform interfaces because object languages don't usually have them is not a convincing argument against using uniform interfaces in protocols, in much the same way arguing against the completeness of wave functions because Newtonian physics has only a physics of billiard balls is not a convincing argument against wave functions in quantum theory. The physics are sufficiently different that the metaphors break down. As for Java, this is the uniform interface:

    public class Object
    {
      protected Object clone();
      boolean equals(Object obj);
      protected void finalize();
      Class getClass();
      int hashCode();
      void notify();
      void notifyAll();
      String toString();
      void wait();
      void wait(long timeout);
      void wait(long timeout, int nanos);
    }

Now, no-one in their right mind would base any interesting Java application semantics on toString; its idiomatic use is for diagnostics and printing. HTTP GET is an entirely different beast to toString - again, the physics are sufficiently different. The Javaspaces API, however, is closer to Don Box's intent, and represents a uniform Object interface in use today:

  public interface JavaSpace 
  {
    Lease write(Entry entry, Transaction txn, long lease);
    Entry read(Entry tmpl, Transaction txn, long timeout);
    Entry readIfExists(Entry tmpl, Transaction txn, long timeout);
    Entry take(Entry tmpl, Transaction txn, long timeout);
    Entry takeIfExists(Entry tmpl, Transaction txn, long timeout);
    EventRegistration notify(Entry tmpl, Transaction txn, 
      RemoteEventListener li, long lease, MarshalledObject handback);
    Entry snapshot(Entry e);
  }

As Patrick Logan has observed, the problem that most object languages and most protocol languages are seeking to solve is different. Protocol verbs are concerned with coordination between entities rather than the functional composition and dependency management issues most object models are concerned with. Javaspaces is representative of what a coordination protocol looks like in API form - your business object representation of a customer may be very different.
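As a rough illustration of how coordination verbs differ from a business object's methods, here is a toy, in-memory sketch of the Javaspaces verb set in Python. It has no leases, transactions or remote events, and the wildcard matching rule is invented for the example:

```python
class TupleSpace:
    """Toy in-memory sketch of the JavaSpaces verbs: write, read, take."""

    def __init__(self):
        self._entries = []

    def write(self, entry):
        self._entries.append(entry)

    def read(self, template):
        # None fields in the template act as wildcards; read leaves
        # the matched entry in place
        for entry in self._entries:
            if len(entry) == len(template) and all(
                    t is None or t == e for t, e in zip(template, entry)):
                return entry
        return None

    def take(self, template):
        # take removes the matched entry - the coordination primitive
        entry = self.read(template)
        if entry is not None:
            self._entries.remove(entry)
        return entry


space = TupleSpace()
space.write(("task", 42))
print(space.read(("task", None)))   # ('task', 42), still in the space
print(space.take(("task", None)))   # ('task', 42), now removed
print(space.read(("task", None)))   # None
```

Note that the verbs say nothing about what a "task" is - coordination is uniform, while the entries carry the domain-specific meaning.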

What about you?

Finally, from Mark Nottingham's original entry, an aside that hints at an important issue that has not been discussed to date:

MEX is the first spec to use WS-Transfer, and it cant help but define a GetMetadata method to go along with Get, instead of splitting things up into separate resources.
- Mark Nottingham, POST

Accessing the metadata for a Resource (the thing a URI names) is an open issue for web architecture, one that tends to get drowned out by more colorful but inconclusive and less useful discussions around pseudo-philosophical arcana, as can be witnessed currently on the W3C's Technical Architecture Group's mailing list (the TAG, as it is known, is the Group di tutti Groups within the W3C). However, it's not clear that declaring a second resource as the metadata resource for another resource is a workable or desirable option. For one, it's intellectually frustrating in a Goedelian sense. More importantly, it bifurcates resources into those that return representations that are about themselves and those that return representations that are about things other than themselves. This possibly only really matters when machines are expected to be able to disambiguate between the two under the current architecture (for the most part, people don't have a problem functioning with such ambiguity).

Patrick Stickler of Nokia has done enough work in this area to be satisfied that a new verb (one he calls MGET) is needed to inquire after a resource's metadata, on the basis that distinguishing between representations of resources and metadata about resources using the HTTP entity body mechanism is ambiguous and/or inefficient. Stickler is prone to using emotive language, calling any clients that need to use two resources (or two or more HTTP operations) to figure things out "second-class citizens"; and while an established consensus around the idea of this other verb has not arisen, the technical analysis seems comprehensive enough. There are other options, such as adding a qualifying header that contextualizes the representation as being a representation or metadata; and there are those who think it doesn't matter and the current model will do fine.
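For illustration, the three options sketch out like this on the wire. The Meta header name and the /meta/ URI convention are invented here; MGET is Stickler's proposed verb:

```python
# Option 1: a second resource holds the metadata (two URIs to manage).
second_resource = "GET /meta/report HTTP/1.1\r\nHost: example.org\r\n\r\n"

# Option 2: a qualifying header contextualizes the same resource
# (hypothetical "Meta" header; the verb set stays fixed).
qualifying_header = ("GET /report HTTP/1.1\r\nHost: example.org\r\n"
                     "Meta: about\r\n\r\n")

# Option 3: a new verb asks the resource about itself in one operation,
# at the cost of extending HTTP's verb set.
new_verb = "MGET /report HTTP/1.1\r\nHost: example.org\r\n\r\n"

for request in (second_resource, qualifying_header, new_verb):
    print(request.split("\r\n")[0])
```

Only the first two keep the verb set fixed; only the third lets a client ask one resource about itself with no prior agreement beyond the verb.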

This metadata issue may become more apparent as more code is written to direct goal-oriented activities online on behalf of users, rather than the interactive data/state transfer we see today - the difference between code that actively monitors Ebay and bargains on your behalf, and code that sends your holiday snaps to your weblog. While there was unjustified hype around such 'agents' during the 1990s, we are arguably reaching a point where the underlying network infrastructure and data formats are approaching a level of sophistication sufficient to support highly rudimentary but long-lived problem solvers (developments in instant messaging, social network software and online games are also significant technical drivers). As the level of automation increases online, it may be that GetMetadata is indeed the optimal approach.

October 12, 2004

Writing Servlets with Jython tutorial

Sean is starting a series on how to get the most out of Jython. Part 1 is about Writing Servlets with Jython. Future articles will include using Jython with JMS. (Of course, we know all about this stuff in work already ;)

October 10, 2004

Slab! Interoperation and evolution in web architecture

Mark Nottingham on POST being special:

From the standpoint of interface semantics, the difference here is really just one between saying 'POST machineMgmtFormat' and 'MANAGEMACHINE.' In the uniform approach, the service-specific semantics are pushed down into the type (media type) and content (entity body) of the data. In the specific approach, they're surfaced in the interface itself.
This isn't a big difference.

Mark is commenting on Don Box's expectations of the insights that could result from the WS-Transfer specification. I would say the difference is substantial. The reason is that the trade-off between interoperability and evolvability is a critical design axis in any language or protocol, particularly in computer languages. If we go back to the fundamental underpinnings of a language or protocol verb set, we arrive at speech act theory, the theory of how we communicate with other actors to achieve a goal. In crafting such acts of speech we find two basic approaches to the creation of verbs or action words. The first is that the verb set is open to addition, but not to modification - anyone is free to add new verbs to the language. The second is that the verb set is fixed - no additions, no modifications. The latter enhances the uniformity and interoperability of the language at the cost of evolvability. The reason this matters is simple enough - the use-value of a verb is a function of the number of clients and servers that share an understanding of it. Not quite Metcalfe's law, but along the same lines. The reason it matters even more in computer languages is also simple - computers for the most part do not have the ability to learn new verbs or bootstrap them into the language the way people using natural languages do - thus the need for precise specification.

While HTTP does allow for addition, practically speaking the verb set is fixed. It has taken years for WebDAV additions to HTTP* to penetrate more than a fraction of the Web. Other efforts, such as HTTPR, an extension for reliable messaging, have gone nowhere. Even within the mandated verb set of HTTP itself, we find the availability of verbs varies widely (notably PUT and DELETE), with entire eco-systems (such as mobile device clients) having only a subset. One can argue that the active verb set of HTTP comprises a subset of three verbs - HEAD, GET, POST - anything else is dead tongue.

The problem with HTTP POST, and what makes it special, is that it is a semantic catchall. What makes POST a uniform speech act is ironically the absence of interesting semantics and lack of specificity. Although it has specifications that are helpful to people when dealing with caches and state management, there's no controlled means of defining what one is actually saying with it, without some further and prior agreement between client and server. The reality is that POST has been overloaded and abused to get systems talking even where such systems would have done better with an alternate verb - and the result is that in many systems the POST speech act is close to meaningless. WS-Transfer aims to throw some light into this void by providing a means to add consistent meaning to operations that would often be drilled through POST. In particular this may prove valuable for use with web services toolkits which are often designed to hide the networking aspect of communications from the developer.
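A small sketch of the catchall problem. With a specific verb, intent lives in the interface itself; with POST, it must be recovered from the payload under an out-of-band contract. The handler names and the "action" field are invented for the example:

```python
def dispatch_on_verb(method):
    # the protocol itself says what is meant
    return {"PUT": "store it", "DELETE": "remove it"}.get(method, "unknown")

def dispatch_on_post_body(body):
    # the protocol says only "POST"; meaning comes from prior agreement
    # between this client and this server about the "action" field
    return {"save": "store it", "remove": "remove it"}.get(
        body.get("action"), "unknown")

print(dispatch_on_verb("DELETE"))                   # remove it
print(dispatch_on_post_body({"action": "remove"}))  # remove it, by convention
```

A third party, or an intermediary such as a cache, can understand the first exchange but not the second - which is what "close to meaningless" amounts to in practice.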

While they may share a common basis, WS-Transfer and HTTP are fundamentally different in mechanism, and this is not a design consideration we should so easily gloss over. We now have a sufficient understanding of languages and protocols designed for use in open and distributed systems to be concerned that, despite WS-Transfer looking like a good thing on the drawing board, it might founder in the trenches.

Why would this happen? It comes down to how the trade-off between interoperation and evolution affects the way extensions are managed. Neither HTTP nor WS-Transfer has as strong a story to tell as one might expect. On the one hand, REST advocates laud HTTP's and the Web's success as an architectural triumph and will happily point to WS-Transfer and say "I told you so". They are much more circumspect in their musings on how people actually use HTTP to get things done, which is not consistent with, and sometimes directly contrary to, the REST architecture. On the other hand, promoters of WS-Transfer see it as targeting those who need something more specific than POST, or who are firmly grounded and trained in middleware practices and not network protocols. WS-Transfer, however, does not offer a consistent means of addition - it describes a structural means for declaring new verbs, without an ability to declare what they mean. And it is difficult to see WS-Transfer stopping at the handful of verbs it defines currently. It may thus, if left unfettered, act as a distributed Tower of Babel, affecting people's ability to communicate. This is compounded by the fact that it is considered good practice in object oriented middleware to be highly specific in the methods used on objects - verb uniformity is not a goal, quite the opposite in fact. In one sense, HTTP represents a more conservative and risk-averse approach to the evolution/interoperation tradeoff.

In computer jargon, this means of addition is often called 'semantics', though philosophers and mathematicians may take umbrage at that (they have a very specific definition of the term). It essentially defines how you may extend a language in terms of its existing primitives and, when successful, is based - universally, it seems - on formal logic. There are existing technologies that strive to achieve this, such as FIPA-ACL, Lisp and OWL. Each is successful in its own way (Lisp's ability to extend itself consistently has allowed it to remain current for going on 50 years), but they are often considered obscure or academic by mainstream technologists and practitioners. RDF, which can be used to make logical assertions about URIs, is ideally suited for articulating the semantics of any extended WS-Transfer speech acts, because WS-Transfer uses URIs to name its verbs.

Where WS-Transfer may prove useful is in breaking down barriers and getting architectural factions to the table, by raising awareness that there are issues with both REST and Web services approaches. The debate between the REST/Web and Webservices/Middleware communities has often been acrimonious. Aside from complications resulting from the strategic and commercial agendas imposed by the industry, which have resulted in a plethora of competing and inconsistent web services 'standards', the core technical debate has been arcane. It is not always obvious to outsiders and system stakeholders why some kind of agreement can't be forged. While the architects and vendors are busy in argument, customers and practitioners are frequently left with little by way of clear advice on how to either construct new systems or integrate existing ones. The outcome is that systems are being built, week in, week out, that cross the Web/Middleware boundary without being informed by both approaches and where each is appropriate. This implies projects with excess risks and costs, wasted effort, and re-learning of best practices or what is already in the state of the art. This is all the more important now that systems incorporating web and middleware aspects are increasingly the norm (the size of the industry sector affected is significant). Any effort that raises mutual understanding is most welcome.


* WebDAV folks are often adamant that the verbs WebDAV specifies are HTTP verbs - there are no WebDAV verbs.

Update: Originally I asserted in the first sentence of the entry, that Mark Nottingham thought that WS-Transfer matters. Mark was kind enough to point out he never actually says that (see comments), so I've removed the assertion.