" /> Bill de hÓra: April 2007 Archives


April 28, 2007

Episode IV

Charlie Savage:

"Its a no-brainer to see that Atom will blow away the WS-DeathStar (which is a mocking tribute to the mountains of specifications that IBM, Microsoft and others have generated for defining web services)."

Happy to see the meme spread.

April 25, 2007

links for 2007-04-25

April 24, 2007

links for 2007-04-24

April 21, 2007

links for 2007-04-21

April 20, 2007

links for 2007-04-20

April 19, 2007

Patched svndumpfilter2

update: Simon Tatham has applied a better patch that deals with quotes in paths, as well as whitespace (quotes in svn paths - who knew!?). It's available as rev r7468 of svndumpfilter2.

Sometimes you want to export part of a Subversion repository, leaving the rest behind while keeping the repository history and metadata. The tool for this job is svndumpfilter, which operates on Subversion dumpfiles. But svndumpfilter has a serious flaw - if a file or path was copied from a path you're filtering out to one you're filtering in, svndumpfilter won't be able to fill out the history and the job will fail. Simon Tatham's svndumpfilter2 cleverly fixes this by looking up paths against the source repository the dumpfile was taken from, using svnlook.

In turn svndumpfilter2 has a tiny bug; if a repository path being checked has white space in its name*, and is passed to svnlook as-is, svnlook will only read up to the first whitespace, which results in a "path not found" error. A simple fix - placing all arguments to svnlook in the script inside quotes - does the trick. As is often the case, most of the work was running down why svndumpfilter doesn't work with copies, why svndumpfilter2 was reporting bad paths to begin with, and documenting what was done (this post). Otherwise svndumpfilter2 comes highly recommended - the repository I was working against is on the large side; "du -sh" on its repo folder comes in at 2.5Gb (the checkout is much bigger), with nearly 20,000 commits.
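To make the failure mode concrete, here's the kind of command line involved (the repository path and filename are made up for illustration); unquoted, the shell splits the path at the space and svnlook reports "path not found":

    # unquoted: svnlook is handed "trunk/release" plus a stray "notes.txt" argument
    svnlook cat -r 100 /var/svn/repo trunk/release notes.txt
    # quoted: the full path arrives intact
    svnlook cat -r 100 /var/svn/repo "trunk/release notes.txt"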

You can get a patched file here - svndumpfilter2. Alternatively "hg clone" the mercurial repository from http://www.dehora.net/hg/tools/ **.




* for this and many other reasons, avoiding whitespace in file names tends to be a good policy.

** Students of irony are welcome to savour the notion of keeping a subversion tool inside a mercurial repository.


April 17, 2007

links for 2007-04-17

April 15, 2007

Java Archive Network

Elliotte Rusty Harold: "What's the real core of Java? The pieces you can't do without? java.lang, a few pieces of java.io and java.util"

I'm not Sun, but this is the kind of question I'd lob to Ian Murdock, now that he works there. As much as Debian's sluggish process annoys me, its packaging system represents the state of the art.

"Some sort of centralized system or user repository would hold different versions of various libraries. Maven already comes very close to this, and JSR-277 may go further."

Java could do with something like CPAN or Gems. As for JSR-277, I suspect if you hooked OSGi packaging into an Atom Protocol based distribution and publishing mechanism, threw in signing, avoided transitive dependencies, you'd have a technical basis for an archive network. But wow, not Maven.


April 13, 2007

Ultimate

I installed Ubuntu Server on an old P3 tonight. In 20 minutes, I had a LAMP server. This is the best OS installation experience I've had, bar none.

links for 2007-04-13

April 12, 2007

links for 2007-04-12

April 11, 2007

links for 2007-04-11

April 10, 2007

412 precondition failed and subversion/mercurial commits

If you are receiving a "412 Precondition Failed" message when checking into Subversion or pushing to Mercurial (I get this on TextDrive from time to time), it's probably mod_security protections objecting to your content. Two possible solutions:

  • Change the content - sometimes changing the text of a commit comment will let the checkin through.
  • Disable mod_security filtering. To disable it for your repository, set "SecFilterEngine Off" in a .htaccess file (see the snippet below).
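For the second option, a minimal .htaccess along these lines should do it (the <IfModule> guard and the SecFilterScanPOST line are my own additions, not strictly required):

    <IfModule mod_security.c>
        # turn off mod_security request filtering for this location
        SecFilterEngine Off
        SecFilterScanPOST Off
    </IfModule>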

Test First

"These restrictions are totally unacceptable to us." - the ASF are looking for an alternative asking Sun to honour their contractual requirements for the Java SE test kit. One to watch.

update: see Steve Loughran's comment below

links for 2007-04-10

April 09, 2007

Dumb and Dumber

Dumb: I bought two books recently. One was called "Information Retrieval: Algorithms and Heuristics" and I was reading it over the weekend. Or trying to. By the time I got to the Bayesian material, I was getting lost. Yet this is basic stuff - I knew it in college. This follows on from a realization not long past that to get through "Modern Heuristics" I'm going to need a math refresher. Also, I still haven't fully grasped this metaprogramming example in JavaScript. Not good.

Dumber: Also at the weekend, I spent more time than I would like getting Textdrive, Mercurial and Trac to play nice - this is the infrastructure for a Django based weblog I'm writing. I reckon we waste who knows how many millions of hours a year on silly configuration matters, and it took me too long to set this up (you instinctively tend to know when you're the problem). Trac and Mercurial are written in Python. I might get to file some tickets and a few test cases to complete for next weekend, before getting back into work and wrapping up Atom Protocol. By comparison, look at what Peter Norvig did for grins recently. In Python. On a plane. Not good either.

So, Pete Lacey's kind observations notwithstanding, I sure don't feel smart at the moment.

Mercurial, Part III: running Trac with Mercurial on TextDrive

This post is a (very) terse description of how I set up Trac + Mercurial integration on my TXD account, based loosely on TextDrive's own instructions for setting up Trac, the instructions on the TracPlugin page, and "Non-root Trac installation on Textdrive", which didn't quite work for me, as the CGI wouldn't load the Mercurial .egg plugin, but it sure got me started. You can see a previous post as to why you need to hold your own copy of Trac.

Install Trac 0.10.3.x locally:

    # mkdir ~/local
    # cd ~/local
    # wget http://ftp.edgewall.com/pub/trac/trac-0.10.3.1.tar.gz
    # tar xvzf trac-0.10.3.1.tar.gz
    # mv trac-0.10.3.1 trac-0.10.3.1-src
    # cd  trac-0.10.3.1-src
    # python setup.py install --prefix=/users/home/$yourname/local/trac-0.10.3.1
    # cd ..; ln -s trac-0.10.3.1 trac 
    # rm -rf   trac-0.10.3.1-src; rm trac-0.10.3.1.tar.gz

Have your account pick up your copy of Trac 0.10.3.x instead of TXD's 0.9.x version:

    # nano ~/.profile
    PYTHONPATH=${HOME}/local/trac/lib/python2.4/site-packages
    #export PYTHONPATH=${HOME}/local/trac/lib/python2.4/site-packages

You must set the PYTHONPATH - the tracd and trac-admin scripts are lightweight wrappers that use Python's import mechanism, not some fixed path, so Python's sys.path must be set to not point at TXD's version - don't waste half an hour figuring this out like I did. At this point you should be able to open a Python shell, "import trac", "print trac.__version__" and see a version number that is 0.10.3.something.
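For example (the exact version string shown is illustrative):

    # python
    >>> import trac
    >>> print trac.__version__
    0.10.3.1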

Get the Mercurial plugin for Trac:

    # mkdir ~/tmp; cd ~/tmp
    # svn co http://svn.edgewall.com/repos/trac/sandbox/mercurial-plugin
    # cd mercurial-plugin
    # python setup.py bdist_egg

That will put an .egg in ~/tmp/mercurial-plugin/dist (see TracMercurial for more details).

Create a trac site as per TXD's instructions, but use your copy of trac-admin:

    # mkdir ~/trac
    # ~/local/trac/bin/trac-admin ~/trac/$sampletrac initenv
    for the repository_type type 'hg'
    for the repository_path, type the local path to your mercurial repo 

Copy the .egg plugin to the trac project, it doesn't need to be globally installed:

    # cp ~/tmp/mercurial-plugin/dist/*.egg ~/trac/$sampletrac/plugins

If you had an existing Trac install, see TracMercurial for details on how to convert it.

Now, start tracd as per TXD's instructions and using the port they gave you, but use your copy of tracd instead:

    ~/local/trac/bin/tracd  -d -p XXXX \
    --auth $sampletrac,/home/$username/etc/trac.digest.passwd,$your.tld \
    /home/$username/trac/$sampletrac

Post script: a question on multiple projects under Trac

update: Bill Mill explains how to do it: "Just list each project consecutively on the tracd command line." Doh!

Trac looks nice, but can it support multiple projects under a single port? Mercurial branches tend to be cloned standalone repositories, so this will be needed.
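Following Bill Mill's tip, the invocation above becomes something like this (the second project name is hypothetical, and I'm assuming the same digest file covers both):

    ~/local/trac/bin/tracd  -d -p XXXX \
    --auth $sampletrac,/home/$username/etc/trac.digest.passwd,$your.tld \
    --auth $othertrac,/home/$username/etc/trac.digest.passwd,$your.tld \
    /home/$username/trac/$sampletrac /home/$username/trac/$othertrac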


See also:

Dependency

Need a better weblog. Choose to write your own. Need a VCS. Choose Mercurial. Need a bugtracker. Choose Trac. Trac depends on Subversion. Need a Plugin. Choose TracMercurial. Chosen Textdrive already. Textdrive chose Trac 0.9.5. TracMercurial needs Trac 0.10. Choose own installation of Trac.

Get nothing done for 2.5 hours.

I think it's true to say, dependencies can't be abstracted away;
and anyone that says otherwise, is selling notions quite unwise

links for 2007-04-09

Say again?

Update: InfoQ have a new leader up, XUL: What the web should look like?. Nice going!

I mostly like InfoQ, but how can you have a leader titled "Is XML the Future of UI Development?" and not mention XUL? In the 'XML as UI language' category, XUL was one of the first. It must be half a decade old now.

April 08, 2007

Bzzt Questions

"Ech, state variables stink. Here’s a better python solution"

for i in range(1,100):

You can stop there. It's a fencepost error, as was the original loop it responds to.
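For the record - and assuming the intent was to cover 1 through 100 - range's upper bound is exclusive, so the loop stops at 99:

    # range(1, 100) yields 1..99; the 100 case is never reached
    for i in range(1, 100):
        pass
    # covering 1..100 inclusive needs the endpoint bumped
    for i in range(1, 101):
        pass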

For anyone asking what I call "bzzt questions" in interviews, do me two favours. First, read the chapter in Programming Pearls that talks about binary search. Second, read the classic from Dave Pickett about implementing a file copy function. Sometimes basic programming tasks are deceptively hard to get right.

What I'd really want are filters that tell me how long it will take someone to step through a 50kloc code base to find a bug, or if they will actually do that. Whether they will accept counter-intuitive design advice from seniors on a team (since learning by experience is the lowest form of learning). Whether they have a tendency to cut and paste code (it can't just be changed). Whether they even know what a fencepost error is (the first step is knowing you have a problem). Whether they can detach emotionally from a problem to get unstuck ("I know it can't be that" == it's probably that). Whether they know why shipping against trunk is boneheaded (which trunk?). Whether they know how long the work will take (knowing what it takes to deliver code?). Whether they panic when presented with accountability. And so on.

I suspect the only real answer is a probationary period, and bzzt questions are possibly a premature optimization for that.

Oh and one other thing. If you think being a manager, a veep, or a director-of necessarily means dumbing down and leaving mere coding behind, please read what Peter Norvig writes on planes.

Data Parallel

More from Duncan Cragg: "It's scalable because of all the reasons I mentioned before: the cacheability of the basic data operations and their parallelisability through partitioning."

So that's partitioning of operations dealt with, and yes it's a huge feature - now, what about data? Recently I said in a comment on Joe Gregorio's blog* that RDF can be partitioned "N ways to Sunday". RDF people don't talk up this property enough, it goes overlooked.
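As a toy sketch of what that property means (my own illustration, not from the post): because each statement is a self-contained triple, a graph splits across N stores by hashing whichever term suits the workload, for example the subject, so that all statements about a resource co-locate:

    # hypothetical helper: route a (subject, predicate, object) triple to one of n stores
    def store_for(triple, n):
        subject, predicate, obj = triple
        return hash(subject) % n

    triples = [
        ("http://example.org/emp/1", "name", "Alice"),
        ("http://example.org/emp/1", "dept", "Ops"),
        ("http://example.org/emp/2", "name", "Bob"),
    ]
    partitions = [store_for(t, 4) for t in triples]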

Over there, Pete Kirkham points out that "the problem with triples is you then have to do joins to create objects out of them". I'd go further and say the real problem is grouping them into the tidy 'domain models' that OO devs and gurus like Eric Evans and Martin Fowler insist are a good thing. But once you partition RDBMS backed data (as most big web based systems end up doing, especially on their user accounts), you have to do the first bit (distributed joins) anyway. It seems that GOOG and EBAY have decided to accept this as a physical design constraint and thus are keeping data integrity constraints in the applications and replacing the RDBMS with raw storage, however barmy that sounds to those of us working at smaller scales.

At that point, perhaps it becomes worth considering whether you actually need what an RDBMS gives you anymore, or whether you need a dumb store, a la BigTable. Perhaps the RDF guys should stop figuring out how to solve the RDBMS-Triple Impedance Mismatch Problem and start looking at alternative storage like Hadoop. Most RDF systems using relational databases are using them as dumb stores anyway, or at least they were last time I looked.

It also occurs to me I should do two things 1) review the current RDF toolsets, as it's been at least 2 years, 2) really write down what I like about RDF, as opposed to picking at its flaws, which I'm too prone to doing.




* Of late Joe is really starting to "open his shoulders", as we say in parts of Ireland. If you're not subscribed, do so.

Always Be Closing

Duncan Cragg: "It's the same as in the real world: as long as it all settles in the end and the rules are followed. The ResponseToBestOffer cites what state of the Offer it is accepting. If that changes for any reason, the ResponseToBestOffer is void."

Duncan explains how eventual consistency might work in a REST based system. Technically it's fine. The hard part is persuading people the sky won't fall in if the transactions work funny.

Matchstick Men

Does anyone care?


Tim Bray:
"This happens over and over. New WS-* spec submission, check. Insanely huge charter locking down the conclusion and ensuring a rubber-stamp outcome, check. Loads of dependencies on WS-standards, WS-drafts, WS-submissions, and other WS-handwaving, check. Resolute obliviousness to other technologies that address the same problem, check."

Burton Group: "The WSFED charter gives lip service to working on convergence with SAML 2.0. Like other commenters, we find this less than convincing; the WSFED charter's invitation to other standards committees looks like a passive-aggressive maneuver. It puts the onus on SAML 2.0, which has already been standardized, to come to WSFED on their terms and make changes to an established standard to accommodate features of a specification which was not developed in an open forum and is not yet a standard."

Eve Maler: "UPDATE: The telecon was held this morning. TC convener Paul Cotton responded to the collected comments by reading from a prepared text that gave the same answer 30 times: “Proposed response: no changes to the WSFED TC charter are required.” The sole exception was to accept the comment noting extraneous characters. Message received loud and clear"

Paul Madsen: "No change is required"

I read what Tim, Eve, Paul and the Burton Group had to say. The criticisms lacked bite. I found myself strangely unmoved, unsurprised, unshocked, unconcerned. I saw that a firestorm has not been lit across weblogs, as would have been the case not even a year ago. It seems that no-one cares anymore, and WSFED will be consigned to irrelevance, and along with it, much of the promotion around WS-*. WS-* as a process, as a technical means of designing systems, as a way to generate 'future business value', now lacks credibility. This has less to do with the technology involved and more to do with how the technology has been presented to the market, and consequently how it has evolved.

The Business of IT is Business

This apathy is bad news for the handful of vendors and OSS communities who are at least trying to get something done with WS-*, instead of managing incumbent revenue streams via standardisation. It's bad news for those technologists, consultants and analysts who promoted WS-* years ago, and now have to quietly disassociate themselves or reframe the past as a great learning. It's bad news for those with deployed WS-* systems, who might be facing yet another re-architecting exercise in the coming years.

The lessons to be learned from the heavy-handed promotion of WS-* are twofold.

First, both enterprise software and services organisations need to rein in their marketing and sales divisions, as strange as that might sound. In essence, they need to stop promising miracles. What has happened with WS-* promotion, and what is happening with SOA is bad for the industry, bad for shareholder value. Customers will come to reject the vendor/analyst/consultant triumvirate if it comes to appear to be nothing more than a racket. In effect, that would be a rejection of the entire market. This helps no-one, least of all customers, dependent as they are on software and related services. More realistic approaches to the market need to be found - "rip and replace" of IT assets isn't a sustainable model (ironically WS-* in the beginning was about avoiding such expense).

Second, and more important, one cannot cleave technology from business and expect good results in technological matters. This has afflicted the evolution of WS-* for years. There has been much talk since the dotbomb collapse about alignment and governance, yet what seems to have happened is that technology and delivery aspects have been given short shrift. In the meantime business people make uninformed technology bets that have to be honored with vigorish later by IT departments and project teams. The notion that the "business of IT is business", has been transformed into "IT doesn't matter", with the consequence that the valid concerns of IT people are not heeded.

IT is Business, Business is IT.

However good the slogan "the business of IT is business" might have sounded after the dotcom bubble, the gap has in fact widened. Critically, the upkeep and maintenance of legacy systems has come to dominate business software spending. Most large enterprise IT divisions now have the equivalent of a pensions fund crisis, except that all the money is being spent on old systems instead of old people.

In software projects, the devil is truly in the details. IT projects tend to flounder not due to big picture issues; they fail due to the details of delivery, which leads to gross cost under-estimations and to project death spirals. Getting into details "at another date", one which is always deferred, cannot therefore be considered a sound approach to project risk. Nor can the diversion of funds away from the upkeep and modernisation of existing systems that literally "run the enterprise" to new grand projects based on new architectural precepts.

By the same token, process models that encourage strong separation of software and business functions are arguably broken - just why can't your business analysts make initial assessments of the technical costs instead of drawing matchstick men? Why is it that VPs, well able to understand complex matters like logistics, options theory and even spreadsheet programming, get a pass when it comes to something conceptually simple like their intranet or email systems? The result is further cost and inefficiency as requirements and needs are transliterated back and forth between competing specialisations. That WS-* was pitched as an abstraction, as a way to not have to care about technical details has not helped.

What's next for IT?

Assaf Arkin correctly observes that REST is now the "cool by association" technology. That will be interesting - REST is technically grounded and purports to describe the as-is architecture of the Web. The grassroots that promote it and build in that style have made it clear they have no truck with the marketing spiel that currently surrounds WS-* and SOA. Indeed the growth and promotion of REST and Internet style has been done in sharp counterpoint to WS-* technologies. Expect a lot of people to get grilled, if not flamed, as they try and repurpose the REST label. Yet however curmudgeonly REST proponents like to act, some dilution seems inevitable, as has been the case with business adoption of open source (both its software and its processes). And do not be surprised to see specific WS-* technologies and ideas with technical merit, such as SAML and payload encryption, make an appearance while the process that generated them is discarded.

links for 2007-04-08

Vocabulary Design and Integration

Vocabularies

There are two schools of thought on vocabulary design. The first says you should always reuse terms from existing vocabularies if you have them. The second says you should always create your own terms when given the chance.

The problem with the first is you are beholden to someone else's sensibilities should they change the meaning of terms from under you (if you think the meanings of terms are fixed, there are safer games for you to play than vocabulary design). The problem with the second is term proliferation, which leads to a requirement for data integration between systems (if you think defining the meaning of terms is not coveted, there are again safer games for you to play than vocabulary design).

What's good about the first approach is macroscopic - there are fewer terms on the whole. What's good about the second approach is microscopic - terms have local stability and coherency. Both of these approaches are wrong insofar as neither represents a complete solution. They also transcend technology issues, such as arguments over RDF versus XML. And, at differing rates, they will produce a need to integrate vocabularies.

XML

XML doesn't do anything interesting for integration by itself - you need the transformations. The upside of the transformation approach is that it deals well with the psychology of term ownership - wanting to control the meaning of a word is almost instinctive - which lends itself to the vocabulary design approach of term creation. The notion of vocabulary is introduced in XML via namespaces and schema languages.

The downside is that you will have to write the transformations, and test that the transformations do what you intended in terms of the data. Once you have a transformation between two formats it serves as an implicit specification of the canonical form of the two formats, although that could give some formalists cause for indigestion. "It's ok, we have regression tests" offers limited comfort to said formalists.
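To make that concrete, here's a minimal sketch of such a transformation in XSLT - the vocabularies and element names are entirely hypothetical:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:a="http://example.org/ns/hr"
        xmlns:b="http://example.org/ns/payroll">
      <!-- map one vocabulary's notion of an employee onto the other's -->
      <xsl:template match="a:employee">
        <b:person>
          <b:fullName><xsl:value-of select="a:name"/></b:fullName>
        </b:person>
      </xsl:template>
    </xsl:stylesheet>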

RDF

Unfortunately, the RDF approach is often mischaracterised, so let's try and rectify that. The key to understanding RDF lies in what is meant by the term "data model". The term needs calling out because the RDF meaning isn't the same as the (more commonly used) meaning in IT and software circles. In RDF, the data model implies a formal mathematical underpinning, literally "a model of data"*.

While it's hard to discern what others mean by "data model" outside the technical definition used by RDF, the point is that RDF does not work in terms of local canonical agreements for a problem space, ie the domains of discourse for vocabularies. It works by defining a canonical semantics for all data, represented as graph structures. Thus you're welcome to represent some class of thing, say employee details, or some domain, say patient records, in any number of variant** ways in RDF, but they'll all share the data model. Whereas in XML the data models are arbitrary and typically unknown - a declaration is made that the markup and schemata are about some domain, and the programmers are expected to get on with it.

OWL

OWL also has a formal data model - arguably it has three such models, each more powerful than RDF's, and all somewhat tenuously linked to RDF via the notion of a class. RDF/OWL will allow you to make statements about the relative likeness of things that you would otherwise state imperatively using a programming language. To manage differing vocabularies, you'd use constructs such as sameAs from OWL that allow you to say that one thing relates to another in some way - indeed sameAs is probably the best known relation of this kind.

The main value of this approach is easy warehousing and data linking. Transformation code is replaced with declarations of term equivalence. While OWL can go further, and express notions other than term equivalence (such as classhood), how it manages term mapping is of most interest to integrators.
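For instance, a single declaration in Turtle (the resources here are made up) takes the place of mapping code:

    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    # state that two identifiers from different vocabularies denote the same thing
    <http://example.org/hr/bob>  owl:sameAs  <http://example.com/staff/bob> .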

Notions of Vocabulary

This produces a counterintuitive result - RDF's and OWL's notion of "vocabulary" is very weak compared to XML's, and arguably it doesn't exist at all. It's unusual because RDF is more strongly associated with heavyweight vocabulary design approaches such as taxonomies and ontologies. What RDF has are groups of terms that happen to be managed by differing communities, while how terms relate is governed by a uniform semantics and processing model. All the focus is on how terms can relate globally, not on how they are modularised and organized for a domain. Thus it's common to see formats that reuse terms from other vocabularies.

XML based vocabularies on the other hand exhibit wide variation in processing and semantics; often this is seen as a feature of using markup. XML documents are also isolated despite the shared syntax; the number of XML formats that mix and match vocabularies is small and reuse is infrequent. Perhaps the most notable counter-example is the Open Office file format, now standardised as ODF, which re-uses other specialised vocabularies such as XHTML and SVG.

The Atom format allows and encourages the use of 'foreign markup' from non-Atom namespaces, which is a more flexible approach than previous XML standards. While we should not read too much into the naming of things, 'foreign markup' betrays a definite bias to vocabulary integration, never mind that a notion such as "foreign RDF" wouldn't make any sense***.

Economics

The reason "RDF v XML" or "XML v Microformats" arguments don't get at why transforms are more widely adopted than inference as an integration technique is that the answer has nothing to do with the relative technical value of the approaches - clearly you can use various approaches to handle vocabularies and data integration. The reasons are primarily economic, and there are two such factors worth considering. First, a transform is the shortest critical path to integrating any two formats, and most people typically only have to care about two formats at a given time; indeed on many projects teams won't have the time, scope or budget to consider broader concerns. That the individual case is almost always optimized at the expense of the general case on a project should be no secret. Second, a transformation will be most familiar to integrators, in terms of approach, figuring out the risks, available toolchains, and costs. It is integrators who are typically tasked with this work, the majority of which is actually better understood as data migration and not unification. Irrespective of whether a non-transform approach might in principle produce greater overall value, the transformation approach will tend to have more predictable local outcomes.


* Having a data model is valuable in terms of understanding formal properties and expressive power, but most people can and do get away without caring for the details day to day in much the same way the working programmer isn't overly focused on Turing machines or Relational Algebra.

** note that variance here also includes syntax

*** Incidentally, the Atom Working Group's consensus was that the second approach, term creation, was the lesser of two weevils.

April 07, 2007

links for 2007-04-07

April 06, 2007

Mercurial, Part II: setting up Mercurial on TextDrive

The wonderful people at TextDrive don't support Mercurial centrally, but since Mercurial is a Python app, you can set it up locally on your account. If you're like me, you'll also want to be able to push and pull changes over HTTP for multiple repositories using the hgwebdir.cgi script. The rest of this post is a (very) terse description of how I set things up on my TXD account, based on the publishing instructions in the Mercurial wiki.

Install mercurial in your home folder:

    # mkdir ~/local; mkdir ~/local/mercurial
    # cd ~/local
    # wget http://www.selenic.com/mercurial/release/mercurial-0.9.3.tar.gz 
    # tar xvzf mercurial-0.9.3.tar.gz 
    # cd  ~/local/mercurial-0.9.3
    # python setup.py install --home=~/local/mercurial
    # nano ~/.profile
    PYTHONPATH=${HOME}/local/mercurial/lib/python
    PATH=${HOME}/local/mercurial/bin:$PATH 
    #export PYTHONPATH=${HOME}/local/mercurial/lib/python
    #export PATH=${HOME}/local/mercurial/bin:$PATH 

Create a base configuration file:

    # touch ~/.hgrc
    # nano ~/.hgrc
    [ui]
    username = your name 
    

Check your setup:

    # hg debuginstall
    Checking encoding (US-ASCII)...
    Checking extensions...
    Checking templates...
    Checking patch...
    Checking merge helper...
    Checking commit editor...
    Checking username...
    No problems detected

Make a public repository area, and serve it:

# mkdir ~/web/public/hg
# mkdir ~/web/public/hg/repos

~/web/public/hg/repos is where you will create your public mercurial repositories. Note that symlinking into here doesn't work; the repositories have to be housed here. To serve it out:

    # cp  ~/local/mercurial-0.9.3/hgwebdir.cgi  ~/web/public/hg
    # chmod 755 ~/web/public/hg/hgwebdir.cgi
    # nano ~/web/public/hg/hgwebdir.cgi
    import sys
    sys.path.insert(0, "/users/home/$youraccountname/local/mercurial/lib/python")

Now tell the cgi where the repos are (for example suppose we had created a repo called 'weblog'):

    # nano ~/web/public/hg/hgweb.config 
    [paths]
    weblog = repos/weblog

Create a user account for pushing changes:

    # mkdir etc
    # htpasswd -c ~/etc/hgpasswd $mercurialname

Configure apache access to the repositories:

    # nano ~/web/public/hg/.htaccess 

    Options +ExecCGI
    RewriteEngine On
    RewriteBase /hg
    RewriteRule ^$ hgwebdir.cgi  [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule (.*) hgwebdir.cgi/$1  [QSA,L]
    AuthUserFile /users/home/$youraccountname/etc/hgpasswd
    AuthGroupFile /dev/null
    AuthName "My Repository"
    AuthType Basic
    <Limit POST PUT>
    Require valid-user
    </Limit>

The above will allow anyone to browse

    http://$yourdomain/hg
and see all the repositories under:
    ~/web/public/hg/repos

The Auth* directives mean commits are restricted to authenticated users, but anyone can browse (if you want to restrict browsing, add GET to the methods in <Limit>, as shown below). The rewrite rules are explained in the mercurial wiki.
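For a fully locked-down setup, the <Limit> block above becomes:

    <Limit GET POST PUT>
    Require valid-user
    </Limit>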

For each repository under repos you'll need to add the following to its .hg/hgrc file:

    [web]
    push_ssl = false
    allow_push = $mercurialname

where "$mercurialname" matches what you added to hgpasswd earlier. This isn't secure - Mercurial by default does not allow push over HTTP, with good reason; you have to disable that via push_ssl. If you can get an https setup running on TextDrive, you should do so (and tell me what you did ;).

To pull a repository down via HTTP use "hg clone":

    # hg clone http://$yourdomain/hg/weblog weblog

To commit (you'll be challenged for auth details):

    # cd weblog
    do work
    # hg ci -m "my changes"
    # hg push http://$yourdomain/hg/weblog
    pushing to http://$yourdomain/hg/weblog
    searching for changes
    http authorization required
    realm: My Repository
    user: $mercurialname
    password: 
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files

See also: Mercurial, Part I, first impressions.

links for 2007-04-06

April 05, 2007

Me too

"Almost everything I've ever downloaded that used maven for its build process, didn't build. "

A periodic table of the elements

Spot the carbon

No CSS

To learn why styles are disabled on this website, visit the Annual CSS Naked Day website.


April 04, 2007

Step 3

James Pasley: "If you’re ever tempted to put a retry loop into your REST client just in case the HTTP connection is refused, then you need to face up to the need for SOAP (with WS-RM of course)."

Alternatively, one could face up to HTTP, with either HTTPLR or BTF2.0, depending on whether you want half or full duplex comms. Why stay with HTTP? For one thing, numerous firewall administrators ensure many reliable b2b transmission scenarios have to run over HTTP. But there are other reasons to think a REST based protocol design will be the basis of over-web messaging in the coming years, chief among them Atom Protocol.

It occurs to me that the next version of HTTPLR should/could extend Atom Publishing Protocol. The Atom Protocol hook into reliability is the fact that each time a document is created, its URL is returned in the HTTP "Location:" header. With HTTP, once there is a shared token between the sender and receiver, there is a basis to achieve a reliable once-and-only-once protocol exchange in 3 steps - one, submit the document; two, return the document token; three, acknowledge the token was received. The trick will be distinguishing the 3rd step (final transmission) from a regular content update. But since Atom Protocol is smart about use of HTTP methods, I think it is a very doable thing to define a reliable state machine on top of it.
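A sketch of the first two steps as plain Atom Protocol traffic (the URLs are made up, and the third, acknowledgement step is exactly the part still to be defined):

    # step one: submit the document to a collection
    POST /docs HTTP/1.1
    Host: example.org
    Content-Type: application/atom+xml

    # step two: the server hands back the shared token as the new member's URL
    HTTP/1.1 201 Created
    Location: http://example.org/docs/1234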

There are a number of reasons to choose Atom Protocol as the substrate for web-scale reliable messaging. First, a ton of software will be written to target APP in the next few years, and there is plenty of scope for extending the protocol; this suggests openly available and flexible software stacks. Second, since all document collections in Atom Protocol are served as Atom Feeds, it has inherent support for systems management and end to end reconciliation. Third, Atom entries have identity and are natural envelopes, unlike SOAP, where identity and true enveloping requires further specification (essentially raw Atom presents a better basis for interoperation than raw SOAP). Fourth, Atom Protocol can support binary content transmission not just XML, and thus can transmit arbitrary payloads. Finally, because Atom Protocol respects media types and deployed HTTP infrastructure, independent proxy inspection and security check-pointing can be installed cleanly, also eliminating the need to rewrite 2 stack layers and buy XML appliances to support and secure SOAP backed web services. It seems to be a question of when, rather than if, this will get built out.

links for 2007-04-04

April 03, 2007

links for 2007-04-03

April 01, 2007

links for 2007-04-01