Automatically synchronize files with Mercurial

Posted on November 27, 2009

Basic features of Mercurial (or any other version control system) are file transfer, synchronization and (to some extent) automatic resolution of conflicting changes. In combination with free Mercurial hosting platforms like BitBucket, this makes Mercurial an excellent tool for keeping a set of files in sync across workstations.

Use case

Personally I use Mercurial to keep a calendar file, some ToDo lists and user spelling dictionaries synchronized between my work and home machines. I use a repository at BitBucket as the synchronization hub:

          BitBucket
           /     \
         sync    sync
         /         \
      Home         Work

First I set up a repository at BitBucket and then cloned it on each of my workstations. By pushing local changes from one workstation's repository to the hub repository at BitBucket, I can easily pull them on my other workstation.
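
The manual round trip described above could be sketched roughly as follows. The hub URL and the commit message are placeholders chosen for illustration, not values from my actual setup, and the function names are made up for this example:

```shell
#!/bin/sh
# Manual synchronization round trip (illustrative sketch).
# HUB_URL is a placeholder, not a real repository.
HUB_URL="https://bitbucket.org/USER/REPO"

# One-time setup on each workstation: clone the hub repository into DIR.
setup_workstation() {
    hg clone "$HUB_URL" "$1"
}

# On the machine where files changed: record and publish the changes.
publish_changes() {
    hg -R "$1" commit -m "manual update" &&
    hg -R "$1" push
}

# On the other machine: fetch the changes and update the working directory.
fetch_changes() {
    hg -R "$1" pull -u
}

# Example usage (paths are hypothetical):
#   setup_workstation ~/sync      # once per workstation
#   publish_changes ~/sync        # at work, after editing
#   fetch_changes ~/sync          # at home
```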

Automatic synchronization

However, synchronizing manually is a tedious task. Ideally this process is automated by a script which periodically synchronizes local repositories with the remote hub repository. Such a script should regularly

  1. commit local changes,
  2. pull remote changes,
  3. do a merge if needed and
  4. push local commits to the remote hub repository.

UPDATE: The content below is outdated. You should use the Mercurial autosync extension instead. (November 30, 2009)

The crucial part of automating synchronization is handling errors and conflicts. The script should be robust enough not to crash or cause damage when they occur. Below is a script for Linux systems which tries to meet those requirements:

#!/bin/sh
# Script to automatically synchronize Mercurial repositories.
# Copyright 2009 by Oben Sonne. Licensed as GPLv3. Use at your own risk.

#--------------------------------------------------------------------------
# init
#--------------------------------------------------------------------------

REPO=$1
[ -n "$REPO" ] || { echo "need a repo dir"; exit 1; }
HG="hg -R $REPO"
$HG id > /dev/null 2>&1 || { echo "no hg repository at $REPO"; exit 1; }

export HGMERGE=false

#--------------------------------------------------------------------------
# configuration (change for your needs)
#--------------------------------------------------------------------------

LOG="/var/tmp/hgsync.`basename "$REPO"`.log"
#LOG=/dev/stdout
INTERVAL=600 # sync every 10 minutes
AUTO_ADDREMOVE=1 # automatically track/untrack new/deleted files?

#--------------------------------------------------------------------------
# helper functions
#--------------------------------------------------------------------------

log() {
    echo "`date +%Y-%m-%d-%H-%M`: $1" >> $LOG
}

check() {
    while [ "`$HG id -r tip`" != "`$HG id -r .`" ] ; do
        log "not at tip (please check and fix manually)"
        sleep $INTERVAL
    done
    # only unresolved ("U") entries block the loop; resolved ("R") ones are fine
    while [ -n "`$HG resolve -l | grep "^U "`" ] ; do
        log "unresolved merge conflict (please check and fix manually)"
        sleep $INTERVAL
    done
}

addremove() {
    if [ $AUTO_ADDREMOVE -eq 1 ] ; then
        while ! ($HG addremove >> $LOG 2>&1) ; do
            log "addremove failed (please check and fix manually)"
            sleep $INTERVAL
        done
    fi
}

commit() {
    MSG=$1
    FORCE="$2"
    CHANGE="`$HG st --modified --added --removed`"
    if (test -n "$FORCE" || test -n "$CHANGE") ; then
        log "commit local changes"
        while ! ($HG ci -m "$MSG" >> $LOG 2>&1) ; do
            log "commit failed (please check and fix manually)"
            sleep $INTERVAL
        done
    fi
}

pull() {
    if ($HG incoming > /dev/null 2>&1) ; then
        log "pull remote changes"
        while ! ($HG pull -u >> $LOG 2>&1) ; do
            log "pull failed (please check and fix manually)"
            sleep $INTERVAL
        done
        HEADS=`$HG heads --template "{rev}\\n" | wc -l`
        if [ $HEADS -gt 1 ] ; then
            log "merge remote changes"
            if ($HG merge >> $LOG 2>&1) ; then
                commit "auto merge" "force"
            else
                log "merge failed (please resolve manually)"
                while [ -n "`$HG resolve -l | grep "^U "`" ] ; do
                    sleep $INTERVAL
                done
                commit "manual merge" "force"
            fi
        fi
    fi
}

push() {
    if ($HG outgoing > /dev/null 2>&1) ; then
        log "push local changes"
        while ! ($HG push >> $LOG 2>&1) ; do
            log "push failed (please check and fix manually)"
            sleep $INTERVAL
        done
    fi
}

#--------------------------------------------------------------------------
# sync loop
#--------------------------------------------------------------------------

log "start sync loop"

while (true) ; do
    check # some initial checks
    addremove
    commit "auto update" # commit local changes
    pull # pull and merge remote changes
    push # push local commits
    sleep $INTERVAL # sleep
done

Put it into a file, e.g. hgsync.sh, and run it with the repository to sync as first argument:

$ sh hgsync.sh /path/to/repo

The log output goes into /var/tmp/hgsync.repo.log (the file name contains the base name of the repository directory). Ideally this script gets started at some point in a workstation's startup process.
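
One possible way to start the script from a login or startup script is a small wrapper that launches it in the background and refuses to start a second copy. This is only a sketch: the function name start_sync, the PIDDIR variable and the PID file naming are assumptions made for this example and are not part of the script above.

```shell
#!/bin/sh
# Start hgsync.sh in the background unless it is already running (sketch).
# PIDDIR and the PID file name are assumptions made for this example.
PIDDIR=${PIDDIR:-/var/tmp}

start_sync() {
    script=$1
    repo=$2
    pidfile="$PIDDIR/hgsync.$(basename "$repo").pid"
    # kill -0 only tests whether the recorded process still exists
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        echo "already running"
        return 0
    fi
    nohup sh "$script" "$repo" > /dev/null 2>&1 &
    echo $! > "$pidfile"
    echo "started"
}

# Example usage (paths are hypothetical):
#   start_sync /path/to/hgsync.sh /path/to/repo
```

Calling start_sync from something like ~/.profile or a desktop autostart entry would then bring up the sync loop once per session.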

Handling errors and conflicts

The general approach of this script to cope with errors and synchronization conflicts is to log the failing operation's output (so you know what's wrong and how to fix it) and to retry the operation until the problem has been fixed or the conflict has been resolved manually.
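
The log-and-retry pattern can also be factored into a single helper. The following is a sketch of that idea, not the code the script above uses; the name retry and the default values are assumptions made for this example:

```shell
#!/bin/sh
# Generic log-and-retry helper (illustrative sketch).
LOG=${LOG:-/var/tmp/hgsync.log}
INTERVAL=${INTERVAL:-600}

log() {
    echo "$(date +%Y-%m-%d-%H-%M): $1" >> "$LOG"
}

# retry DESCRIPTION COMMAND [ARGS...]
# Runs COMMAND until it succeeds. Its output goes to the log, and each
# failure is logged before sleeping for INTERVAL seconds.
retry() {
    desc=$1
    shift
    until "$@" >> "$LOG" 2>&1; do
        log "$desc failed (please check and fix manually)"
        sleep "$INTERVAL"
    done
}

# Example usage (repository path is hypothetical):
#   retry "push" hg -R /path/to/repo push
```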

Conflicts

Concerning synchronization (a.k.a. merge) conflicts, this script works flawlessly as long as there are no parallel changes on the different workstations. But even if there are parallel changes, Mercurial merges them without user interaction in most cases. Only rarely is Mercurial unable to merge automatically. Even then, the conflict is not a big problem: the script keeps running, but it does nothing until the merge conflict has been resolved manually. Once the conflicting files have been fixed and marked as resolved with hg resolve in the repository, the script continues.
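
When the log reports a failed merge, a helper like the following can show which files still need attention. It is a sketch built on the output of hg resolve -l, where unresolved files are prefixed with "U" and resolved ones with "R"; the function name and the file name in the usage comment are made up for this example:

```shell
#!/bin/sh
# List files that still have unresolved merge conflicts (sketch).
# Usage: unresolved_files /path/to/repo
unresolved_files() {
    # keep only "U <file>" lines and strip the two-character status prefix
    hg -R "$1" resolve -l | grep '^U ' | cut -c3-
}

# Typical manual fix, run inside the repository (file name is hypothetical):
#   $EDITOR todo.txt          # remove the conflict markers by hand
#   hg resolve -m todo.txt    # mark the file as resolved
# As soon as no "U" entries remain, the sync script commits the merge.
```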

Errors

As with merge conflicts, when other problems occur, the script simply stays at its current step and waits until the problem gets fixed manually. Once it is fixed, the script continues its work.

Murphy's law

Though this script has been written to be robust, it may still fail, perhaps on a semantic level, e.g. by automatically merging changes into nonsense. Still, since we are using a version control system for synchronization, the history of all changes is available and synchronization errors can always be fixed later.
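
As an illustration of fixing things later, a file can be brought back to the state it had in an earlier revision. The helper below is a sketch; its name, the revision number and the file name in the usage comment are made up for this example:

```shell
#!/bin/sh
# Restore a single file to its state in an earlier revision (sketch).
# Usage: restore_file REPO REV FILE   (FILE relative to the repository root)
restore_file() {
    (
        cd "$1" || exit 1
        hg revert -r "$2" "$3" &&
        hg commit -m "restore $3 from revision $2" "$3"
    )
}

# Example usage (values are hypothetical):
#   hg -R /path/to/repo log        # find the last good revision
#   restore_file /path/to/repo 42 todo.txt
```

The restoring commit is then distributed to the other workstation by the next sync cycle like any other change.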

Notify conflicts

Merge conflicts get logged, but days may pass before you notice that something is wrong. Some kind of visual or audio notice would be better. Feel free to adjust the script to call zenity, kdialog, beep or whatever tool you think is reasonable for that task.
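
A notification helper might look roughly like the sketch below. It assumes the script runs inside a graphical session (DISPLAY is set) and falls back to standard error otherwise; the function name notify and the message prefix are choices made for this example:

```shell
#!/bin/sh
# Show a desktop notification if a suitable tool is available (sketch).
notify() {
    msg=$1
    if [ -n "${DISPLAY:-}" ] && command -v zenity > /dev/null 2>&1; then
        zenity --warning --text="$msg" &
    elif [ -n "${DISPLAY:-}" ] && command -v kdialog > /dev/null 2>&1; then
        kdialog --msgbox "$msg" &
    else
        # no graphical session or no dialog tool: print to stderr instead
        echo "hgsync: $msg" >&2
    fi
}

# Example: call it next to the existing log line for failed merges
#   log "merge failed (please resolve manually)"
#   notify "merge failed in $REPO (please resolve manually)"
```

Note that a script started from a system startup hook usually has no DISPLAY set, so the dialogs only appear when the script is launched from within the desktop session.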