bash option parsing

Since I know I'm going to forget how to do it properly, this is a tiny recipe to easily parse options in bash. I use getopts (and not getopt, mind the s!), which is a shell builtin. Run help getopts to read the documentation.

The snippet is pretty easy and self-explanatory:

#!/bin/bash

usage()
{
DESCRIPTION=
cat << EOF
usage: $0 options
 
$DESCRIPTION
 
OPTIONS:
   -h      Show this message
   -v      Verbose
   -d      Debug
   -f      file
EOF

}
 
VERBOSE=
DEBUG=
FILE=
OTHERARGS=

while getopts "vhdf:" flag
do
  case "$flag" in
    f) FILE=$OPTARG ;;
    d) set -x ; DEBUG=true;;
    v) VERBOSE=true ;;
    h) usage ; exit 0 ;;
  esac
#  echo "$flag" $OPTIND $OPTARG
done
 
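# quote the variable so the test still works when FILE contains spaces
if [ -z "$FILE" ]; then
  # drop the options already parsed and keep the remaining arguments
  shift $((OPTIND-1))
  OTHERARGS="$*"
fi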

if [ -n "$VERBOSE" ] ; then
  echo "verbose $VERBOSE"
  echo "debug $DEBUG"
  echo "file $FILE"
  echo "other args $OTHERARGS"
fi

exit 0
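
A variation I find handy, as a minimal sketch: the snippet above lets getopts print its own error messages. With a leading ':' in the option string, getopts stays silent and the script can report unknown options and missing arguments itself. The function name parse_opts and the exact messages are my own choices for illustration.

```shell
# Hypothetical helper: parses -v and -f FILE, reporting bad input itself.
parse_opts() {
  local OPTIND=1 flag file="" verbose=""
  # The leading ':' selects silent mode: flag is '?' for an unknown
  # option and ':' for a missing argument; OPTARG holds the option letter.
  while getopts ":vf:" flag; do
    case "$flag" in
      f) file=$OPTARG ;;
      v) verbose=true ;;
      :) echo "missing argument for -$OPTARG"; return 1 ;;
      \?) echo "unknown option -$OPTARG"; return 1 ;;
    esac
  done
  # drop the parsed options, keep the positional arguments
  shift $((OPTIND-1))
  echo "file=$file verbose=$verbose rest=$*"
}
```

Calling parse_opts -v -f a.txt x y prints the parsed values followed by the two remaining arguments, while parse_opts -f complains about the missing argument and returns 1.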

the OCaml Format module

Honestly, the OCaml Format module is a royal PITA to use. The only documentation apart from the reference manual is this document here. Don't get me wrong: I think it's a very nice piece of software and absolutely worth having in the stdlib, but it is simply not intuitive (at least for me) at first glance. I'll write down a couple of examples. Hopefully this will help me - and others - the next time I need to use it.

I'm going to use the Format.fprintf function quite a lot. It takes format strings similar to those of the more widely used Printf.fprintf. In the Format module page you can find all the details. Let's start easy and print a string. We write a pretty printer function pp_cell that takes a formatter and an element. This is my favourite way of writing printing functions, as I can daisy-chain them in a single printf call using the "%a" conversion. If the formatter is Format.std_formatter, the string will be printed on stdout.

let pp_cell fmt cell = Format.fprintf fmt "%s" cell
Next we examine a simple function to pretty print a list of elements. The signature of this function is quite similar to the previous one, but this time we also pass an optional separator and a pretty printer for the elements of the list.
let rec pp_list ?(sep="") pp_element fmt = function
  |[h] -> Format.fprintf fmt "%a" pp_element h
  |h::t ->
      Format.fprintf fmt "%a%s@,%a"
      pp_element h sep (pp_list ~sep pp_element) t
  |[] -> ()
The function takes care of printing the separator after all elements but the last one.

Let's start playing with the boxes. The formatting boxes are the main reason why I use the Format module, and they are very handy if you want to pretty print nested structures easily.

If we use the std_formatter and the list pretty printer without formatting box, we obtain this output.

# let fmt = Format.std_formatter ;;
# (pp_list ~sep:"," pp_cell) fmt ["aa";"bb";"cc"];;
aa,bb,
cc- : unit = ()
#
that is the same as :
# Format.fprintf fmt "%a" (pp_list ~sep:"," pp_cell) ["aa";"bb";"cc"];;
aa,bb,
cc- : unit = ()
To be frank, I don't quite get yet why the formatter decides to add a newline after the last comma... but moving on. If I now use a formatting box, the result is different. To print the list on one line, I can use an hbox. If I want a vertical list, I can use a vbox. This gives respectively:
# Format.fprintf fmt "@[<h>%a@]@." (pp_list ~sep:"," pp_cell) ["aa";"bb";"cc"];;
aa,bb,cc
# Format.fprintf fmt "@[<v>%a@]@." (pp_list ~sep:"," pp_cell) ["aa";"bb";"cc"];;
aa,
bb,
cc
If we want to print a list with one character of indentation, this can be easily done as:
# Format.fprintf fmt "@[<v 1>@,%a@]@." (pp_list ~sep:"," pp_cell) ["aa";"bb";"cc"];;
 aa,
 bb,
 cc
The idea is that by changing the type of formatting box, the break hint @, (a cut, which has zero width) is interpreted differently by the formatter: as a newline in a vertical box, as nothing at all in a horizontal box. Moreover, by adding an indentation, the formatter takes care of adding an offset to every new line printed within that box. And this is a winner when pretty printing nested structures.

Let's now delve a bit deeper and try to format a table... I didn't find any tutorial on the net about this, only bits and pieces of code buried in different projects... A table for me is a pair composed of a header (a string array) and a two-dimensional string array. The point here is to format the table so that each element is displayed in a column sized after the longest element of that column. First we need two support pretty printers, one for the header and one for each row of the table. In order to set the tabulation margins of the table, we need to find, for each column, the longest string in the table. The result of this computation (the code is shown below in pp_tables) is an array of integer widths. When we print the header of the table, we set a tab stop at the start of each column with the Format.pp_set_tab function. The magic of the Format module will take care of the rest. The second function, which prints each row, is pretty straightforward to understand.

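let pp_header widths fmt header =
  Array.iteri (fun j width ->
    (* set a tab stop at the start of each column *)
    Format.pp_set_tab fmt ();
    (* pad the header to the column width plus one separating space;
     * strings are immutable since OCaml 4.06, so we build a new string
     * instead of overwriting a buffer of spaces in place *)
    let pad = String.make (width + 1 - String.length header.(j)) ' ' in
    Format.fprintf fmt "%s%s" header.(j) pad
  ) widths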

let pp_row pp_cell fmt row =
  Array.iteri (fun j cell ->
    Format.pp_print_tab fmt ();
    Format.fprintf fmt "%a" pp_cell cell
  ) row

The pretty printer for the table is pretty easy now. First we compute the width of each column, then we open the table box, print the header, iterate over the rows and close the box. Tadaaaa :)

let pp_tables pp_row fmt (header,table) =
  (* we build with the largest length of each column of the
   * table and header *)

  let widths = Array.make (Array.length table.(0)) 0 in
  Array.iter (fun row ->
    Array.iteri (fun j cell ->
      widths.(j) <- max (String.length cell) widths.(j)
    ) row
  ) table;
  Array.iteri (fun j cell ->
    widths.(j) <- max (String.length cell) widths.(j)
  ) header;

  (* open the table box *)
  Format.pp_open_tbox fmt ();

  (* print the header *)
  Format.fprintf fmt "%a@\n" (pp_header widths) header;
  (* print the table *)
  Array.iter (pp_row fmt) table;

  (* close the box *)
  Format.pp_close_tbox fmt ()
For example, this is what we get:
let a = Array.make_matrix 3 4 "aaaaaaaa" in
let h = Array.make 4 "dddiiiiiiiiiiiiiiiii" in
let fmt = Format.std_formatter in
Format.fprintf fmt "%a" (pp_tables (pp_row pp_cell)) (h,a);;
dddiiiiiiiiiiiiiiiii          dddiiiiiiiiiiiiiiiii          dddiiiiiiiiiiiiiiiii           dddiiiiiiiiiiiiiiiii
aaaaaaaa             aaaaaaaa             aaaaaaaa             aaaaaaaa
aaaaaaaa             aaaaaaaa             aaaaaaaa             aaaaaaaa
aaaaaaaa             aaaaaaaa             aaaaaaaa             aaaaaaaa
Well ... more or less. On the terminal you will notice that everything is well aligned. This of course only scratches the surface. There are still a few things I don't really understand, and many functions that I didn't consider at all. Maybe I'll write a second chapter one day.

redmine 1.0.0

A long while ago I wrote about installing redmine as a normal user on debian. I've been using and administering a redmine installation for quite some time now and I'm very happy with it. Since a new stable version of redmine was released in June, I decided to give it another try.

The redmine installation manual is already very detailed. These are the steps I followed to get a simple redmine instance running on my system. First we need to install a specific version of rails and rack. I didn't check which versions we have in debian (sid in this example), but since the gem system is very easy to use, I prefer to stay on the safe side and use the recommended versions. To install the ruby gems in an arbitrary directory, it is sufficient to set the GEM_PATH environment variable. Everything else follows pretty easily.

export GEM_PATH=`pwd`/gems
gem install -i $GEM_PATH rails -v=2.3.5
gem install -i $GEM_PATH rack -v=1.0.1

Since I don't want to bother with a full-fledged database, in this example I'll use the sqlite3 backend. In order to do this, I need to install another gem:

gem install -i $GEM_PATH sqlite-ruby

I'm ready to get the redmine code. I prefer to get it directly from svn, so I can conveniently keep it up to date.

svn co http://redmine.rubyforge.org/svn/branches/1.0-stable redmine-1.0

Now we need to configure the database backend. This is accomplished just by adding the following lines to the file config/database.yml. Mind that the backend is called sqlite and not sqlite3!

production:
  adapter: sqlite
  database: db/test.db
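
If you script the installation, this step can be made repeatable. The following is only a sketch: the function name add_production_db is made up, and it assumes that an existing line starting with production: means the stanza is already configured.

```shell
# Hypothetical helper: add the production stanza shown above to
# config/database.yml under the given redmine checkout, unless a
# production stanza is already there.
add_production_db() {
  conf="$1/config/database.yml"
  mkdir -p "$1/config"
  if grep -q '^production:' "$conf" 2>/dev/null; then
    echo "production stanza already present in $conf"
    return 0
  fi
  cat >> "$conf" << 'EOF'
production:
  adapter: sqlite
  database: db/test.db
EOF
  echo "added production stanza to $conf"
}
```

Running add_production_db redmine-1.0 twice appends the stanza only the first time.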

Time to create and initialize the db with the default data. These two commands will create the db structure and add trackers, default users and take care of other details.

RAILS_ENV=production rake db:migrate
RAILS_ENV=production rake redmine:load_default_data

Since we are going to run the redmine instance either as fcgi or with the standalone server, we just need to make sure that a few directories are writable by my user:

chmod -R 755 files log tmp public/plugin_assets

Time to try it out !

ruby script/server webrick -e production
and point our favorite browser to http://localhost:3000/

DONE!

Last time I tried redmine I had to fight a bit to install the ldap authentication plugin. Just by looking around the new stable version, I'm pretty happy to notice that it is now part of the default bundle. This is nice! I still have to set up the fastcgi server and configure the SCM repositories. Maybe I'll write a part two later...

edos-distcheck - new YAML output format

During the last two days I spent some time implementing part of the proposed features for distcheck / edos-distcheck. Since everybody is at debconf and talk is silver, but code is gold, I hope that a real implementation can get the ball rolling and get us closer to a stable release of the next generation of edos/mancoosi tools.

In particular this post is about the new YAML output format for distcheck. The rationale for using YAML is to have a data structure that is at the same time human and machine friendly. There are a lot of scripts in debian that rely on distcheck and we want to provide a grep friendly output that at the same time doesn't hurt your eyes. The other proposed solution was to use json, but it was ditched in favor of YAML. We also removed the xml output.

In order to provide a machine readable output and to minimize parsing mistakes, I used the schema language proposed here. This is the resulting data structure definition :

type: seq
sequence:
  - type: map
    mapping:
      "package": { type: str, required: true }
      "version": { type: text, required: true }
      "status":  { type: str, enum: [ broken, ok ], required: true }
      "installationset":
         type: seq
         sequence:
           - type: map
             mapping:
               "package": { type: str, required: true }
               "version": { type: text, required: true }
      "reasons":
         type: seq
         sequence:
           - type: map
             mapping:
               "conflict":
                  type: map
                  mapping:
                    "pkg1":
                      type: map
                      required: true
                      mapping:
                        "package": { type: str, required: true }
                        "version": { type: text, required: true }
                    "pkg2":
                      type: map
                      required: true
                      mapping:
                        "package": { type: str, required: true }
                        "version": { type: text, required: true }
                    "paths":
                      type: seq
                      sequence:
                        - type: map
                          mapping:
                            "path":
                              type: seq
                              sequence:
                                - type: map
                                  mapping:
                                    "package": { type: str, required: true }
                                    "version": { type: text, required: true }
                                    "vpkg": { type: str, required: false}
               "missing":
                  type: map
                  mapping:
                    "pkg":
                      type: map
                      required: true
                      mapping:
                        "package": { type: str, required: true }
                        "version": { type: text, required: true }
                        "vpkg": { type: str, required: false}
                    "paths":
                      type: seq
                      sequence:
                        - type: map
                          mapping:
                            "path":
                              type: seq
                              sequence:
                                - type: map
                                  mapping:
                                    "package": { type: str, required: true }
                                    "version": { type: text, required: true }
                                    "vpkg": { type: str, required: false}
There are a few improvements over the old output from edos-distcheck. We are going to discuss these with a real example. The following two snippets are from the output of distcheck on sid/amd64 (04/08/2010).

Distcheck now outputs a list of broken or installable packages depending on the given options (--failures, --success, --explain and combinations thereof). Two quick examples:

-
 package: python-gi-dbg
 version: 0.6.0-1
 status: broken
 reasons:
  -
   conflict:
    pkg1:
     package: python-gobject
     version: 2.21.4-1
    pkg2:
     package: python-gi
     version: 0.6.0-1
    paths:
     -
      path:
       -
        package: python-gi-dbg
        version: 0.6.0-1
        vpkg: python-gi (= 0.6.0-1)
       -
        package: python-gi
        version: 0.6.0-1
     -
      path:
       -
        package: python-gi-dbg
        version: 0.6.0-1
        vpkg: python-gi (= 0.6.0-1)
       -
        package: python-gi
        version: 0.6.0-1
        vpkg: python-gobject (>= 2.20)
       -
        package: python-gobject
        version: 2.21.4-1
In the example above, the package python-gi-dbg is broken because there is a conflict between the packages python-gobject and python-gi. The reason why python-gi-dbg is affected by this conflict is explained by following the dependency chain from python-gi-dbg to the two offending packages. Note that for each package element of each path we specify the vpkg, that is the dependency (as it is written in the control file) that led to the conflict. Since a dependency can be a virtual package or a package with a version constraint, it can be expanded to a disjunction of packages (think of a dependency on mta-agent, which can be expanded as postfix, exim or sendmail...). All possible paths to an offending package are reported.

Likewise, if a package is broken because there is an unfulfilled dependency, distcheck will show the path leading to the problem. In the following example we show that the package gnash-tools is broken because two of its dependencies both depend on the missing package libboost-date-time1.40.0 (>= 1.40.0-1).

-
 package: gnash-tools
 version: 0.8.7-2+b1
 status: broken
 reasons:
  -
   missing:
    pkg:
     package: gnash-common
     version: 0.8.7-2+b1
     vpkg: libboost-date-time1.40.0 (>= 1.40.0-1)
    paths:
     -
      path:
       -
        package: gnash-tools
        version: 0.8.7-2+b1
        vpkg: gnash-common-opengl (= 0.8.7-2+b1) | gnash-common (= 0.8.7-2+b1)
       -
        package: gnash-common
        version: 0.8.7-2+b1
        vpkg: libboost-date-time1.40.0 (>= 1.40.0-1)
  -
   missing:
    pkg:
     package: gnash-common-opengl
     version: 0.8.7-2+b1
     vpkg: libboost-date-time1.40.0 (>= 1.40.0-1)
    paths:
     -
      path:
       -
        package: gnash-tools
        version: 0.8.7-2+b1
        vpkg: gnash-common-opengl (= 0.8.7-2+b1) | gnash-common (= 0.8.7-2+b1)
       -
        package: gnash-common-opengl
        version: 0.8.7-2+b1
        vpkg: libboost-date-time1.40.0 (>= 1.40.0-1)
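
To give an idea of what "grep friendly" means in practice, here is a small sketch that lists the broken packages from a report like the ones above. It relies only on the indentation visible in these examples (top-level fields are indented by exactly one space); the function name list_broken is mine, and a real script might prefer a proper YAML parser.

```shell
# Hypothetical helper: read a distcheck YAML report on stdin and print
# "package version" for every top-level entry marked as broken.
list_broken() {
  awk '
    # top-level fields are indented by exactly one space, so fields
    # nested under reasons: never match these patterns
    /^ package: / { pkg = $2 }
    /^ version: / { ver = $2 }
    /^ status: broken/ { print pkg, ver }
  '
}
```

Fed with the two reports shown in this post, it would print one line for python-gi-dbg and one for gnash-tools, each followed by its version.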

The code is still in flux and it is not ready for production yet (everything is in the mancoosi svn). I hope this is a good step in the right direction. Comments on the debian wiki are welcome.

If we compare the output of distcheck with the old edos-debcheck we get the following:

$ cat /var/lib/apt/lists/ftp.debian.org_debian_dists_unstable_main_binary-amd64_Packages | edos-debcheck -failures -explain
[...]
holotz-castle-milanb (= 0.0.20050210-1): FAILED
  holotz-castle-milanb (= 0.0.20050210-1) depends on one of:
  - holotz-castle (= 1.3.14-2)
  holotz-castle-data (= 1.3.14-2) and holotz-castle-milanb (= 0.0.20050210-1) conflict
  holotz-castle (= 1.3.14-2) depends on one of:
  - holotz-castle-data (= 1.3.14-2)

$ ./debcheck --explain --failures /var/lib/apt/lists/ftp.debian.org_debian_dists_unstable_main_binary-amd64_Packages
[...]
-
 package: holotz-castle-milanb
 version: 0.0.20050210-1
 status: broken
 reasons:
  -
   conflict:
    pkg1:
     package: holotz-castle-milanb
     version: 0.0.20050210-1
    pkg2:
     package: holotz-castle-data
     version: 1.3.14-2
    paths:
     -
      path:
       -
        package: holotz-castle-milanb
        version: 0.0.20050210-1
        vpkg: holotz-castle
       -
        package: holotz-castle
        version: 1.3.14-2
        vpkg: holotz-castle-data (= 1.3.14-2)
       -
        package: holotz-castle-data
        version: 1.3.14-2

apt-get / aptitude test upgrades

After reading this interesting blog post from Petter Reinholdtsen, I've decided to repeat his experiments and save the results with dudf-save. Using Petter's script, I created a lenny schroot, installed mancoosi-contest and then ran apt-get and aptitude in simulation mode to create and upload the dudf to mancoosi.debian.net.

For example :

from lenny to squeeze (2010-07-28).

I'll repeat these tests from time to time. The idea would be to find upgrade problems, but in particular to compare apt-get / aptitude results with other solvers.

using apt-get and aptitude with cudf

apt-get and aptitude were two missing competitors of the misc competition. However, it is important and interesting to see how these two tools compare against the other solvers submitted to MISC. In this post I want to present two simple tools to convert cudf documents to something that apt-get based tools can handle. Cudf and debian share many characteristics but also have important semantic differences. One important difference is about installing multiple versions of the same package. Since this is allowed in cudf, but not in debian, we can use apt-get and aptitude only to solve cudf problems that respect this constraint, ruling out, for example, all cudfs from the rpm world. Another difference to take care of is the request semantics. In cudf, a request can contain version constraints. For example, one can ask to upgrade the package wheel to a version greater than 2. Since it is not possible to translate this request directly into an apt-get request, we are forced to add a dummy package encoding the disjunction of all packages that respect this constraint. This problem does not arise with remove requests, as they always refer to the currently installed package.

Apt-get needs two files: the Packages file, which contains the list of all packages known to the meta-installer, and the status file, which contains the list of packages that are currently installed. To generate these files I wrote a small utility using the dose3 framework, imaginatively called cudftodeb. This tool takes a cudf and produces three files: Packages, status and Request, with the Request file containing the list of packages to install or remove in a syntax compatible with apt-get.

In order to run apt-get/aptitude with these files, you need a simple bash script. You can find details here for apt-get and here for aptitude. The most important option is -s, used to simulate an installation.

With the -v option of apt-get we can generate a parsable solution. This output is then piped through another tool called aptgetsolutions in order to produce a cudf solution, closing the circle.

For example, this is the trace produced by aptitude when trying to solve the legacy.cudf problem :

Reading package lists...
Building dependency tree...
Reading extended state information...
Initializing package states...
Reading task descriptions...
The following NEW packages will be installed:
  bicycle dummy_wheel electric-engine{b} glass{a} window{a}
The following packages will be upgraded:
  door wheel
2 packages upgraded, 5 newly installed, 0 to remove and 1 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
The following packages have unmet dependencies:
  gasoline-engine: Conflicts: engine which is a virtual package.
  electric-engine: Conflicts: engine which is a virtual package.
The following actions will resolve these dependencies:

     Remove the following packages:
1)     gasoline-engine

The following NEW packages will be installed:
  bicycle dummy_wheel electric-engine glass{a} window{a}
The following packages will be REMOVED:
  gasoline-engine{a}
The following packages will be upgraded:
  door wheel
2 packages upgraded, 5 newly installed, 1 to remove and 0 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
WARNING: untrusted versions of the following packages will be installed!

Untrusted packages could compromise your system's security.
You should only proceed with the installation if you are certain that
this is what you want to do.

  wheel bicycle dummy_wheel door glass electric-engine window

*** WARNING ***   Ignoring these trust violations because
                  aptitude::CmdLine::Ignore-Trust-Violations is 'true'!
Remv gasoline-engine [1] [car ]
Inst bicycle (7 localhost) [car ]
Inst glass (2 localhost) [car ]
Inst window (3 localhost) [car ]
Inst door [1] (2 localhost) [car ]
Inst wheel [2] (3 localhost) [car ]
Inst dummy_wheel (1 localhost) [car ]
Inst electric-engine (1 localhost)
Conf bicycle (7 localhost)
Conf glass (2 localhost)
Conf window (3 localhost)
Conf door (2 localhost)
Conf wheel (3 localhost)
Conf dummy_wheel (1 localhost)
Conf electric-engine (1 localhost)
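
The Inst and Remv lines at the end of the trace are the part that is easy to parse. The following is only a sketch of that idea, and not the actual aptgetsolutions tool: the function name summarize_trace and the output format are mine, and it assumes that the version being installed is the first field that starts with an opening parenthesis, as in the trace above.

```shell
# Hypothetical helper: read a simulation trace on stdin and print one
# "install NAME VERSION" or "remove NAME" line per action.
summarize_trace() {
  awk '
    $1 == "Inst" {
      ver = ""
      # the candidate version is the first field of the form "(VERSION"
      for (i = 3; i <= NF; i++)
        if (substr($i, 1, 1) == "(") { ver = substr($i, 2); break }
      print "install", $2, ver
    }
    $1 == "Remv" { print "remove", $2 }
  '
}
```

Conf lines and the human readable part of the trace are simply ignored.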

Note the package dummy_wheel, used to encode the upgrade request wheel >> 2. This dummy package encodes the request as a dependency:

Package: dummy_wheel
Version: 1
Architecture: i386
Depends: wheel (= 3)
Filename: /var/fakedummy_wheel1
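
As an illustration of the encoding, and not the code of cudftodeb itself, a stanza like the one above could be generated from the list of versions that satisfy the constraint. The function name dummy_stanza is hypothetical; the fixed fields (Version, Architecture, Filename) are simply copied from the example.

```shell
# Hypothetical helper: dummy_stanza PKG VERSION...
# prints a dummy package depending on the disjunction of the given
# versions of PKG.
dummy_stanza() {
  local pkg=$1 deps="" v
  shift
  for v in "$@"; do
    # join the alternatives with " | ", the debian syntax for a disjunction
    deps="${deps:+$deps | }$pkg (= $v)"
  done
  printf 'Package: dummy_%s\nVersion: 1\nArchitecture: i386\nDepends: %s\nFilename: /var/fakedummy_%s1\n' "$pkg" "$deps" "$pkg"
}
```

dummy_stanza wheel 3 reproduces the stanza above; with more candidate versions, for instance dummy_stanza wheel 3 4, the Depends field becomes a disjunction.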

One last remark about apt-get. I just ran into this bug today, using the old version of apt-get that is shipped with lenny. For our experiments we are using only the latest version of apt-get in debian testing.

misc 2010, how to run a solver competition

One of the goals of the Mancoosi project is to bring together researchers from various disciplines to advance the state of the art of package managers. To this end, we organized a SAT solving competition specifically tailored to upgrade/installation problems. The winner of the competition was announced during the lococo workshop, hosted at the international conference FLOC, on the 10th of July 2010. I spent several hours preparing the infrastructure for the competition and here I'd like to give a brief account of my experience. This work was done together with Ralf Treinen, Roberto Di Cosmo and Stefano Zacchiroli.

Input - cudf documents

The input for the solvers in the competition are documents in the cudf format. In the last year we collected a number of cudf documents from the community (namely, with the help of mandriva, caixa magica and debian). These documents are stored on the mancoosi.org servers and can be used to train meta-installers on particularly difficult problems. We used 20 such documents (from debian) for the misc 2010 competition. Unfortunately, due to problems in the conversion between dudf (a distribution specific format used to collect upgrade problems) and cudf, we were not able to use problems from either mandriva or caixa magica.

Other than these *real* problems, we generated a number of artificial problems built from debian repositories. The utility we used is called randcudf and it is freely available on the mancoosi website, as part of the dose3 framework. We took a number of variables into consideration in order to generate problems that are difficult but not too far away from the reality of everyday use.

Among these parameters are

  • the size of the package universe
  • the number of packages that are declared as already installed in the universe (status)
  • the number of packages to install / remove / upgrade
  • the probability of generating install / remove requests with a version constraint
  • the number of packages declared as installed but whose dependencies might not be satisfied
  • the number of packages marked as keep and the type of keep (version or package)

Playing around with these variables we were able to produce problems of different sizes and different degrees of complexity. During the competition, for example, the three categories had respectively a universe of 30K, 50K and 100K packages. Moreover, we discarded all problems that do not have a solution at all.

From our experience during the problem selection, when considering over 30K packages it is extremely easy to generate cudf problems that do not have a solution at all. For example, in debian lenny there are 17K packages connected by a kernel of 80 conflicts. This configuration produces around 5K strong conflicts. This means that if we pick two packages among these 17K, there is a high probability that these two packages are in conflict. This is because of the high level of inter-dependencies of open source distributions. With bigger remove/install requests this probability grows even bigger. Since the goal was to provide random problems as close as possible to reality, our documents have a request to install at most 10 packages and remove 10 packages at the same time.

The five categories used in the competition :

  • cudf_set : Encoding in cudf of 1-in-3-SAT
  • debian-dudf : Real installation problems collected via dudf-save.
  • easy : Debian unstable / desktop installation from unstable / 10 install - 10 remove
  • difficult : Debian stable + unstable / server installation from stable / 10 install - 10 remove - 1 upgrade all
  • impossible : Debian oldstable, stable, testing, unstable / server installation from oldstable / 10 install - 10 remove - 1 upgrade all

Execution

The execution environment of the competition was set up in a dedicated virtual machine running on Xen. This VM respects the specification given in the rules of the competition. We made two small mistakes (to be fixed in the next edition of the competition). First, we did not specify the OS running in the virtual machine. To reproduce the results, everybody should be able to replicate the execution environment and re-run the competition. For Misc2010, we used a plain debian lenny (plus security updates). We tried to keep all additional software to a strict minimum.

Starting from a base system (as generated by debootstrap) we added :

  • subversion (used to update the execution environment from the mancoosi svn)
  • git (used to register various versions of the submitted solvers)
  • sun-java (from non-free)

The second mistake was to not specify exactly the java version. open-java has subtle differences from sun-java and it seems these differences created a few problems for one of the participants. This problem was quickly rectified.

Running the competition

To run the competition I wrote a few simple bash scripts and test cases. The test cases were meant to test the execution environment and to be sure that all constraints were correctly enforced. The execution environment is available in the mancoosi svn. In practice, we ran the competition in four phases.

Phase One

In the first one we deployed all solvers in the execution environment. In order to cleanup the solver directory and "start fresh" after every invocation, I created an empty git repository for every solver. After each invocation, the repository was cleaned-up using

git clean -f -d
git reset --hard
find /tmp/ -user misc2010 -exec rm -Rf {} \;
The last line makes sure that no temporary files were left in the temporary directory.

Phase Two

In the second phase, we actually ran the competition. The script used is runcomp.sh. It takes 3 arguments: the list of solvers, the list of problems and a timeout in seconds. Since we used the same set of problems for the trendy and paranoid tracks, we ran the competition only once for both tracks. The output of the runcomp.sh script is a directory (e.g. tmp/201007060918) with all the raw results of the competition. All raw data is publicly available here.
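
To give an idea of the shape of such a script, here is a minimal sketch. It is not runcomp.sh: it uses the coreutils timeout command in place of runsolver, it takes the output directory as an explicit fourth argument instead of generating a timestamped one, and the names run_comp and the one-line-per-run summary are my own.

```shell
# Hypothetical sketch: run_comp SOLVERS PROBLEMS TIMEOUT OUTDIR
# SOLVERS and PROBLEMS are files with one path per line.
run_comp() {
  local solvers=$1 problems=$2 limit=$3 outdir=$4 solver problem status
  mkdir -p "$outdir"
  while read -r solver; do
    while read -r problem; do
      status=0
      # timeout(1) exits with status 124 when the time limit is hit;
      # stdin comes from /dev/null so a solver cannot eat the problem list
      timeout "$limit" "$solver" "$problem" \
        > "$outdir/$(basename "$solver")-$(basename "$problem").out" 2>&1 \
        < /dev/null || status=$?
      echo "$(basename "$solver") $(basename "$problem") $status"
    done < "$problems"
  done < "$solvers"
}
```

Each solver gets every problem in turn, and its raw output ends up in one file per (solver, problem) pair under the output directory.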

Phase Three

In the third phase we compute the aggregate results by track using the script recompute.sh. This script takes 4 arguments: the list of all solvers in one track, the list of problems (the same used before), the timestamp of the last run (e.g. 201007060918) and the name of the track. The output of this script is a file containing all the aggregate results, one per line, of the form category, problem, solver, time, result. For example, a snippet from this file looks like:

cudf_set huge1.cudf apt-pbo-paranoid-1.0.5 - ABORT
cudf_set huge1.cudf p2cudf-paranoid-1.6 2.28814 FAIL
cudf_set huge1.cudf uns-paranoid-0.0002 - ABORT
cudf_set huge1.cudf ucl-cprel-1.0 0 FAIL
cudf_set huge1.cudf aspcud-paranoid-1.0 - ABORT
cudf_set huge1.cudf inescp-1.0 1.94812 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf apt-pbo-paranoid-1.0.5 11.75 SUCCESS -18,-21
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf p2cudf-paranoid-1.6 6.06838 SUCCESS -16,-34
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf uns-paranoid-0.0002 - ABORT
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf ucl-cprel-1.0 0 FAIL
debian-dudf 103c9978-5408-11df-9bc1-00163e7a6f5e.cudf aspcud-paranoid-1.0 14.4089 SUCCESS -15,-29
...
Results are generated using the solution checker .
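
Since the aggregate file has one whitespace-separated record per line, quick summaries are easy to get from the shell. A small sketch (the function name tally_results is mine, and this is not part of the competition scripts) that counts the outcomes per solver:

```shell
# Hypothetical helper: read aggregate results on stdin and print
# "solver outcome count" lines, sorted.
tally_results() {
  # field 3 is the solver, field 5 the outcome (SUCCESS, FAIL, ABORT)
  awk '{ count[$3 " " $5]++ } END { for (k in count) print k, count[k] }' |
    LC_ALL=C sort
}
```

Applied to the file produced by recompute.sh, it gives an overview of how often each solver succeeded, failed or was aborted.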

Phase Four

The last step is the classification of the solutions. The misc-classifier takes the aggregate results as input and outputs the html tables that are then published on the web.

Conclusions

Running a solver competition is not as easy as it seems. To get it right we ran an internal competition in January 2009 that helped us to highlight, understand and solve different problems. It is mostly a matter of writing down the rules, specifying a clear and understandable protocol for solver submission (for example, asking participants to version their solver and to provide an md5 hash of the binary is a very good idea in order to avoid mix-ups) and spending some time debugging the scripts. The runsolver utility from Olivier Roussel (available here) is a very nice tool that takes care of many delicate details in process accounting and resource management. I added a small patch to be able to specify a specific signal as *warning* signal. The code is in my private git repository: git clone http://mancoosi.org/~abate/repos/runsolver.git/ . This is the actual code we used for the competition. The 32 bit binary is available in the svn. All in all it was a great experience.

 
The results of the competition are published here.

rssh and suid mode

I use rssh to allow restricted shell access to my servers. A few weeks ago I noticed a lot of errors in my log of the form

Jun 22 11:04:06 dev rssh[13508]: setting log facility to LOG_USER
Jun 22 11:04:06 dev rssh[13508]: setting umask to 022
Jun 22 11:04:06 dev rssh[13508]: line 38: configuring user xxxx
Jun 22 11:04:06 dev rssh[13508]: setting xxxx's umask to 022
Jun 22 11:04:06 dev rssh[13508]: allowing rsync to user xxxx
Jun 22 11:04:06 dev rssh[13508]: chrooting xxxx to /home/xxxx
Jun 22 11:04:06 dev rssh[13508]: chroot cmd line: /usr/lib/rssh/rssh_chroot_helper 5 "rsync --server --sender -lde.L . "
Jun 22 11:04:06 dev rssh_chroot_helper[13508]: new session for xxxx, UID=1000
Jun 22 11:04:06 dev rssh_chroot_helper[13508]: chroot() failed, 5: Operation not permitted

It turned out to be a problem with a recent security update that removed the set-user-ID bit from /usr/lib/rssh/rssh_chroot_helper. Dpkg has a nice way to make such changes permanent, namely dpkg-statoverride. It simply boils down to:

# dpkg-statoverride --update --add root root 4755 /usr/lib/rssh/rssh_chroot_helper
# stat  /usr/lib/rssh/rssh_chroot_helper
  File: `/usr/lib/rssh/rssh_chroot_helper'
  Size: 24904         Blocks: 56         IO Block: 4096   regular file
Device: 801h/2049d    Inode: 21326498    Links: 1
Access: (4755/-rwsr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2010-06-22 12:03:17.170460092 +0200
Modify: 2010-04-04 02:07:37.000000000 +0200
Change: 2010-06-22 12:02:23.040115785 +0200
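Since a later package upgrade could in principle bring the problem back, a small check that can be run from cron or by hand is useful. This is only a sketch; check_setuid is a made-up name and relies on nothing but the shell's -u file test.

```shell
#!/bin/bash
# Hypothetical helper: report whether a file carries the set-user-ID bit.
check_setuid() {
  local file=$1
  if [ -u "$file" ]; then
    echo "setuid: $file"
  else
    echo "NOT setuid: $file"
    return 1
  fi
}
```

For example, check_setuid /usr/lib/rssh/rssh_chroot_helper returns a non-zero status when the bit is missing. The registered override itself can be listed with dpkg-statoverride --list.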

xslt : namespace, move nodes, cdata

Every now and then I have to re-learn xslt to transform xml documents. My mantra is to use the right tool for the right job... so here I am again, struggling with xslt. There is a bit of a love/hate relationship between me and this language. I used it a lot during my master thesis to transform an enormous - industry size - software specification (in SDL) into a formal language. This was a very painful way of learning a new language. After that I only used it for small tasks, and I always manage to forget part of the specification...

Anyway, just to avoid repeating this error again, the lesson today is about xml namespaces. Here you can find a lengthy explanation about namespaces and all the possible related problems in xslt. I stumbled on a very simple case: why does my xslt stylesheet not work in the presence of a namespace?

A small motivating example. Suppose you have a very simple xml document

<doc xmlns="http://mynamespace.org">
  <a><b>aaa</b><c>bbb</c></a>
</doc>

and you want to transform it into a different xml document such as:

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><c>bbb</c></a>
<b xmlns="http://mynamespace.org"><![CDATA[aaa]]></b></newdoc>

Therefore you want to:
- change the document root (and namespace)
- copy all the content of the element <a> except the element <b>
- move the element <b> below <a>
- embed the text content of <b> in a CDATA section.

We go one step at a time. First we transform the root element and change the default namespace. The header of the xsl file is standard except for the declaration of the attribute xmlns:ns="http://mynamespace.org". Since our source xml document has a namespace, we have to match elements in this namespace. In order to do so, we associate a label ns with the namespace http://mynamespace.org and then we use it to match the element <doc>.

<?xml version="1.0"?>
<xsl:transform
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
 xmlns:ns="http://mynamespace.org">

The second part is the standard way of copying nodes and contents...

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

In the third part we match the root element, create a new element and copy everything. Since <xsl:apply-templates/> does not have a select attribute, it applies by default to all child nodes of the root element.

<xsl:template match="ns:doc">
  <xsl:element name="newdoc" namespace="http://othernamespace.org">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

</xsl:transform>

The result :

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><b>aaa</b><c>bbb</c></a>
</newdoc>

Now we want to move the element <b>. We add a new template that matches <a> and copies all its child elements except <b>.

<xsl:template match="ns:a">
  <xsl:copy>
    <xsl:apply-templates select="*[not(self::ns:b)]"/>
  </xsl:copy>
</xsl:template>

This template, being more specific than the default template we specified at the beginning of the document, will be applied to <a>. Then we have to modify the template for the root element in order to copy the content of <a> and then the element <b> below it.

<xsl:template match="ns:doc">
  <xsl:element name="newdoc" namespace="http://othernamespace.org">
    <xsl:apply-templates/>
    <xsl:apply-templates select="ns:a/ns:b"/>
  </xsl:element>
</xsl:template>

This will give something like this :

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><c>bbb</c></a>
<b xmlns="http://mynamespace.org">aaa</b></newdoc>

Lastly, we want to embed the content of <b> in a CDATA section. This is easy: we just need to add an output directive with the cdata-section-elements attribute at the beginning of the file. The complete example:

<?xml version="1.0"?>
<xsl:transform
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
 xmlns:ns="http://mynamespace.org">

<xsl:output method="xml" indent="yes" cdata-section-elements="ns:b" />

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="ns:a">
  <xsl:copy>
    <xsl:apply-templates select="*[not(self::ns:b)]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="ns:doc">
  <xsl:element name="newdoc" namespace="http://othernamespace.org">
    <xsl:apply-templates/>
    <xsl:apply-templates select="ns:a/ns:b"/>
  </xsl:element>
</xsl:template>

</xsl:transform>

And the final result :

<?xml version="1.0"?>
<newdoc xmlns="http://othernamespace.org">
  <a xmlns="http://mynamespace.org"><c>bbb</c></a>
<b xmlns="http://mynamespace.org"><![CDATA[aaa]]></b></newdoc>
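To see why the ns prefix is needed at all, it helps to count the <b> elements with and without it. The following is a minimal sketch, assuming xsltproc (from libxslt) is installed; count_b is a made-up helper name, and the stylesheet it writes exists only for this experiment.

```shell
#!/bin/bash
# Hypothetical helper: count <b> elements with and without the ns prefix.
# Requires xsltproc on the PATH.
count_b() {
  local doc=$1 xsl
  xsl=$(mktemp) || return 1
  # Throw-away stylesheet: text output, two XPath counts.
  cat > "$xsl" << 'EOF'
<?xml version="1.0"?>
<xsl:transform
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
 xmlns:ns="http://mynamespace.org">
<xsl:output method="text"/>
<xsl:template match="/">
  <xsl:text>ns:b=</xsl:text><xsl:value-of select="count(//ns:b)"/>
  <xsl:text> b=</xsl:text><xsl:value-of select="count(//b)"/>
</xsl:template>
</xsl:transform>
EOF
  xsltproc "$xsl" "$doc"
  local status=$?
  rm -f "$xsl"
  return $status
}
```

Run on the source document of this post, the unprefixed //b finds nothing, because in XPath 1.0 an unprefixed name only matches elements in no namespace, while //ns:b finds the element.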

transparently open compressed files ocaml (now with bz2 support)

A long time ago I wrote about how to handle compressed files in ocaml using extlib: http://mancoosi.org/~abate/transparently-open-compressed-files-ocaml

Today I got back to it and added bz2 support. The code is trivial. The only small problem is that the bz2 interface does not provide a char input function, so I have to simulate it using Bz2.read. A bit of a hack. I want to look at the bz2 bindings to fix this small shortfall. This is the code:

open ExtLib

let gzip_open_file file =
  let ch = Gzip.open_in file in
  let input_char ch = try Gzip.input_char ch with End_of_file -> raise IO.No_more_input in
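  (* take all the arguments, otherwise the partial application escapes the try *)
  let read ch s pos len = try Gzip.input ch s pos len with End_of_file -> raise IO.No_more_input in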
  IO.create_in
  ~read:(fun () -> input_char ch)
  ~input:(read ch)
  ~close:(fun () -> Gzip.close_in ch)
;;

let bzip_open_file file =
  let ch = Bz2.open_in (open_in file) in
  let input_char ch =
   (** XXX ugly ! *)
    try let s = " " in ignore (Bz2.read ch s 0 1) ; s.[0]
    with End_of_file -> raise IO.No_more_input
  in
  let read ch s pos len =
    try Bz2.read ch s pos len
    with End_of_file -> raise IO.No_more_input
  in
  IO.create_in
  ~read:(fun () -> input_char ch)
  ~input:(read ch)
  ~close:(fun () -> Bz2.close_in ch)
;;

let std_open_file file = IO.input_channel (open_in file)
let open_ch ch = IO.input_channel ch
let close_ch ch = IO.close_in ch

let open_file file =
  if Filename.check_suffix file ".gz" || Filename.check_suffix file ".cz" then
    gzip_open_file file
  else
  if Filename.check_suffix file ".bz2" then
    bzip_open_file file
  else
    std_open_file file
;;

let main () =
  let ch = open_file (Sys.argv.(1)) in
  try while true do print_string (IO.nread ch 10240) done
  with IO.No_more_input -> ()
;;

main ();;
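When the goal is only to inspect such files from the command line, the same suffix dispatch can be sketched in shell. This mirrors open_file above rather than replacing it; cat_any is a made-up name, and the gzip and bzip2 binaries are assumed to be installed.

```shell
#!/bin/bash
# Hypothetical helper: print a file, decompressing it according to its suffix.
cat_any() {
  case "$1" in
    *.gz|*.cz) gzip -dc < "$1" ;;   # same suffixes as the gzip branch of open_file
    *.bz2)     bzip2 -dc < "$1" ;;
    *)         cat < "$1" ;;
  esac
}
```

Reading through a redirection keeps the three branches uniform and avoids any dependence on how the decompressor treats unusual suffixes.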
