Batch Architect: Instruction File Format

David Maddison & Wayne Maddison

A Mesquite Instruction File is a text file describing the contents of one or more other text files containing data. The Instruction File gives enough detail about those files for Mesquite to read them.

Structure of the data file to be read

The data file described by the Instruction File must consist of a series of records. Every record must have the same number of elements, and the elements must be separated by a tab, a new line, or a carriage return. For example, you might have a file with various numbers given for each of a series of five trees. The data for each tree might be on a single line, and there might be three numbers per tree. There would thus be five records (corresponding to the five trees), and each record would have exactly three numbers.
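The record structure described above can be sketched in Python. This is only an illustration, not Mesquite's own reader: the function name read_records and the sample values are made up for the example.

```python
import re

def read_records(text, items_per_record):
    """Split text on tabs and line breaks, then group the elements into records."""
    # Elements are separated by a tab, a new line, or a carriage return.
    elements = [e for e in re.split(r"[\t\r\n]+", text) if e.strip()]
    if len(elements) % items_per_record != 0:
        raise ValueError("every record must have the same number of elements")
    return [elements[i:i + items_per_record]
            for i in range(0, len(elements), items_per_record)]

# Five trees with three numbers per tree (made-up values).
sample = ("1.0\t2.0\t3.0\n4.0\t5.0\t6.0\n7.0\t8.0\t9.0\n"
          "10.0\t11.0\t12.0\n13.0\t14.0\t15.0\n")
records = read_records(sample, 3)
print(len(records))   # 5
print(records[0])     # ['1.0', '2.0', '3.0']
```

Because tabs and line breaks are treated alike, the same five records would result if every number were on its own line.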

Instruction File format

The format of the Instruction File is currently very simple. Eventually it will be altered to use a standard language such as XML and to allow more flexibility, but for the moment it is formatted as follows.

  1. The first line must be "MesquiteInstructions"
  2. The second line gives the number of data elements to be read in and stored for each record. Currently the maximum number of elements that can be stored is 12. A record may contain more elements than this, but Mesquite can store no more than the maximum number of them. The form of this line is

    numVariables = <number of elements to be read and stored>
  3. The third line gives the number of files to be read. The current maximum is 2. If two files are to be read, they need not have the same format. The form of this line is

    numFiles = <number of files to be read>
  4. The next lines specify the layout of each file to be read, using the format:

    file1 = 'itemsPerRecord=<number of elements total in each record> v1=<element number of first variable> v2=<element number of second variable>...'

    For the second file, use

    file2 = ...
  5. An optional line can be included specifying the name to be applied to the records:

    recordLabel = <label for records>
  6. An optional line can be included specifying a simple formula to manipulate the data, of the form:

    formula = 'v1-v2'

    The formula can consist of no more than two variables joined by one of the four standard operators (+, -, *, /).

    If a formula is defined, you can label it with

    formulaLabel = <label for formula>
  7. Optional lines can then be given, naming each of the variables that have been stored:

    label1 = <label for variable 1>
    label2 = <label for variable 2>

For each of these parts of the description, the <specification> part (the part after the equals sign) must consist of a single token; thus, if you wish to label something with multiple words, surround them with single quotes, as in 'Gamma Shape'.
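The format above, including the single-token rule and the two-variable formula, can be sketched in Python. This is a minimal sketch under stated assumptions, not Mesquite's actual parser: the helper names parse_instructions, parse_file_spec and apply_formula are hypothetical, and Python's shlex module is assumed to be a close enough stand-in for Mesquite's handling of single-quoted tokens.

```python
import operator
import re
import shlex

def parse_instructions(text):
    """Parse 'key = value' lines into a dict (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "MesquiteInstructions":
        raise ValueError("first line must be MesquiteInstructions")
    settings = {}
    for line in lines[1:]:
        key, _, value = line.partition("=")
        # shlex keeps a single-quoted run such as 'Gamma Shape' as one token.
        tokens = shlex.split(value)
        if len(tokens) != 1:
            raise ValueError(f"value for {key.strip()} must be a single token")
        settings[key.strip()] = tokens[0]
    return settings

def parse_file_spec(spec):
    """Turn 'itemsPerRecord=4 v1=3 v2=4' into (4, [3, 4])."""
    fields = dict(part.split("=", 1) for part in spec.split())
    items = int(fields.pop("itemsPerRecord"))
    # v1, v2, ... give the element position of each stored variable.
    positions = [int(fields[f"v{i}"]) for i in range(1, len(fields) + 1)]
    return items, positions

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def apply_formula(formula, values):
    """Evaluate a two-variable formula such as 'v1-v2'; values[0] is v1."""
    match = re.fullmatch(r"\s*v(\d+)\s*([-+*/])\s*v(\d+)\s*", formula)
    if match is None:
        raise ValueError("formula must be two variables joined by +, -, * or /")
    left, op, right = match.groups()
    return OPS[op](values[int(left) - 1], values[int(right) - 1])

example = """MesquiteInstructions
numVariables = 2
numFiles = 1
file1 = 'itemsPerRecord=4 v1=3 v2=4'
formula = 'v1-v2'
formulaLabel = 'Difference'
"""
settings = parse_instructions(example)
print(parse_file_spec(settings["file1"]))               # (4, [3, 4])
print(apply_formula(settings["formula"], [5.0, 2.0]))   # 3.0
```

An unquoted multi-word value, such as recordLabel = Gamma Shape, would be rejected by this sketch because it splits into two tokens.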

Example

Imagine that you wished to ask PAUP* to estimate the value of the gamma shape parameter and proportion of invariant characters for a series of trees. The PAUP* commands you might give are:

   lset basefreq=empirical nst=1 pinvar=estimate rates=gamma shape=estimate;
   lscoreall/scorefile='score';

The start of the score file that would be generated would look something like this:

   Tree -lnL p-inv gamma shape
   1 8430.14114160 0.59050567 1.075992
   2 8410.59386351 0.57306194 0.991805
   3 7922.53719113 0.64386747 1.401232

The first step would be to strip the first line out of the score file, to yield:

   1 8430.14114160 0.59050567 1.075992 
   2 8410.59386351 0.57306194 0.991805 
   3 7922.53719113 0.64386747 1.401232
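Stripping the header line can be done by hand in a text editor, or with a few lines of Python. The sketch below assumes the score file is tab-delimited; the function name strip_header is made up for the example.

```python
def strip_header(text):
    """Return the score-file text without its first (header) line."""
    lines = text.splitlines()
    return "\n".join(lines[1:])

# The start of the score file shown above, assumed to be tab-delimited.
score = ("Tree\t-lnL\tp-inv\tgamma shape\n"
         "1\t8430.14114160\t0.59050567\t1.075992\n"
         "2\t8410.59386351\t0.57306194\t0.991805\n"
         "3\t7922.53719113\t0.64386747\t1.401232\n")
stripped = strip_header(score)
print(stripped)
```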

Each record has four items. If you wish Mesquite to read in only the estimates of the proportion of invariable characters and the gamma shape parameter, then you will want Mesquite to read in the third and fourth items in each record. The MesquiteInstructions file for processing this might then be:

   MesquiteInstructions
   numVariables = 2
   numFiles = 1
   file1 = 'itemsPerRecord=4 v1=3 v2=4'
   recordLabel = 'Replicates'
   label1 = 'proportion invariable'
   label2 = 'gamma shape'
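As a rough illustration of what these instructions ask for, the following Python sketch picks the third and fourth items out of each record and pairs them with the two labels. It is not how Mesquite itself is implemented; the function name select_variables is hypothetical, and the data are assumed to be tab-delimited.

```python
def select_variables(rows, positions):
    """Keep only the items at the given positions (counted from 1) in each record."""
    return [[float(row[p - 1]) for p in positions] for row in rows]

# The stripped score file from the example, assumed to be tab-delimited.
data = ("1\t8430.14114160\t0.59050567\t1.075992\n"
        "2\t8410.59386351\t0.57306194\t0.991805\n"
        "3\t7922.53719113\t0.64386747\t1.401232\n")
rows = [line.split("\t") for line in data.splitlines() if line.strip()]

labels = ["proportion invariable", "gamma shape"]   # label1 and label2
selected = select_variables(rows, [3, 4])           # v1=3 v2=4
for values in selected:
    print(dict(zip(labels, values)))
```

The first record would yield 0.59050567 for the proportion of invariable characters and 1.075992 for the gamma shape parameter.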

 


© David Maddison & Wayne Maddison, 2005