Annotation File

Description of Annotation Files

An annotation file is a tab-delimited text file containing annotation data for a specific slide_type. A mev file can be associated with an annotation file only if both files are based on the same slide_type. The association is keyed on the unique ids (UIDs) in the two files: a row of a mev file and a row of an annotation file are associated when their UIDs are identical. A single header row must precede the annotation data to identify the columns. Each remaining row of the file stores annotation data for one spot/feature on the array.
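
As a rough illustration of the UID-based association described above, the following Python sketch pairs rows that share a UID. The in-memory row dictionaries, the sample column names other than UID, R and C, the sample values, and the helper name associate_by_uid are assumptions made for this example; they are not defined by this document.

```python
def associate_by_uid(mev_rows, annotation_rows):
    """Pair each mev row with the annotation row that has the same UID.

    Both arguments are lists of dicts keyed by column name (a hypothetical
    in-memory form). A mev row with no matching annotation is paired with None.
    """
    annotation_by_uid = {row["UID"]: row for row in annotation_rows}
    return [(row, annotation_by_uid.get(row["UID"])) for row in mev_rows]


# Hypothetical sample rows; only the UID column matters for the association.
mev_rows = [
    {"UID": "db:1", "IA": "1200", "IB": "950"},
    {"UID": "db:2", "IA": "300", "IB": "410"},
]
annotation_rows = [
    {"UID": "db:1", "R": "1", "C": "1", "FeatN": "example_feature"},
]
pairs = associate_by_uid(mev_rows, annotation_rows)
```

Here the first mev row is paired with its annotation, while the second has no annotation row with an identical UID and is paired with None.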

Annotation files may contain any number of non-computational comment lines. These lines start with '#', are treated identically to comment lines in mev files, and should precede the header row.
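
The layout described so far (comment lines, one header row, then tab-delimited data rows) can be read with a few lines of Python. This is a minimal sketch, not a reference parser: the function name read_annotation and the sample content are invented for illustration, and the sketch is lenient in that it accepts '#' lines anywhere rather than only before the header.

```python
import csv


def read_annotation(text):
    """Parse annotation text into (comments, header, rows).

    Lines starting with '#' are collected as comments, the first remaining
    non-blank line is taken as the header row, and every later line becomes
    a dict keyed by the header names.
    """
    comments, data_lines = [], []
    for line in text.splitlines():
        if line.startswith("#"):
            comments.append(line[1:].strip())
        elif line.strip():
            data_lines.append(line)
    # QUOTE_NONE treats quote characters as ordinary data in each field.
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        raise ValueError("annotation file has no header row")
    rows = [dict(zip(header, fields)) for fields in reader]
    return comments, header, rows


# Hypothetical file content used only to exercise the sketch.
sample = (
    "# version: V3.0\n"
    "# slide_type: IASCAG1\n"
    "UID\tR\tC\tFeatN\n"
    "db:1\t1\t1\tfeat_a\n"
    "db:2\t1\t2\tfeat_b\n"
)
comments, header, rows = read_annotation(sample)
```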

Annotation files created at TIGR will use UIDs that match the format used in the mev files, most likely database_name:spot_id. The structure of each annotation file is detailed below. Annotation files created at TIGR will typically contain at least one comment at the top of the file with the following information:

version           Version number based on revisions of annotation data
format_version    The version of the .mev file format document
date              Date of file creation or update
analyst           Owner or the person responsible for creating the file
created_by        Software tool used to create the document
gi_version        Version of the Gene Indices (or db?) that produced this annotation data
slide_type        Type from the slide_type table on which this array is based
output_row_count  Number of rows of annotation (i.e., non-header) data
description       Common name or other details about the experiment

An example of the leading comments:

# version: V3.0
# format_version: V4.0
# date: 04/20/2004
# analyst: jwhite
# created_by: Database script
# gi_version: 3.0
# slide_type: IASCAG1
# output_row_count: 32448
# description: Standard annotation file
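
Because the leading comments follow a "key: value" pattern, they can be collected into a dictionary. The sketch below assumes that pattern holds for every metadata comment, which the example above suggests but this document does not formally guarantee; the function name parse_leading_comments is hypothetical.

```python
def parse_leading_comments(lines):
    """Turn leading '# key: value' comment lines into a dict of strings."""
    metadata = {}
    for line in lines:
        if not line.startswith("#"):
            break  # comments are expected to precede the header row
        key, sep, value = line[1:].partition(":")
        if sep:  # skip comments that are not key/value pairs
            metadata[key.strip()] = value.strip()
    return metadata


# The leading comments from the example above.
example = [
    "# version: V3.0",
    "# format_version: V4.0",
    "# date: 04/20/2004",
    "# analyst: jwhite",
    "# created_by: Database script",
    "# gi_version: 3.0",
    "# slide_type: IASCAG1",
    "# output_row_count: 32448",
    "# description: Standard annotation file",
]
metadata = parse_leading_comments(example)
row_count = int(metadata["output_row_count"])
```

A reader could, for instance, compare row_count with the number of data rows actually parsed to detect a truncated file.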

The header row consists of the field names for each subsequent row in this file. Only the UID field is required. It must be the first field present and it must be named 'UID'. Any number of additional fields may be included. Annotation files created at TIGR will always contain the following columns:

UID  unique identifier for this line of annotation
R    row (slide row)
C    column (slide column)
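
The header rules above (UID first, plus R and C in TIGR-created files) lend themselves to a simple check. The following is a sketch under those stated rules only; the function name check_header and its require_tigr_columns flag are made up for this example.

```python
def check_header(header, require_tigr_columns=False):
    """Return a list of problems found in an annotation header row.

    header is a list of field names. The only universal rule is that the
    first field is named 'UID'; R and C are checked only on request, since
    they are guaranteed only for files created at TIGR.
    """
    problems = []
    if not header or header[0] != "UID":
        problems.append("first field must be named 'UID'")
    if require_tigr_columns:
        for name in ("R", "C"):
            if name not in header:
                problems.append(f"missing column '{name}'")
    return problems


ok = check_header(["UID", "R", "C", "FeatN"], require_tigr_columns=True)
bad_order = check_header(["R", "UID"])
no_position = check_header(["UID", "FeatN"], require_tigr_columns=True)
```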

The remaining fields may vary, and a standard set has yet to be determined; such a list will be published at a future date. R and C are included to allow manual alignment of a mev file with its corresponding annotation file in the event that the mev file was not generated in the traditional manner (i.e., using Madam, etc.).
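
Where UIDs cannot be matched, the manual alignment mentioned above could be done on slide position instead. The sketch below assumes that the mev rows also carry R and C columns holding the same slide row and column numbering; that assumption, the function name align_by_position, and the sample rows are illustrative only.

```python
def align_by_position(mev_rows, annotation_rows):
    """Pair mev and annotation rows that share the same (R, C) slide position.

    Rows are dicts keyed by column name. Positions are compared as integers
    so that '01' and '1' are treated as the same row or column.
    """
    annotation_by_position = {
        (int(row["R"]), int(row["C"])): row for row in annotation_rows
    }
    return [
        (row, annotation_by_position.get((int(row["R"]), int(row["C"]))))
        for row in mev_rows
    ]


# Hypothetical rows whose UIDs do not match but whose positions do.
mev_rows = [
    {"UID": "x1", "R": "1", "C": "2"},
    {"UID": "x2", "R": "3", "C": "4"},
]
annotation_rows = [
    {"UID": "db:7", "R": "1", "C": "2", "FeatN": "feat_a"},
]
aligned = align_by_position(mev_rows, annotation_rows)
```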

Some varieties of annotation files follow. The format may vary depending on the purpose of the file:

UID \t R \t C \t FeatN \t GBNum \t TCNum \t ComN \t …
UID \t R \t C \t GeneN \t Rxn \t PathwayN \t …
UID \t R \t C \t FeatN \t End5 \t End3 \t ChrNum \t …

The fields of these files can, of course, be combined, and fields not mentioned here can be added. The goal is to keep the annotation flexible and the processing seamless.

There are currently no naming conventions for annotation files. If such a standard is introduced in the future, it will be detailed here.
