MEV File Description


This is a short description of the .mev file format. Please see the TM4 suite package for the full document.

Production Revision 4.0 (July 16, 2004)

A MultiExperimentViewer or .mev file is a tab-delimited text file that contains coordinate and expression data for a single microarray experiment. A single header row is required to precede the expression data in order to identify the columns below. With the exception of optional comment lines, each remaining row of the file stores data for a particular spot/feature on the array.

MeV and other TM4 software tools treat comment lines as non-computational. A comment line must start with the pound symbol '#' and may appear anywhere in the file. If the pound symbol is the first character on a line, the software tool ignores the entire line (up to the newline character '\n').
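The comment rule can be sketched in a few lines of Python. This is only an illustration, not code from the TM4 tools: the function names `is_comment` and `data_lines` are hypothetical, and the intensity values in the sample rows are made up.

```python
def is_comment(line: str) -> bool:
    """True only when '#' is the very first character of the line."""
    return line.startswith("#")


def data_lines(lines):
    """Yield every non-comment line with its line ending removed."""
    for line in lines:
        if not is_comment(line):
            yield line.rstrip("\r\n")


# Illustrative input: comments may appear anywhere, including between rows.
sample = [
    "# version: V1.0\n",
    "UID\tIA\tIB\tR\tC\tMR\tMC\n",
    "# a comment between rows\n",
    "cage:20238\t1500\t1320\t1\t1\t1\t1\n",  # made-up intensities
]
kept = list(data_lines(sample))  # header row plus one data row
```

Note that a line with whitespace before the '#' is not treated as a comment here, since the rule refers to the first character on the line.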

The mev files created at TIGR will typically contain at least one comment at the top of the file with the following information. The format and fields contained within these comments are subject to change. See Appendix 2 for details.

version Version number based on revisions of expression data

format_version The version of the .mev file format document

date Date of file creation or update

analyst Owner or the person responsible for creating the file

analysis_id id from the analysis table that corresponds to this set of expression values

slide_type slide_type from the slide_type table that this array is based on

input_row_count Number of rows of expression (i.e. non-header) data in input files

output_row_count Number of rows of expression (i.e. non-header) data in this file

created_by Software tool used to create the file

description Common name or other details about the experiment

An example of the leading comments:
# version: V1.0
# format_version: V4.0
# date: 10/06/2004
# analyst: aisaeed
# analysis_id: 10579
# slide_type: IASCAG1
# input_row_count: 32448
# output_row_count: 32448
# created_by: TIGR Spotfinder 2.2.3
# TIFF files processed: gpc30025a_532_nm.tif, gpc30025a_635_nm.tif
# description: Tumor type comparison
# This is the 4th experiment in a series of 20 to identify tissue-specific genes.
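A reader that wants these fields as key/value pairs could split each leading comment at its first colon. The sketch below is an assumption about how one might do this, not a documented TM4 routine; `parse_leading_comments` is a hypothetical name, and because the comment format is subject to change, any real parser should tolerate unknown or missing fields.

```python
def parse_leading_comments(lines):
    """Split the leading '#' lines into (metadata dict, free-text notes)."""
    meta, notes = {}, []
    for line in lines:
        if not line.startswith("#"):
            break  # leading comments end at the first non-comment line
        text = line[1:].strip()
        key, sep, value = text.partition(":")
        if sep:
            # "name: value" comment; everything after the first colon is the value
            meta[key.strip()] = value.strip()
        else:
            # no colon at all, so keep it as a free-text note
            notes.append(text)
    return meta, notes


example = [
    "# version: V1.0",
    "# analysis_id: 10579",
    "# created_by: TIGR Spotfinder 2.2.3",
    "# This is the 4th experiment in a series of 20 to identify tissue-specific genes.",
    "UID\tIA\tIB\tR\tC\tMR\tMC",
]
meta, notes = parse_leading_comments(example)
```

All values are kept as strings. A free-text comment that happens to contain a colon would be stored as a key/value pair under this simple rule.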

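The header row consists of the field names for each subsequent row in the file (with the exception of comment lines). A minimum of seven columns must be present, and these must use a specific set of named headers. Any number of additional columns may be included. The seven required column headers are:

UID Unique identifier for this spot

IA Intensity value in channel A

IB Intensity value in channel B

R Row (slide row)

C Column (slide column)

MR Meta-row (block row)

MC Meta-column (block column)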

As of version 4.0 of this file format the IA and IB columns can be substituted with MedA and MedB. The new requirement is that at least one integrated intensity (IA, IB, etc.) or one median (MedA, MedB, etc.) value be reported for each channel in the microarray. For example, a two channel microarray .mev file would require either IA and IB or MedA and MedB.

MedA Median intensity in channel A

MedB Median intensity in channel B
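The header requirements for a two-channel file can be checked mechanically. The following is a minimal sketch under stated assumptions: `check_header` is a hypothetical helper, and it follows the two-channel example literally (IA with IB, or MedA with MedB), so a mixed pair such as IA with MedB is reported as a problem even though the specification does not say how that case should be handled.

```python
COORDINATE_COLUMNS = ("R", "C", "MR", "MC")
INTENSITY_PAIRS = (("IA", "IB"), ("MedA", "MedB"))


def check_header(header_line):
    """Return a list of problems in a two-channel header row (empty if none)."""
    names = header_line.rstrip("\r\n").split("\t")
    problems = []
    if names[0] != "UID":
        problems.append("UID must be the left-most column")
    missing = [c for c in COORDINATE_COLUMNS if c not in names]
    if missing:
        problems.append("missing coordinate columns: " + ", ".join(missing))
    # Either both integrated intensities or both medians must be present.
    if not any(a in names and b in names for a, b in INTENSITY_PAIRS):
        problems.append("need IA and IB, or MedA and MedB")
    return problems


ok_integrated = check_header("UID\tIA\tIB\tR\tC\tMR\tMC")
ok_median = check_header("UID\tMedA\tMedB\tR\tC\tMR\tMC\tAID")
bad = check_header("UID\tIA\tR\tC\tMR")
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in the header at once.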

The mev files created at TIGR may use one of the two formats for the header row, depending on the origin of the mev file. The non-required columns (i.e. anything after the 7th column) may be rearranged and their names are subject to change at this time.

1) Spotfinder-created mev file:

UID \t IA \t IB \t R \t C \t MR \t MC \t SR \t SC \t FlagA \t FlagB \t SAA \t SAB \t SFA \t SFB \t QCS \t QCA \t QCB \t BkgA \t BkgB \t SDA \t SDB \t SDBkgA \t SDBkgB \t MedA \t MedB \t AID

UID Unique identifier for this spot

IA Intensity value in channel A

IB Intensity value in channel B

R Row (slide row)

C Column (slide column)

MR Meta-row (block row)

MC Meta-column (block column)

SR Sub-row

SC Sub-column

FlagA TIGR Spotfinder flag value in channel A

FlagB TIGR Spotfinder flag value in channel B

SA Actual spot area (in pixels)

SF Saturation factor

QC Cumulative quality control score

QCA Quality control score in channel A

QCB Quality control score in channel B

BkgA Background value in channel A

BkgB Background value in channel B

SDA Standard deviation for spot pixels in channel A

SDB Standard deviation for spot pixels in channel B

SDBkgA Standard deviation of the background value in channel A

SDBkgB Standard deviation of the background value in channel B

MedA Median intensity value in channel A

MedB Median intensity value in channel B

MNA Mean intensity value in channel A

MNB Mean intensity value in channel B

X X coordinate of the spot cell rectangle

Y Y coordinate of the spot cell rectangle

PValueA P-value in channel A

PValueB P-value in channel B

DBID Database ID (used when UID is substituted)

The first seven fields (UID, IA, IB, R, C, MR and MC) are required as specified above.

This flexible format allows users to track slide-specific data of interest, such as background, spot size and alternate intensities without requiring them of all users or adopting a limited 'vocabulary' of field names. This header row serves to identify the required and additional data columns. UID must be the left-most column in the mev file. Other columns do not need to be present in a fixed order.
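Because only UID has a fixed position, a reader is safer looking columns up by header name than by index. One way to do that, sketched here with Python's standard `csv` module, is shown below; `read_mev` is a hypothetical function name and the numeric values in the sample are made up.

```python
import csv
import io


def read_mev(stream):
    """Return the data rows of a .mev stream as dicts keyed by header name."""
    # Drop comment lines first; the first remaining line is the header row.
    rows = (line for line in stream if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Illustrative file content with one optional column (BkgA).
text = (
    "# version: V1.0\n"
    "UID\tIA\tIB\tR\tC\tMR\tMC\tBkgA\n"
    "# comment between rows\n"
    "cage:20238\t1500\t1320\t1\t1\t1\t1\t90\n"
)
records = read_mev(io.StringIO(text))
background = records[0]["BkgA"]  # looked up by name, not by position
```

Every value is returned as a string; converting intensities or coordinates to numbers is left to the caller.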

For mev files generated at TIGR, the UIDs may be of the form database_name:spot_id (e.g. cage:20238). For any given microarray database, the id field in the spot table is unique, so the combination of database name and spot_id uniquely identifies any spot on any array created at TIGR. Note that this is not enough information to distinguish between spots in the same location on two slides of the same slide_type; that would typically require an analysis_id. Since annotation data is based on slide_type, the distinction is not necessary: all slides of a given type use the same annotation file.
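Splitting such a UID into its two parts is straightforward. The sketch below assumes the database name itself contains no colon; `split_uid` is a hypothetical helper, and it returns None for UIDs that do not follow the TIGR form, since that form is optional.

```python
def split_uid(uid):
    """Return (database_name, spot_id) for 'database_name:spot_id', else None."""
    database, sep, spot_id = uid.partition(":")
    if not sep or not database or not spot_id:
        return None  # not in the database_name:spot_id form
    return database, spot_id


parts = split_uid("cage:20238")
plain = split_uid("20238")
```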

The AID column will usually contain an incremental sequence of numbers starting at 1. These can be used to return the file to the original sorted order and can function as a unique row identifier if necessary.
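Restoring the original order then amounts to a numeric sort on AID. This sketch assumes each row is a dict keyed by header name and that every row carries an integer AID; `restore_original_order` is a hypothetical name, and the UIDs other than cage:20238 are invented for the example.

```python
def restore_original_order(rows):
    """Return the rows sorted by their AID value, compared as integers."""
    return sorted(rows, key=lambda row: int(row["AID"]))


shuffled = [
    {"UID": "cage:20240", "AID": "3"},
    {"UID": "cage:20238", "AID": "1"},
    {"UID": "cage:20239", "AID": "10"},
]
ordered = restore_original_order(shuffled)
```

The conversion to int matters: sorting the AID strings directly would place "10" before "3".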

Applications that generate files of expression data (commonly in tav format) from database records do so by accessing the spot table. TIGR Spotfinder, Midas and Madam are all capable of generating UIDs of the form described above in addition to the typical coordinate and intensity data.

mev files are required to end with the extension '.mev'. At this time there are no further naming conventions for mev files.
