//=====================================================================
// File: ABILaneFilter.java
// Class: ABILaneFilter
// Package: AFLPcore
//
// Author: James J. Benham
// Date: August 10, 1998
// Contact: james_benham@hmc.edu
//
// Genographer v1.0 - Computer assisted scoring of gels.
// Copyright (C) 1998 Montana State University
//
// This program is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License
// as published by the Free Software Foundation; version 2
// of the License.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
//
// The GNU General Public License is distributed in the file GPL
//=====================================================================

package AFLPcore;

import java.io.File;
import java.io.RandomAccessFile;
import java.io.IOException;
import java.util.NoSuchElementException;

/**
 * This class reads data from a lane file produced by extracting lanes
 * from a gel run on an ABI377. It has been tested with lanes extracted
 * by GeneScan 2.0. It will probably work with the ABI373 as well, but
 * this has not been tested. This class reads in the processed data, so
 * the lane must be processed as well as simply extracted. Also, it relies
 * on the ABI software to find the peaks in the size standard.
 *
 * <p>It will extract the following pieces of information from the file:
 * <ul>
 *   <li>the processed trace data for the selected color
 *   <li>the name of the gel
 *   <li>the name of the sample
 *   <li>the lane number
 *   <li>the peaks that the ABI software called in the size standard
 * </ul>
 *
 * <p>This information will be stored in a <code>Lane</code> object,
 * which is used by the program. The peaks read in will be passed to
 * a <code>SizeFunction</code>, which will use them to calculate the sizing
 * information for the data. Since the ABI software also calls peaks that
 * are not part of the size standard, the program compares all of the
 * peaks to an internal <code>SizeStandard</code> and uses only the sizes
 * it finds in that internal size standard. For example, the peaks with
 * locations of 50.00, 100.00, and 150.00 bp would be used, but 54.23 would
 * not. (Unless 54.23 was defined as part of the size standard, which can't
 * really happen since the size standard must contain whole values.)
 *
 * <p>The filter has three options that must be set before it can run:
 * the color, the size function, and the size standard. See
 * <code>getOptions()</code> and <code>setOptions(Option[])</code>.
 * All three options are a list of
 * choices, one of which must be selected. The possible values for the
 * color option are red, blue, green, yellow, and all. The size function and
 * the size standard can be the name of any size function/standard known
 * to the program. This class uses the <code>FeatureList</code> class to
 * retrieve the known functions. Once the options have been set, the
 * <code>readLane</code> method can be called to read the actual file.
*
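 * <p>A minimal usage sketch of the steps above. It is hypothetical: the
 * file name is made up, and how a value is selected for each
 * <code>Option</code> depends on the <code>Option</code> class, which is
 * not shown in this file.
 * <pre>
 *   ABILaneFilter filter = new ABILaneFilter();
 *   Option[] opts = filter.getOptions();  // color, size method, size standard
 *   // ... select a value for each of the three options here ...
 *   filter.setOptions(opts);
 *   Lane[] lanes = filter.readLane(new File("lane01"));  // hypothetical file
 * </pre>
 *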
 * <p>The first 4 bytes contain the value "ABIF", which indicates
 * that the file is an ABI Lane file (I think). The file contains a
 * record structure. Each record is 28 bytes long. The number of records
 * is given in a 32-bit integer at byte 18 (indexed to 0), and the offset
 * from the beginning of the file to the start of the first record is given
 * as a 32-bit integer at byte 26. A record has the following structure:
 * <pre>
 *   struct{
 *     byte[4] name;          Four ASCII character name, like "DATA"
 *     int     tagNumber;     Distinguishes fields with the same name, for
 *                            example: DATA1, DATA2, ... , DATA12
 *     short   data_type;     Denotes the type of data: 4 = integer,
 *                            7 = float, 10 = mm/dd/yy, 11 = hh/mm/ss,
 *                            18 = pascal string, 1024 = some sort of structure
 *     short   elementSize;   The size of each element.
 *     int     numElements;   The number of elements.
 *     int     recordLength;  The length of the whole record.
 *     int     dataOffset;    The offset from the beginning of the file to
 *                            the start of the record, unless the recordLength
 *                            is less than 4, in which case it contains the
 *                            actual data.
 *     int     unknown;       Usually 0, but seems to change with the editing
 *                            of the file.
 *   }
 * </pre>
 * Most of this information was obtained from Clark Tibbetts' paper. (
 * Tibbetts, Clark. "Raw Data File Formats and the Digital and Analog
 * Raw Data Streams of the ABI PRISM DNA Sequencer(c)." 1995.)
 *
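 * <p>The following records are of interest:
 * <ul>
 *   <li>DATA: the trace data. The processed data for a color has the tag
 *       number <code>9 + colorNumber</code>.
 *   <li>GELN: the name of the gel.
 *   <li>SpNm: the name of the sample.
 *   <li>LANE: the lane number.
 *   <li>LANS: the color of the size standard. If this color is
 *       <code>stdColor</code>, the standard
 *       trace is in DATA(9+stdColor) and PEAK(stdColor) contains the size
 *       standard peaks.
 *   <li>PEAK: the peaks called by the ABI software (tag numbers 1 through 4).
 * </ul>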
 *
 * <p>Each peak entry contains the following values:
 * <pre>
 *   Value    Start  Length(bytes)  Type
 *   scan       0         4         integer (1000 + this value)
 *   height     4         2         integer
 *   area      18         4         integer
 *   size      26         4         IEEE 754 single-precision float
 * </pre>
 *
 * @see SizeFunction
 * @see SizeStandard
 *
 * @author James J. Benham
 * @version 1.0.0
 * @date August 10, 1998
 */
public class ABILaneFilter extends ImportFilter
{
  // Variables from parent class
  //private protected int filetype;     // the type, see constants above
  //private protected String name;      // the name of this filter
  //private protected String descript;  // a brief description
  //private protected File helpFile;    // represents the file that contains
  //                                    // the help info for this filter.

  // Used to identify the different entries of interest in the ABI file and
  // store the index into an array that contains the info.
  private static final int NUM_ENTRIES = 10;  // the number of entries.
  private static final int DATA = 0;
  private static final int GELN = 1;
  private static final int LANE = 2;
  private static final int LANS = 3;
  private static final int PEAK1 = 4;  // Keep these 4 in order!
  private static final int PEAK2 = 5;
  private static final int PEAK3 = 6;
  private static final int PEAK4 = 7;
  private static final int SpNm = 8;
  private static final int StdF = 9;

  private ABIIndexEntry entries[];

  /** color channel */
  public static final int YELLOW = 2;
  /** color channel */
  public static final int RED = 3;
  /** color channel */
  public static final int BLUE = 0;
  /** color channel */
  public static final int GREEN = 1;
  /** selects every color channel */
  public static final int ALL = 4;

  private int colorChannel=0;
  private int stdColorChannel;
  private String standardName;
  private SizeFunction sizeFn;

  /**
   * Creates a new filter to read in ABI lane files.
   */
  public ABILaneFilter()
  {
    // Initialize the variables for this filter
    filetype = LANE;
    name = "ABI Trace";
    descript = "Reads lane files from ABI 377, not gel files.";
    helpFile = "abitrace.html";

    // Options must be set.
    options = null;
    standardName = "not set";
    sizeFn = null;
  }

  /**
   * Access the name of the filter.
   *
   * @return name of the import filter
   */
  public String getName()
  {
    return name;
  }

  /**
   * Returns the type of input file supported by this filter. In this case
   * <code>ImportFilter.LANE</code>, since the filter reads in lane data.
*
* @return constant LANE.
*/
public int getFileType()
{
return filetype;
}
/**
* Retrieves a short, approximately one sentence, description of the filter.
*
* @return the description
*/
public String getDescription()
{
return descript;
}
/**
* The help file describes which files the filter reads and the options
* that this filter accepts.
*
* @return the name of the file that contains the help information,
* either html or plaintext.
*/
public String getHelpFile()
{
return helpFile;
}
/**
* Returns the options for this filter, which includes the color of the
* data, the size function to use, and the size standard. The first
* option is the color to read, which can be one of five possible
* values: Red, Blue, Green, Yellow, or All. The color choice is given as
* an <code>Option</code> of type <code>CHOICE</code>. The second
* option is also of type <code>CHOICE</code>. It tells which size
* method should be used to compute the size of the fragments. Please
* see the help files and the code for the size functions for a
* description of how they work. The third option describes the size
* standard to use. This simply gives the program a list of values.
* These are stored in a file called "standards.cfg". Possible values
* for the size method and the size standard are read in from the
* <code>FeatureList</code> class.
*
* @return an array containing the options described above.
*
* @see Option
* @see FeatureList
* @see SizeFunction
* @see SizeStandard
*/
public Option[] getOptions()
{
Option[] returnOpts = new Option[3];
// Pick the color
String[] colors = new String[5];
colors[RED] = "Red";
colors[BLUE] = "Blue";
colors[GREEN] = "Green";
colors[YELLOW] = "Yellow";
colors[ALL] = "All";
Option param = new Option("Color", Option.CHOICE, true, colors, "Blue");
returnOpts[0] = param;
// The size function option, possibilities retrieved from the
// feature list.
param = new Option("Size Method", Option.CHOICE, true,
FeatureList.getSizeMgr().getNames(),
FeatureList.getSizeMgr().getDefaultName());
returnOpts[1] = param;
// the size standards defined
try {
param = new Option("Size Standard", Option.CHOICE, true,
FeatureList.getStandardMgr().getNames());
} catch(IOException e) {
throw new MissingParameterError("Error accessing standards file. " +
e.getMessage());
}
returnOpts[2] = param;
return returnOpts;
}
/**
* Sets the parameters for the filter to the specified values, including
* color. The color must be set before this filter can run. The option
* representing the color should have a string value naming the color.
* The size function must also be set for the filter to work. It
* must contain the name of a valid <code>SizeFunction</code>. Note that
* the name is not the class name of the <code>SizeFunction</code>, but
* the name each <code>SizeFunction</code> stores internally. The
* third option, the name of the size standard, must also be set. It is
* not checked until <code>readLane</code> is called.
*
* @param opts an array of length 3 which contains the options
* mentioned above and described in <code>getOptions()</code>.
* The order must be: color, size function, size standard.
*
* @exception MissingParameterError occurs when the filter fails to
* extract a string from the first option in <code>opts</code>.
* @exception IllegalArgumentException occurs when a string is found but
* cannot be matched to one of the colors: Red, Blue, Green, Yellow, or
* All. Or if an array with length not equal to 3 is given as
* <code>opts</code>, or if the specified size function, the second
* option, could not be matched to a defined size function.
*/
public void setOptions(Option[] opts)
{
// Check the length.
if(opts.length != 3)
throw new IllegalArgumentException("Invalid options for ABI Lane " +
"Filter. 3 options expected, but " +
opts.length + " were provided.");
// extract the option
String value = opts[0].getStringValue();
// check to make sure we have a string before storing anything
if (value == null) {
// as in the other failure cases below, do not keep unusable options
options = null;
throw new MissingParameterError("Color not provided as parameter to " +
"ABI Lane Filter.");
}
// store the options
options = opts;
if(value.equalsIgnoreCase("Red"))
colorChannel = RED;
else if(value.equalsIgnoreCase("Blue"))
colorChannel = BLUE;
else if(value.equalsIgnoreCase("Green"))
colorChannel = GREEN;
else if(value.equalsIgnoreCase("Yellow"))
colorChannel = YELLOW;
else if(value.equalsIgnoreCase("All"))
colorChannel = ALL;
else {
// didn't match a color, so something is wrong.
// set the options back to null since the ones we got were no good.
options = null;
// and complain
throw new IllegalArgumentException("Invalid color specified for ABI" +
" Lane Filter.");
}
// Next should be the size function
String sizeFnName = opts[1].getStringValue();
try {
sizeFn = (SizeFunction) FeatureList.getSizeMgr().get(sizeFnName);
}
catch(NoSuchElementException e) {
options = null;
throw new IllegalArgumentException("Invalid sizing function specified"
+ " for ABI Lane Filter. ");
}
// The final option is the size standard definition
standardName = opts[2].getStringValue();
// this will be checked later
}
/**
* This is the method that is called to perform the actual reading of the
* file. The data in the file represents data from a single lane. The
* options/parameters required for the filter should be set using
* <code>setOptions</code>, and if they are not, an exception will be
* thrown.
*
* @param inputFile The file that contains the lane data.
*
* @return an array of Lane objects with all of the appropriate
* information. The array has one element when a single color is selected
* and four elements when "All" is selected; in the latter case, unused
* trailing elements are null.
*
* @exception MissingParameterError occurs if the options are not
* set. Since this includes the required color, the filter cannot
* read in the lane.
* @exception IOException If an error is encountered in the file,
* then this exception will be thrown
*/
public Lane [] readLane(File inputFile) throws IOException
{
Lane newLane;
Lane [] laneArray;
int numOfLanes;
boolean allChannels;
long indexOffset;
long indexLength;
DataList stdPoints;
int peakIndex;
SizeStandard sizeStd;
SizeFunction sizeFn;
entries = null;
// Make sure we have options set, including the color channel
if(options == null)
throw new MissingParameterError("The color for the filter must be " +
"set before the filter can work.");
// Open the file. Set the mode to read only.
RandomAccessFile in = new RandomAccessFile(inputFile, "r");
// Check the file type. They all seem to start with "ABIF", which
// becomes 0x41424946 in hex.
int magicNum = in.readInt();
if( magicNum != 0x41424946)
throw new IOException("This does not appear to be an ABI lane file." +
" See help for more info.");
// Get the length of the index of types.
in.seek(18);
indexLength = (long) in.readInt();
// Get the location of the index.
in.seek(26);
indexOffset = (long) in.readInt();
//Added 6/25/2001 by Philip DeCamp
if(colorChannel == ALL){
laneArray = new Lane[4];
for(int i =0 ; i < 4; i++)
laneArray[i] = null;
allChannels = true;
colorChannel = -1;
}
else{
laneArray = new Lane[1];
allChannels = false;
}
for(int i = 0; i < 4; i++) {
if(allChannels){
//Goes through and finds a valid color channel
for(;;){
colorChannel++;
if(colorChannel > 3)
break;
entries = readRecords(indexOffset, indexLength, in);
try{
checkForColor();
in.seek(entries[LANS].dataOffset + 2);
stdColorChannel = in.readUnsignedShort() - 1;
if(stdColorChannel != colorChannel)
break;
} catch(Exception e){
// This color is not present in the file, skip it
// and take no other action.
}
}
// If the colorChannel is this high, it means that all channels
// have been checked.
if(colorChannel > 3)
break;
}
else{
entries = readRecords(indexOffset, indexLength, in);
checkForColor();
// Read in the color channel of the size standard. It is located at bytes
// 3 and 4 of the entry pointed to by LANS. It is in the form of an
// unsigned short.
in.seek(entries[LANS].dataOffset + 2);
stdColorChannel = in.readUnsignedShort() - 1;
}
// Move to the location of the Data
in.seek(entries[DATA].dataOffset);
int traceSize = (int) entries[DATA].numElements;
double [] trace = new double[traceSize];
for(int j= 0; j < entries[DATA].numElements; j++)
trace[j] = (double) in.readUnsignedShort();
newLane = new Lane(trace);
// Read in the Gel name.
if(entries[GELN].numElements > 4)
newLane.setGelName(readPString(entries[GELN].dataOffset, in));
else
newLane.setGelName(readPString(entries[GELN].dataOffset));
// Read in the Sample name.
if(entries[SpNm].numElements > 4)
newLane.setName(readPString(entries[SpNm].dataOffset, in));
else
newLane.setName(readPString(entries[SpNm].dataOffset));
// Read in the Lane number
// In this case, the offset is actually the data since the values
// are so small. The number is stored two bytes up from the end of
// the long, so shift it so that we get the correct value.
newLane.setLaneNumber( (int)(entries[LANE].dataOffset >> 16) );
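// Worked example (illustrative value, not taken from a real file): a LANE
// record whose offset field holds 0x00050000 describes lane 5, since
// 0x00050000 >> 16 == 5.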
// This doesn't seem to work for every file, so just let the user pick
// it for now.
// Read in the name of the size standard used.
// if(entries[StdF].numElements > 4)
// standardName = readPString(entries[StdF].dataOffset, in);
// else
// standardName = readPString(entries[StdF].dataOffset);
// Select the correct peak entry.
peakIndex = PEAK1 + stdColorChannel;
//=========== Read in the standard peaks===========
// get the size standard.
try{
sizeStd = ((SizeStandard)
FeatureList.getStandardMgr().get(standardName));
} catch(NoSuchElementException e) {
throw new IOException("Unknown size standard! '" +
standardName + "'");
}
stdPoints = new DataList();
Peak pk;
for(int j=0; j < entries[peakIndex].numElements; j++) {
in.seek(entries[peakIndex].dataOffset + j*96);
pk = readPeak(in);
if(sizeStd.contains(pk.getLocation())){
stdPoints.addData(pk);
}
}
// Set the color channel
newLane.setColor(colorChannel);
//================= set the size function ==============
String sizeName = options[1].getStringValue();
sizeFn = (SizeFunction) FeatureList.getSizeMgr().get(sizeName);
sizeFn = (SizeFunction) sizeFn.clone();
sizeFn.init(stdPoints);
sizeFn.setMaxScan(newLane.getNumPoints() - 1);
newLane.setSizeFunction(sizeFn);
laneArray[i] = newLane;
if(!allChannels)
break;
}
//==================clean up=============================
in.close();
/*=================DEBUG===================*/
//System.out.println("Gel Name is: " + newLane.getGelName());
//System.out.println("Sample Name is: " + newLane.getName());
//System.out.println("Lane number is: " + newLane.getLaneNumber());
//System.out.println("Standard name is: " + standardName);
//System.out.println("std color is: " + stdColorChannel);
/*=================DEBUG===================*/
if(allChannels){
for(int i = 0; i < 4; i++)
if(laneArray[i] != null){
allChannels = false;
break;
}
if(allChannels)
throw new IOException("No Color Channels Found");
}
return laneArray;
}
/**
* This filter does not read gels.
*
* @return Always null
*/
public Gel readGel(File inputFile) throws IOException
{
return null;
}
/**
* Parses the records portion of the ABI file to gather information about
* several different data structures in the file. The important pieces of
* information are where the data structure is in the file and how big
* it is. All of the records start with a four-character ASCII value.
* The records of interest include DATA, which stores the trace information;
* SpNm, which is the name of the sample; GELN, which is the name of the
* gel on which the sample was run; and LANE, which stores the lane
* number of the sample. In some cases, these identifiers are repeated.
* For example, a file can have up to 12 DATA entries, but each one has
* a tag number to separate it. Only one of these contains the information
* we want.
*
* The parser works by converting strings like "DATA" into a long value,
* which is the ASCII representation of the string. It then compares
* this to the first four bytes of each record. On a match, it will
* look at the rest of the record and decide to either keep it or discard
* it. More details are in the code for those interested.
*
* @param indexOffset the location from the beginning of the file to
* the start of the index of records, i.e., the location of the first
* record
* @param indexLength the number of records in the index
* @param in the ABI trace file.
*
* @return the records of interest stored in ABIIndexEntry(s). Only
* the portions of the records that are needed are returned.
*
* @exception IOException if the index cannot be read from the file.
*/
private ABIIndexEntry[] readRecords(long indexOffset, long indexLength,
RandomAccessFile in)
throws IOException
{
// Create the structure to hold the records for the entries.
ABIIndexEntry record[] = new ABIIndexEntry[NUM_ENTRIES];
// Add the values that we are looking for.
// In some cases, we are looking for certain records. For example,
// the correct color channel's data entry is 9 + colorChannel,
// which represents the matrix corrected data in the file.
// The file actually contains anywhere from 8-12 DATA records,
// but the tag number determines which one we want.
record[DATA] = new ABIIndexEntry("DATA", 9 + colorChannel);
// Go for the raw data...
//record[DATA] = new ABIIndexEntry("DATA", 1 + colorChannel);
record[GELN] = new ABIIndexEntry("GELN");
record[LANE] = new ABIIndexEntry("LANE");
record[LANS] = new ABIIndexEntry("LANS");
record[PEAK1] = new ABIIndexEntry("PEAK", 1);
record[PEAK2] = new ABIIndexEntry("PEAK", 2);
record[PEAK3] = new ABIIndexEntry("PEAK", 3);
record[PEAK4] = new ABIIndexEntry("PEAK", 4);
record[SpNm] = new ABIIndexEntry("SpNm");
record[StdF] = new ABIIndexEntry("StdF");
// Variables to temporarily hold the record info while we decide if the
// record is valid.
long nameKey;
long tag;
long numElem;
long offset;
// Go to the start of the index
in.seek(indexOffset);
// Look for records that begin with the name that we're interested in
// and then look at that record more carefully. If it doesn't look valid,
// we won't copy the temporary values to anything permanent.
for(int count=0; count < indexLength; count++)
{
// read in the name
nameKey = (long) in.readInt();
// Read in the other info
tag = (long) in.readInt();
in.skipBytes(4); // skip to the next interesting part
numElem = (long) in.readInt();
in.skipBytes(4); // skip this too
offset = (long) in.readInt();
// Now see if the name matches any of the ones we're looking for by
// comparing it to each entry in the array.
for(int i=0; i < NUM_ENTRIES; i++)
{
if(nameKey == record[i].nameKey)
{
// Make sure we have the data record we want.
// If the tag is set, check to see if it matches.
if( (offset != 0) &&
!((record[i].matchTagNumber()) &&
(tag != record[i].tagNumber)))
{
// Make sure the data points to something
// good. There seem to be a lot of entries that
// don't point to anything. For example, some
// files contain multiple SMPL entries, but only
// one of these has a non-null data pointer. If
// the record points to something, store the
// temporary values.
record[i].tagNumber = tag;
record[i].numElements = numElem;
record[i].dataOffset = offset;
// we only need to store it once, so don't go through
// the inner loop extra times.
break;
}
} //if (name matches one we're interested in.
} // for(every entry in the record array)
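// Move to the start of the next record. Record count has just been
// read, so the next one starts at (count + 1) * 28 bytes into the index.
in.seek(indexOffset + (count + 1)*28L);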
} // for every record
return record;
}
/**
* Check to make sure we found the color channel. If we didn't, throw an
* exception. This could happen because not every file has every channel
* for the processed color data. In this case, the offset will be zero
* since it was never assigned a value.
*
* @exception IOException occurs when the filter cannot find the color
* channel specified with <code>setOptions</code> in the file.
*/
private void checkForColor() throws IOException
{
if( entries[DATA].dataOffset == 0)
{
String errorMsg="";
switch(colorChannel)
{
case RED:
errorMsg = "red";
break;
case BLUE:
errorMsg = "blue";
break;
case GREEN:
errorMsg = "green";
break;
case YELLOW:
errorMsg = "yellow";
break;
}
errorMsg = "Could not find the color " + errorMsg + " in the file.";
throw new IOException(errorMsg);
}
}
  /**
   * Read in a peak from the file. A peak in the ABI file is 96 bytes
   * long. The first 4 bytes are used to store the scan number as a 32-bit
   * integer. This scan number is different from the one displayed by the
   * ABI programs. It is 1000 less, but the number 1000 could vary. 1000 is
   * also the value stored in OFFS. The next two bytes are the height, as
   * a 16-bit integer. I don't know what the next 12 bytes are. After that,
   * the peak area is stored as a 32-bit integer. Skip four bytes again.
   * We then have the size of the peak, in bp. This is an IEEE 754 single
   * precision float.
   *
   * <pre>
   *   Value    Start  Length(bytes)  Type
   *   scan       0         4         integer (1000 + this value)
   *   height     4         2         integer
   *   area      18         4         integer
   *   size      26         4         IEEE 754 single-precision float
   * </pre>
   *
   * @param in the input source
   *
   * @return a peak, with the size/location and height read from the file
   * and the area set as the scan number, not the area.
   *
   * @exception IOException occurs if the file cannot be read.
   */
  public Peak readPeak(RandomAccessFile in) throws IOException
  {
    int scan;
    int height;
    int area;
    double size;

    scan = in.readInt();
    height = in.readUnsignedShort();
    in.skipBytes(12);
    area = in.readInt();  // read to advance past it; the value is not used
    in.skipBytes(4);
    size = in.readFloat();

    return new Peak(size, (double) height, (double) scan);
  }

  /**
   * Read in a Pascal type string, where the first byte is the length of
   * the string, and the rest are the characters. This is accomplished by
   * reading in the bytes (unsigned) and converting them into a character
   * array of the correct length, and then turning the character array
   * into a String.
   *
   * @param location where the string is in the file, relative to the
   * beginning of the file.
   * @param in the file with the information.
   *
   * @return the string as a <code>String</code> object.
   *
   * @exception IOException occurs if <code>location</code> cannot
   * be reached for some reason.
   */
String readPString(long location, RandomAccessFile in) throws IOException
{
// Move to the correct location
in.seek(location);
// Read in the length and set up the array
int length = in.readUnsignedByte();
char gelname[] = new char[length];
// fill the array.
for(int i=0; i < length; i++)
gelname[i] = (char) in.readUnsignedByte();
return new String(gelname);
}
/**
* This converts a long integer into a string. This is used when the
* dataOffset contains the actual data. This will happen if it is
* an extremely short string, at most 3 characters. The format is as
* follows: bits 24-31 contain the length of the string, and the bits
* following contain the characters in sequence. It is perhaps easier to
* think of it as 8 bytes. The four high-order bytes are not used; a long
* is used to store the original 32-bit data so sign wrapping can be
* avoided. Number the lower 4 bytes 3, 2, 1, and 0, with 0 being the
* low-order byte. The length of the string is stored in byte 3, while
* the characters are stored in bytes 2, 1, and 0 as necessary.
*
* @param stringBits a data structure matching that specified above.
*
* @return the string contained in the bits
*/
String readPString(long stringBits)
{
int length = (int) (stringBits >>> 24);
char name[] = new char[length];
// fill the array.
for(int i=0; i < length; i++)
name[i] = (char) ((stringBits >>> ( (2 - i)*8 )) & 0x00000000000000ff);
return new String(name);
}
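// Worked example for readPString(long) (illustrative value, not taken
// from a real file): the two-character string "AB" would be packed as
// 0x02414200. Byte 3 holds the length (0x02), byte 2 holds 'A' (0x41),
// byte 1 holds 'B' (0x42), and byte 0 is unused, so
// readPString(0x02414200L) yields "AB".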
}