java.lang.Object
  |
  +--AFLPcore.Operation
        |
        +--AFLPcore.ImportFilter
              |
              +--AFLPcore.SCFFilter
This class reads data from an SCF file, a fairly common trace format. Reading in the data is actually fairly complicated, because Genographer is only really useful with molecular weight standards. SCF files always have four channels of data (these are usually used for A, C, G and T). However, Genographer only looks at one channel at a time for data, plus one other channel for the molecular weight standards. Unlike ABI files, SCF files do not contain peak information, so this filter must use "TracePeak Finder" in order to size the data.

Basically, this filter does the following:
1) Extract the necessary file information from the header.
2) Extract the data from the data channel.
3) Extract the data from the molecular weight standards channel.
4) Detect peaks in the standards channel.
5) Read in the sizing method from the standards.cfg file.
6) Combine the information from the peaks with the sizing method to calibrate the sizing function.
7) Parse through the comments for the Name, Gel Name, and Lane Number.
8) Return a Lane object, which contains the trace data and the sizing function.

You've probably noticed, if you've ever actually run this beast, that this filter has a number of options that need to be set:
1) Data channel
2) Standards channel
3) Sizing function
4) Standards used
5) Min peak height

These can be manipulated using getOptions() and setOptions(Option[]), which are defined in this class. This class uses the FeatureList class to retrieve the known functions. Once the options have been set, the readLane method can be called to read the actual file.

The File Format

I got all my info from the "Staden Package" site, so you may want to take a look if it's still up. The address at the time of this writing is http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_toc.html

There are several sections in an SCF file. The first is the header, which should be 128 bytes long.
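Step 1 above, pulling the needed values out of the 128-byte header, might look roughly like the following minimal sketch. It assumes the whole header has already been read into a byte array and uses java.nio.ByteBuffer, which reads big-endian values by default, as SCF stores them. The class name ScfHeader is hypothetical and this is not Genographer's actual code; the byte offsets follow the header field list given next.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Minimal sketch of reading the 128-byte SCF header (step 1); the class name is hypothetical. */
final class ScfHeader {
    static final int HEADER_SIZE = 128;
    static final int MAGIC = 779314022; // the four ASCII bytes ".scf" read as a big-endian int

    // The format stores these as unsigned 4-byte integers; Java's int is signed,
    // so this sketch assumes every value is below 2^31.
    final int numberOfSamples;
    final int sampleOffset;   // start of the sample (trace) data
    final int commentsSize;
    final int commentsOffset;
    final String version;     // e.g. "3.00"
    final int sampleSize;     // 1 = 8-bit samples, 2 = 16-bit samples

    ScfHeader(byte[] raw) {
        if (raw.length < HEADER_SIZE) {
            throw new IllegalArgumentException("SCF header must be 128 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw); // ByteBuffer is big-endian by default
        if (buf.getInt(0) != MAGIC) {
            throw new IllegalArgumentException("bad magic number: not an SCF file");
        }
        numberOfSamples = buf.getInt(4);
        sampleOffset = buf.getInt(8);
        // offsets 12-24: number of bases, left/right clip, bases offset (not needed here)
        commentsSize = buf.getInt(28);
        commentsOffset = buf.getInt(32);
        version = new String(raw, 36, 4, StandardCharsets.US_ASCII);
        sampleSize = buf.getInt(40);
        // offsets 44-127: code_set, private size/offset and spare space (ignored)
    }
}
```

Rejecting a file on a bad magic number up front keeps later offset arithmetic from running on garbage.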
The header looks a little something like this (multi-byte values are big-endian):

Offset  Size              Field
0       unsigned 4 bytes  magic number - should equal 779314022 (".scf")
4       unsigned 4 bytes  number of samples (numberOfSamples)
8       unsigned 4 bytes  offset from the start of the file to the samples (sampleOffset)
12      unsigned 4 bytes  number of bases - we don't care
16      unsigned 4 bytes  bases left clip - we don't care
20      unsigned 4 bytes  bases right clip - we don't care
24      unsigned 4 bytes  bases offset - we don't care
28      unsigned 4 bytes  comments size in bytes (commentsSize)
32      unsigned 4 bytes  offset from the start of the file to the comments
36      4 characters      version - should be '3' '.' '0' '0'
40      unsigned 4 bytes  sample size (sampleSize) - 1 = 8-bit samples, 2 = 16-bit samples
44      unsigned 4 bytes  code_set - we don't care
48      unsigned 4 bytes  private size - we don't care
52      unsigned 4 bytes  private offset - we don't care
56      18 x 4 bytes      spare - absolutely nothing

Now, onto the data area, which starts at sampleOffset. In SCF, all four data channels will be present, even if some of them are empty. The structure of an entire version 3.00 file looks like this:

Header                          128 bytes
Data A                          numberOfSamples * sampleSize
Data C                          numberOfSamples * sampleSize
Data G                          numberOfSamples * sampleSize
Data T                          numberOfSamples * sampleSize
Offsets for bases               numberOfBases * 4 (4-byte integers)
Accuracy estimate for A bases   numberOfBases (unsigned bytes)
Accuracy estimate for C bases   numberOfBases (unsigned bytes)
Accuracy estimate for G bases   numberOfBases (unsigned bytes)
Accuracy estimate for T bases   numberOfBases (unsigned bytes)
Called bases                    numberOfBases (characters)
Reserved                        numberOfBases * 3
Comments                        commentsSize (characters)
Private data                    privateSize

So, the data channel you want should be located at byte position:

    sampleOffset + sampleSize * numberOfSamples * channelIndex

where channelIndex is 0 for A, 1 for C, 2 for G and 3 for T.

However, with versions earlier than 3.00, the file structure is a little different.
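Before moving on to the older layout, here is a hedged sketch of steps 2 and 3 for a version 3.00 file. It applies the channel offset formula above to a file already loaded into a byte array and returns the stored values, which are still delta-encoded (see the pseudo-compression discussion below). ScfV3Channel and readRaw are hypothetical names, and reading the values as unsigned is a choice made here, not necessarily what readLane does.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for steps 2 and 3 on a version 3.00 file held in memory. */
final class ScfV3Channel {
    /**
     * Returns the stored (still delta-encoded) values of one channel,
     * where channel is 0 = A, 1 = C, 2 = G, 3 = T. Values are read unsigned.
     */
    static int[] readRaw(byte[] file, int sampleOffset, int numberOfSamples,
                         int sampleSize, int channel) {
        if (channel < 0 || channel > 3) {
            throw new IllegalArgumentException("channel must be 0..3");
        }
        ByteBuffer buf = ByteBuffer.wrap(file); // big-endian, as in SCF
        // v3 stores whole channels one after another: A, then C, then G, then T
        int start = sampleOffset + sampleSize * numberOfSamples * channel;
        int[] raw = new int[numberOfSamples];
        for (int i = 0; i < numberOfSamples; i++) {
            int pos = start + i * sampleSize;
            raw[i] = (sampleSize == 1) ? (buf.get(pos) & 0xFF) : (buf.getShort(pos) & 0xFFFF);
        }
        return raw;
    }
}
```

The data channel and the standards channel would each be read this way, with different channel indexes.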
In those older files the layout is:

Header              128 bytes
Sample structures   4 * sampleSize * numberOfSamples
Base structures     12 * numberOfBases
Comments            commentsSize
Private data        privateSize

A sample structure looks like this:

Data A   sampleSize
Data C   sampleSize
Data G   sampleSize
Data T   sampleSize

That is, the difference in the data channels between SCF v1/2 and SCF v3 is that the data in an SCF 1 or 2 file is interleaved, alternating between the four channels for each sample, whereas in SCF v3 all the samples for A are stored, then C, then G, then T.

The data in 3.00 files is also run through a pseudo-compression algorithm. I say "pseudo" because the algorithm does not make the data any smaller; it just makes it easier to compress with other programs like gzip or zip. Each stored value is only the difference of the differences between successive samples (a second-order delta). If you look down in the readLane method, you can see the code that "uncompresses" this, which is quite simple:

    int[] trace = new int[numberOfSamples];
    int delta = 0, temp = 0;
    // first pass: a running sum of the stored values gives the first differences
    for (int i = 0; i < numberOfSamples; i++) {
        if (sampleSize == 1)
            temp = (int) inputFile.readByte();
        else
            temp = (int) inputFile.readShort();
        trace[i] = delta + temp;
        delta = trace[i];
    }
    // second pass: a running sum of the first differences gives the samples
    delta = 0;
    for (int i = 0; i < numberOfSamples; i++) {
        trace[i] += delta;
        delta = trace[i];
    }

If you get a negative value, it probably means that the channel you're looking at is empty. Take note that I haven't checked the example code above, and that it differs from the code that I actually used. One of them should work.

Comments are stored in the format "VALU=STRING" and are separated by '\n' characters. For example, Comments = "LANE=1\nGELN=abc+ct+t102\nALeF=NULL" and so on.
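The remaining format details can be sketched as three small helpers under hypothetical names (ScfDecode, readV2Channel, undelta, parseComments), not Genographer's code: reading one channel from the interleaved v1/2 sample structures, undoing the v3 second-order delta with the same two running sums as the loop above, and splitting the comments into key/value pairs for step 7. The sketch assumes, as the description implies, that only 3.00 data is delta-encoded, so readV2Channel does no decoding. Unlike the readByte()/readShort() loop, undelta masks each running sum to 8 or 16 bits, so the arithmetic stays modulo 2^8 or 2^16 and samples come back unsigned whether the raw deltas were read signed or unsigned.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers for the v1/2 layout, v3 "uncompression", and comments (step 7). */
final class ScfDecode {
    /** Reads channel 0..3 (A, C, G, T) from the interleaved v1/2 sample structures. */
    static int[] readV2Channel(byte[] file, int sampleOffset, int numberOfSamples,
                               int sampleSize, int channel) {
        if (channel < 0 || channel > 3) {
            throw new IllegalArgumentException("channel must be 0..3");
        }
        ByteBuffer buf = ByteBuffer.wrap(file); // big-endian
        int[] trace = new int[numberOfSamples];
        for (int i = 0; i < numberOfSamples; i++) {
            // each sample structure holds A, C, G, T, one sampleSize value each
            int pos = sampleOffset + (4 * i + channel) * sampleSize;
            trace[i] = (sampleSize == 1) ? (buf.get(pos) & 0xFF) : (buf.getShort(pos) & 0xFFFF);
        }
        return trace;
    }

    /** Undoes the v3 second-order delta: two running sums, kept modulo 2^8 or 2^16. */
    static int[] undelta(int[] raw, int sampleSize) {
        int mask = (sampleSize == 1) ? 0xFF : 0xFFFF;
        int[] trace = new int[raw.length];
        int firstSum = 0;   // running sum of raw values = first differences
        int secondSum = 0;  // running sum of first differences = samples
        for (int i = 0; i < raw.length; i++) {
            firstSum = (firstSum + raw[i]) & mask;
            secondSum = (secondSum + firstSum) & mask;
            trace[i] = secondSum;
        }
        return trace;
    }

    /** Splits KEY=VALUE comments separated by newlines into an ordered map. */
    static Map<String, String> parseComments(String comments) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : comments.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                // trim() also strips a trailing NUL terminator, if present
                fields.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return fields;
    }
}
```

For a 3.00 file, the raw values of a channel would be passed through undelta before peak detection; the map from parseComments then supplies the LANE and GELN values.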
Field Summary
static int ALL
static int BLUE - color channel
static int GREEN - color channel
static int RED - color channel
static int YELLOW - color channel. Note: these probably only correspond to the channels of an SCF file in an arbitrary way.
Fields inherited from class AFLPcore.ImportFilter
filetype, GEL, LANE
Fields inherited from class AFLPcore.Operation
descript, helpFile, name, options
Constructor Summary
SCFFilter() - Constructor for a new SCFFilter.
Method Summary
java.lang.String getDescription() - Retrieves a short, approximately one-sentence description of the filter.
int getFileType() - Returns the type of input file supported by this filter. In this case ImportFilter.LANE, since the filter reads in lane data.
java.lang.String getHelpFile() - The help file describes which files the filter reads and the options that this filter accepts.
java.lang.String getName() - Access the name of the filter.
Option[] getOptions() - Returns the options for this filter.
Gel readGel(java.io.File inputFile) - This filter does not read gels.
Lane[] readLane(java.io.File inputFile) - This is the method that is called to perform the actual reading of the file.
void setOptions(Option[] opts) - Sets the parameters for the filter to the specified values.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final int YELLOW
public static final int RED
public static final int BLUE
public static final int GREEN
public static final int ALL
Constructor Detail
public SCFFilter()
Method Detail
public java.lang.String getName()
getName in class Operation
public int getFileType()
Returns ImportFilter.LANE, since the filter reads in lane data.
getFileType in class ImportFilter
public java.lang.String getDescription()
getDescription in class Operation
public java.lang.String getHelpFile()
getHelpFile in class Operation
public Option[] getOptions()
getOptions in class Operation
See also: Option, FeatureList, SizeFunction, SizeStandard
public void setOptions(Option[] opts)
setOptions in class Operation
Following copied from class: AFLPcore.Operation
Parameters: opts - the values for the options that this operation understands.
public Lane[] readLane(java.io.File inputFile) throws java.io.IOException
The options should already have been set using setOptions, and if they are not, an exception will be thrown.
readLane in class ImportFilter
Parameters: inputFile - the file that contains the lane data.
Throws: MissingParameterError - occurs if the options are not set. Since this includes the required color, the filter cannot read in the lane.
Throws: java.io.IOException - if an error is encountered in the file, then this exception will be thrown.
public Gel readGel(java.io.File inputFile) throws java.io.IOException
readGel in class ImportFilter
Returns: null