AFLPcore
Class SCFFilter

java.lang.Object
  |
  +--AFLPcore.Operation
        |
        +--AFLPcore.ImportFilter
              |
              +--AFLPcore.SCFFilter

public class SCFFilter
extends ImportFilter

This class reads data from an SCF file, a fairly common trace format. Reading in the data is more complicated than it might seem, because Genographer is only really useful with molecular weight standards. SCF files always have four data channels (usually used for A, C, G and T), but Genographer looks at only one channel at a time for data, plus one other channel for the molecular weight standards. Unlike ABI files, SCF files do not contain peak information, so this filter must use "TracePeak Finder" to size the data. Basically, this filter does the following:
1) Extract the necessary file information from the header.
2) Extract data from the data channel.
3) Extract data from the molecular weight standards channel.
4) Detect peaks in the standards channel.
5) Read the sizing method from the standards.cfg file.
6) Combine the peaks with the sizing method to calibrate the sizing function.
7) Parse the comments for Name, Gel Name and Lane Number.
8) Return a Lane object, which contains the trace data and the sizing function.

If you've ever actually run this beast, you've probably noticed that this filter has a number of options that need to be set:
1) Data channel
2) Standards channel
3) Sizing function
4) Standards used
5) Min peak height
6) Min peak width
These can be manipulated using getOptions() and setOptions(Option[]), both defined in this class. This class uses the FeatureList class to retrieve the known functions. Once the options have been set, the readLane method can be called to read the actual file.

The File Format

I got all my info from the "Staden Package" site, so you may want to take a look if it's still up. The address at the time of this writing is http://www.mrc-lmb.cam.ac.uk/pubseq/manual/formats_unix_toc.html

An SCF file has several sections. The first is the header, which should be 128 bytes long.
It looks like this (all integers are stored big-endian):
    unsigned 4-byte integer: magic number - should equal 779314022 (".scf")
    unsigned 4-byte integer: number of samples
    unsigned 4-byte integer: offset from the start of the file to the samples
    unsigned 4-byte integer: number of bases - we don't care
    unsigned 4-byte integer: bases left clip - we don't care
    unsigned 4-byte integer: bases right clip - we don't care
    unsigned 4-byte integer: bases offset - we don't care
    unsigned 4-byte integer: comments size in bytes
    unsigned 4-byte integer: offset from the start of the file to the comments
    4 1-byte characters: version, for example '3' '.' '0' '0' for version 3.00
    unsigned 4-byte integer: sample size - 1 for 8-bit samples, 2 for 16-bit samples
    unsigned 4-byte integer: code_set - we don't care
    unsigned 4-byte integer: private size - we don't care
    unsigned 4-byte integer: private offset - we don't care
    18 unsigned 4-byte integers: spare, unused
Now, onto the data area, which starts at "sampleOffset". In SCF, all four data channels are present, even if some of them are empty. The structure of an entire version 3.00 file looks like this:
    Header (128 bytes)
    Data A (numberOfSamples * sampleSize)
    Data C (numberOfSamples * sampleSize)
    Data G (numberOfSamples * sampleSize)
    Data T (numberOfSamples * sampleSize)
    Offsets for bases (numberOfBases * 4) (integers)
    Accuracy estimate for A bases (numberOfBases) (unsigned bytes)
    Accuracy estimate for C bases (numberOfBases) (unsigned bytes)
    Accuracy estimate for G bases (numberOfBases) (unsigned bytes)
    Accuracy estimate for T bases (numberOfBases) (unsigned bytes)
    Called bases (numberOfBases) (characters)
    Reserved (numberOfBases * 3)
    Comments (commentsSize) (characters)
    Private data (privateSize)
So, the data channel you want starts at byte position:
    sampleOffset + sampleSize * numberOfSamples * channelIndex
where channelIndex is 0 for A, 1 for C, 2 for G and 3 for T.
However, versions earlier than 3.00 use a slightly different file structure:
    Header (128 bytes)
    Sample structures (4 * sampleSize * numberOfSamples)
    Base structures (12 * numberOfBases)
    Comments (commentsSize)
    Private data (privateSize)
A sample structure looks like this:
    Data A (sampleSize)
    Data C (sampleSize)
    Data G (sampleSize)
    Data T (sampleSize)
That is, in an SCF 1.x or 2.x file the samples alternate between the four channels (A, C, G and T for the first sample point, then A, C, G and T for the second, and so on), whereas in SCF 3.00 all the samples for A are stored, then C, then G, then T.

The data in version 3.00 files is also run through a pseudo-compression algorithm. I say "pseudo" because the algorithm does not make the data any smaller; it just makes it easier to compress with other programs such as gzip or zip. Each stored value is a second-order difference: the difference between successive differences of the samples. The readLane method contains the code that "uncompresses" this, which is quite simple. An example of the decoding (inputFile here is a DataInput, such as a RandomAccessFile, positioned at the first sample of the channel):

    int mask = (sampleSize == 1) ? 0xFF : 0xFFFF; // samples are unsigned 8- or 16-bit values
    int[] trace = new int[numberOfSamples];
    int previous = 0;
    // First running sum: turns the stored second differences back into first differences.
    for (int i = 0; i < numberOfSamples; i++) {
        int stored = (sampleSize == 1) ? inputFile.readByte() : inputFile.readShort();
        trace[i] = (previous + stored) & mask;
        previous = trace[i];
    }
    // Second running sum: turns the first differences back into sample values.
    previous = 0;
    for (int i = 0; i < numberOfSamples; i++) {
        trace[i] = (trace[i] + previous) & mask;
        previous = trace[i];
    }

Masking each sum to the sample width reproduces the wrap-around arithmetic of the unsigned samples; without it, the decoded values can come out negative. Note that this example differs from the code actually used in readLane.

Comments are stored as "KEY=VALUE" entries separated by newline ('\n') characters. For example:
    LANE=1\nGELN=abc+ct+t102\nALeF=NULL
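As a rough illustration of the header layout above, the following Java sketch reads the fields this filter needs from a 128-byte header and computes where a channel starts in a version 3.00 file. The names ScfHeaderSketch, parse, channelOffsetV3 and sampleHeader are hypothetical and not part of AFLPcore; the sketch assumes the header bytes have already been read into an array, and it relies on DataInputStream reading big-endian values, which matches SCF's byte order.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: reads the 128-byte SCF header. All names here are hypothetical.
class ScfHeaderSketch {
    static final int SCF_MAGIC = 779314022; // the bytes ".scf" read as a big-endian int

    static final class Header {
        long numberOfSamples, sampleOffset, commentsSize, commentsOffset, sampleSize;
        String version;
    }

    static Header parse(byte[] headerBytes) throws IOException {
        // DataInputStream reads big-endian values, which is SCF's byte order.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(headerBytes));
        if (in.readInt() != SCF_MAGIC) {
            throw new IOException("bad magic number: not an SCF file");
        }
        Header h = new Header();
        h.numberOfSamples = Integer.toUnsignedLong(in.readInt());
        h.sampleOffset = Integer.toUnsignedLong(in.readInt());
        in.readInt(); // number of bases (ignored)
        in.readInt(); // bases left clip (ignored)
        in.readInt(); // bases right clip (ignored)
        in.readInt(); // bases offset (ignored)
        h.commentsSize = Integer.toUnsignedLong(in.readInt());
        h.commentsOffset = Integer.toUnsignedLong(in.readInt());
        byte[] version = new byte[4];
        in.readFully(version);
        h.version = new String(version, StandardCharsets.US_ASCII);
        h.sampleSize = Integer.toUnsignedLong(in.readInt());
        // code_set, private size, private offset and the 18 spare ints are not needed.
        return h;
    }

    // Byte position of channel 0..3 (A, C, G, T) in a version 3.00 file.
    static long channelOffsetV3(Header h, int channelIndex) {
        return h.sampleOffset + h.sampleSize * h.numberOfSamples * channelIndex;
    }

    // Builds a synthetic header for demonstration; the field values are made up.
    static byte[] sampleHeader(int samples, int sampleOffset, int sampleSize) {
        ByteBuffer buf = ByteBuffer.allocate(128); // big-endian by default
        buf.putInt(SCF_MAGIC).putInt(samples).putInt(sampleOffset);
        buf.putInt(0).putInt(0).putInt(0).putInt(0); // bases, left clip, right clip, bases offset
        buf.putInt(0).putInt(0);                     // comments size and offset
        buf.put("3.00".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(sampleSize);
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Header h = parse(sampleHeader(1000, 128, 2));
        System.out.println(h.version + " " + channelOffsetV3(h, 2)); // G channel
    }
}
```

Running main prints "3.00 4128": the G channel (index 2) of a 1000-sample, 16-bit file whose samples start at byte 128 begins at 128 + 2 * 1000 * 2.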

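The difference between the two sample layouts comes down to how a sample's position is computed. This hypothetical helper (readChannel is not an AFLPcore method) returns one channel's stored values, assuming the whole sample area starting at sampleOffset has already been read into memory. For version 3.00 files the returned values are still the delta-encoded numbers and would need the two-pass decoding described above.

```java
import java.util.Arrays;

// Sketch only: extracts one channel from an in-memory copy of the sample area.
class ScfChannelSketch {
    /**
     * samples: bytes of the sample area, starting at sampleOffset
     * n:       number of samples per channel
     * size:    sample size in bytes (1 or 2)
     * channel: 0..3 for A, C, G, T
     * v3:      true for version 3.00 (channels stored one after another),
     *          false for 1.x/2.x (channels interleaved per sample point)
     */
    static int[] readChannel(byte[] samples, int n, int size, int channel, boolean v3) {
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int index = v3 ? (channel * n + i) : (i * 4 + channel);
            int pos = index * size;
            // Samples are unsigned and big-endian.
            out[i] = (size == 1)
                    ? (samples[pos] & 0xFF)
                    : (((samples[pos] & 0xFF) << 8) | (samples[pos + 1] & 0xFF));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] area = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(Arrays.toString(readChannel(area, 2, 1, 2, false))); // [3, 7]
        System.out.println(Arrays.toString(readChannel(area, 2, 1, 2, true)));  // [5, 6]
    }
}
```

With the same eight bytes, the G channel is bytes 3 and 7 under the interleaved layout but bytes 5 and 6 under the version 3.00 layout.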

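A minimal sketch of splitting the comments area into key/value pairs, assuming the comment bytes have already been decoded into a String. The class name is hypothetical, and treating the first NUL character as the end of the text is an assumption about padding, not something stated above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: parses newline-separated "KEY=VALUE" comment entries.
class ScfCommentsSketch {
    static Map<String, String> parse(String comments) {
        int nul = comments.indexOf('\0'); // assumed: stop at any NUL padding
        if (nul >= 0) {
            comments = comments.substring(0, nul);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : comments.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) { // skip lines with no key
                fields.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // prints {LANE=1, GELN=abc+ct+t102, ALeF=NULL}
        System.out.println(parse("LANE=1\nGELN=abc+ct+t102\nALeF=NULL"));
    }
}
```

The lane number and gel name can then be looked up under LANE and GELN, as in the example above; the key this filter uses for the sample name is not given here.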
Field Summary
static int ALL
           
static int BLUE
          color channel
static int GREEN
          color channel
static int RED
          color channel
static int YELLOW
          color channel. Note: these colors probably correspond to the channels of an SCF file only in an arbitrary way.
 
Fields inherited from class AFLPcore.ImportFilter
filetype, GEL, LANE
 
Fields inherited from class AFLPcore.Operation
descript, helpFile, name, options
 
Constructor Summary
SCFFilter()
          Constructor for a new SCFFilter.
 
Method Summary
 java.lang.String getDescription()
          Retrieves a short, approximately one sentence, description of the filter.
 int getFileType()
          Returns the type of input file supported by this filter. In this case it is ImportFilter.LANE, since the filter reads in lane data.
 java.lang.String getHelpFile()
          The help file describes which files the filter reads and the options that this filter accepts.
 java.lang.String getName()
          Access the name of the filter.
 Option[] getOptions()
          Returns the options for this filter.
 Gel readGel(java.io.File inputFile)
          This filter does not read gels.
 Lane[] readLane(java.io.File inputFile)
          This is the method that is called to perform the actual reading of the file.
 void setOptions(Option[] opts)
          Sets the parameters for the filter to the specified values.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

YELLOW

public static final int YELLOW
color channel. Note: these colors probably correspond to the channels of an SCF file only in an arbitrary way, though this is not certain.

RED

public static final int RED
color channel

BLUE

public static final int BLUE
color channel

GREEN

public static final int GREEN
color channel

ALL

public static final int ALL
Constructor Detail

SCFFilter

public SCFFilter()
Constructor for a new SCFFilter.
Method Detail

getName

public java.lang.String getName()
Access the name of the filter.
Overrides:
getName in class Operation
Returns:
name of the import filter

getFileType

public int getFileType()
Returns the type of input file supported by this filter. In this case it is ImportFilter.LANE, since the filter reads in lane data.
Overrides:
getFileType in class ImportFilter
Returns:
constant LANE.

getDescription

public java.lang.String getDescription()
Retrieves a short, approximately one sentence, description of the filter.
Overrides:
getDescription in class Operation
Returns:
the description

getHelpFile

public java.lang.String getHelpFile()
The help file describes which files the filter reads and the options that this filter accepts.
Overrides:
getHelpFile in class Operation
Returns:
File that contains the help information, either HTML or plain text.

getOptions

public Option[] getOptions()
Returns the options for this filter:
1) Data Channel - selects which of the four channels in the SCF file supplies the sample data.
2) Standards Channel - selects which of the four channels in the SCF file supplies the sample data for the molecular weight standards.
3) SizingFunction - selects the algorithm used to determine the size of each data point from the locations of the molecular weight standards.
4) Standards - selects which set of standards is being used. This is very important because the filter gets the weights of the molecular weight standards from this set. Standard sets should be entered manually in the standards.cfg file.
5) Label - just displays "Peak detection parameters". Peaks are detected in the standards trace data and then correlated with the chosen standards set to create a set of benchmarks against which the data is compared.
6) MinPeakHeight - the minimum height a point must have to be considered a peak.
7) MinPeakWidth - the minimum width a peak must have.
Overrides:
getOptions in class Operation
Returns:
an array containing the options described above.
See Also:
Option, FeatureList, SizeFunction, SizeStandard
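The getOptions description says peaks are detected in the standards trace using MinPeakHeight and MinPeakWidth, but not how. The sketch below is a simple stand-in, not the "TracePeak Finder" this filter actually uses: it reports the highest point of every run of samples that stays at or above the minimum height for at least the minimum width. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: threshold-and-width peak detection, a stand-in for the real peak finder.
class PeakSketch {
    static List<Integer> findPeaks(int[] trace, int minHeight, int minWidth) {
        List<Integer> peaks = new ArrayList<>();
        int start = -1; // start of the current run at or above minHeight, or -1 if none
        for (int i = 0; i <= trace.length; i++) {
            boolean above = i < trace.length && trace[i] >= minHeight;
            if (above && start < 0) {
                start = i;                      // a run begins
            } else if (!above && start >= 0) {
                if (i - start >= minWidth) {    // run [start, i) is wide enough
                    int best = start;
                    for (int j = start + 1; j < i; j++) {
                        if (trace[j] > trace[best]) best = j;
                    }
                    peaks.add(best);            // report the run's highest point
                }
                start = -1;
            }
        }
        return peaks;
    }

    public static void main(String[] args) {
        int[] trace = {0, 5, 9, 5, 0, 0, 6, 0, 7, 8, 9, 10, 3};
        System.out.println(findPeaks(trace, 5, 2)); // [2, 11]
    }
}
```

In the example, the single sample of height 6 at index 6 clears MinPeakHeight but is rejected because its run is narrower than MinPeakWidth.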

setOptions

public void setOptions(Option[] opts)
Sets the parameters for the filter to the specified values. That is, after the user clicks "OK" in the OptionsDialog window, this method is called to convert all of the user's selections into variables that the SCFFilter can easily read. It also saves the selected values to "SCF.def" for use as default values, should the user select that option.
Overrides:
setOptions in class Operation
Following copied from class: AFLPcore.Operation
Parameters:
opts - the values for the options that this operation understands.

readLane

public Lane[] readLane(java.io.File inputFile)
                throws java.io.IOException
This is the method that is called to perform the actual reading of the file. The file contains data from a single lane. The options required by the filter must be set using setOptions; if they are not, an exception is thrown.
Overrides:
readLane in class ImportFilter
Parameters:
inputFile - The file that contains the lane data.
Returns:
an array containing the Lane object read from the file, with all of the appropriate information.
Throws:
MissingParameterError - thrown if the options have not been set. The options include the required color channel, so without them the filter cannot read the lane.
java.io.IOException - If an error is encountered in the file, then this exception will be thrown
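Step 6 of the class overview calibrates a sizing function from the detected standard peaks and the known standard sizes. The functions actually available depend on the SizingFunction option; as one assumed example, piecewise-linear interpolation between neighboring standards can be sketched as follows. SizingSketch is hypothetical and is not the AFLPcore SizeFunction API, and it assumes the standard peaks have already been matched to their sizes.

```java
import java.util.Arrays;

// Sketch only: piecewise-linear sizing from matched standard peaks (an assumed method).
class SizingSketch {
    private final double[] positions; // sample index of each standard peak, ascending
    private final double[] sizes;     // known size of each standard, ascending

    SizingSketch(double[] positions, double[] sizes) {
        if (positions.length != sizes.length || positions.length < 2) {
            throw new IllegalArgumentException("need at least two matched standards");
        }
        this.positions = positions.clone();
        this.sizes = sizes.clone();
    }

    // Interpolates between the bracketing standards; outside them, extends the end segments.
    double sizeAt(double position) {
        int k = Arrays.binarySearch(positions, position);
        if (k >= 0) return sizes[k];                 // exactly on a standard
        int insert = -k - 1;                         // first index with positions[i] > position
        int hi = Math.min(Math.max(insert, 1), positions.length - 1);
        int lo = hi - 1;
        double t = (position - positions[lo]) / (positions[hi] - positions[lo]);
        return sizes[lo] + t * (sizes[hi] - sizes[lo]);
    }

    public static void main(String[] args) {
        SizingSketch s = new SizingSketch(new double[] {100, 200, 400}, new double[] {50, 100, 200});
        System.out.println(s.sizeAt(150)); // 75.0
    }
}
```

With standards at sample positions 100, 200 and 400 sized 50, 100 and 200, sizeAt(150) gives 75 and sizeAt(300) gives 150; points beyond the outermost standards are extrapolated along the end segments.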

readGel

public Gel readGel(java.io.File inputFile)
            throws java.io.IOException
This filter does not read gels.
Overrides:
readGel in class ImportFilter
Returns:
Always null