//===================================================================== // File: LocalSouthern.java // Class: LocalSouthern // Package: AFLPcore // // Author: James J. Benham // Date: November 26, 2000 // Contact: james_benham@hmc.edu // // Genographer v1.4 - Computer assisted scoring of gels. // Copyright (C) 1999 Montana State University // // This program is free software; you can redistribute it and/or // modify it under the terms of the GNU General Public License // as published by the Free Software Foundation; version 2 // of the License. // // This program is distributed in the hope that it will be useful, // but WITHOUT ANY WARRANTY; without even the implied warranty of // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the // GNU General Public License for more details. // // You should have received a copy of the GNU General Public License // along with this program; if not, write to the Free Software // Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. // // The GNU General Public License is distributed in the file GPL //===================================================================== package AFLPcore; import java.io.DataInputStream; import java.io.DataOutputStream; import java.io.IOException; import AFLPcore.Option; /** * This class implements the Local Southern sizing method described by * E.M. Southern (Southern, E. M. "Measurement of DNA Length by Gel * Electrophoresis." (1979) Analytical Biochemistry 100, * 319-323.) * *

The equation *

 *         c
 *  L = --------- + Lo
 *       m - Mo
 *

* is used, where L is the size of the unknown fragement, m is * the mobility (scan number) of the fragment, and c, Mo, and * Lo are constants. The constants are determined in the following * manner. Fragments of known size (from the size standard) are used as known * values for L and m. For every set of three standard points, * a curve is defined. Let the point (xi, yi) denote a standard * point, where xi is the mobility of the known fragement and * yi is the size of the known fragement. These three points give us * the following system of equations: *

 *          c
 *  y1 = ------- + Lo         (1)
 *       x1 - Mo
 *

 *          c
 *  y2 = ------- + Lo         (2)
 *       x2 - Mo
 *

 *          c
 *  y3 = ------- + Lo         (3)
 *       x3 - Mo
 *

* Solving this system of equations yields: *

 *       -(-x3 + x2) (-x3 + x1) (x1 - x2) (-y3 + y2) (y1 - y3) (-y2 + y1)
 *  c =  ----------------------------------------------------------------
 *                                                            2
 *            (-y1 x3 + y2 x3 - y2 x1 + x2 y1 - x2 y3 + x1 y3)
 *

 *       - (-y1 x1 y3 + y1 y3 x3 - y2 x2 y1 + y2 x2 y3 - y2 y3 x3 + y2 y1 x1)
 * Lo =  --------------------------------------------------------------------
 *            (-y1 x3 + y2 x3 - y2 x1 + x2 y1 - x2 y3 + x1 y3)
 *

 *         (-y2 x2 x1 + y2 x2 x3 - y1 x1 x3 + x1 y3 x3 - x2 y3 x3 + x2 y1 x1)
 * Mo =  --------------------------------------------------------------------
 *            (-y1 x3 + y2 x3 - y2 x1 + x2 y1 - x2 y3 + x1 y3)
 *

* However, a problem is encountered when all three of the standard points * used are on a line. In this case, the denominator (which is the same * for all three constants) becomes 0. When that happens, this class * abandons the equation suggested by E.M. Southern and instead uses a * line for the curve between those two points. * *

To size an unknown fragment, the following method is used. For most * fragements, the average of two curves will be used to find the size. * The first curve will be defined by two standard points below the * unknown fragement and one above. The second curve will be defined by * one standard point below and two above. (Above and below are determined * by scan number.) Remember that one (or both) of these curves can be * linear. In the case that the unknown fragement is near the beginning or * the end of the size standard ranges, only one curve is used. For example, * if the curve with only one size standard point below includes the first * size standard point, the curve with two points below does not exists. * For points smaller than the first size standard point, a line is used. * The line is defined by the first size standard point and by assuming * the size at scan number 0 is 0. For points greater than the largest * size standard point, the curve for the last three points is simply * extended. Note: This is not very accurate at all for points * outside of the defined size standards! For those points inside, * the sizing agrees with that of the ABI software and appears to be * reliable. Therefore, it is recommended that only the points * between defined size standards be used. * *

This class also allows one to convert from base pairs to scan numbers. * This answer is not unique (see below). However, in almost every * case, one of the answers is clearly out of bounds. The bounds are defined * such that if a fragements size is between two standard points, it's scan * number must also be between the two size standards. If the correct value * cannot be determined, the class will print out an error message to the * standard output and then take the smaller value. * *

The conversion back is broken down into three cases, a size * derived from two curves, a line and a curve, and two lines. * For two curves, the size s is defined as: *

 *       c1/(m - m1) + L1 + c2/(m - m2) + L2
 *  s = -------------------------------------
 *                       2
 *

* solving for m yeilds: *

 *                -c1m2 - c2m1             2
 *  m = f + sqrt(--------------  - m1m2 + f )
 *                2s - L1 - L2
 *

* or *

 *                -c1m2 - c2m1             2
 *  m = f - sqrt(--------------  - m1m2 + f )
 *                2s - L1 - L2
 *

* where *

 *       1            c1 + c2
 *  f =  - ( m1m2 + -----------)
 *       2          2s - L1 -L2
 *

* For a curve and a line, the size is determined by: *

 *        c/(m -mo) + Lo + am + b
 *  s =  --------------------------
 *                   2
 *

* solving yields: *

 *       g + sqrt(g^2 - 4a(mo*d + c))          g - sqrt(g^2 - 4a(mo*d + c))
 *  m = -----------------------------, or m = ------------------------------
 *                 2a                                    2a
 *

* where *

 *  d = 2s - Lo - b          g = amo + d
 *

* Finally, for two lines, the solution is trivial and unique. This is * because if points a, b, and c are on a line and points b, c, and d are * on a line, then a and d must be on the same line. * *

Overall, this class performs the conversions between base pairs * and scan numbers required by the program. * * @see Lane * @see SizeFunction * @see SizeMgr * @see FeatureList * * @author James J. Benham * @version 1.0.0 * @date August 10, 1998 */ public class LocalSouthern extends SizeFunction { private static double DEFAULT_MAX_SCAN = 5000; private static final int SIZE = 0; private static final int SCAN = 1; private static final int C = 2; private static final int Mo = 3; private static final int Lo = 4; private static final int LINEAR = 5; private double lsValues[][]; private int numPoints; // the number of points used for calculations private double smallM; private double smallB; private double maxScan; public LocalSouthern() { name = "Local Southern"; descript = "Size lane based on the method described by " + "E.M. Southern [1979]"; helpFile = "lsouthern.html"; numPoints = 0; maxScan = DEFAULT_MAX_SCAN; } /** * Gives the name describing this sizing function. * * @return the name */ public String getName() { return name; } /** * Gives a brief description of this sizing function. * * @return a one sentence explination. */ public String getDescription() { return descript; } /** * Gives the help file that describes the function in more detail * * @return a plaintext of html help file. */ public String getHelpFile() { return helpFile; } /** * There are no options for this sizing function. * * @return null */ public Option[] getOptions() { return null; } /** * Since there are no options, this method doesn't do much. In fact, * it does absolutely nothing, but it is required by the abstract class * Operation, which this class is a child of. */ public void setOptions(Option[] opts) { } /** * This takes a list of peaks with the location containing the size * in base pairs, and the AREA must contain the scan number of that * peak. This method must be called since it uses the data points to * calculate the constants for the different curves. See the class * description for an explination of the sizing function. * Some parts of this method are not very efficient, but since it will * not be run very often, it is not a big problem. * * @param standardPeaks a list of peaks with the area containing the * scan number and the location having the size in bp. */ public void init(DataList standardPeaks) { // remember how many points we have numPoints = standardPeaks.size(); // create the array. lsValues = new double[numPoints][6]; double x1, x2, x3; double y1, y2, y3; double denom; // find the value for the constants C, Mo, and Lo for(int i = 0; i < (numPoints - 2); i++) { // get the values x1 = ((Peak) standardPeaks.dataAt(i)).getArea(); y1 = ((Peak) standardPeaks.dataAt(i)).getLocation(); x2 = ((Peak) standardPeaks.dataAt(i + 1)).getArea(); y2 = ((Peak) standardPeaks.dataAt(i + 1)).getLocation(); x3 = ((Peak) standardPeaks.dataAt(i + 2)).getArea(); y3 = ((Peak) standardPeaks.dataAt(i + 2)).getLocation(); // Store the size and scan. Do it to i and i + 2, so we compeletely // fill the array (just i would leave the bottom unfilled and just // i+2 would leave the top empty) This will cause some values to // be overwritten, but that isn't a big deal, since they will be // overwritten with the same value. This method isn't time critical. lsValues[i][SIZE] = y1; lsValues[i][SCAN] = x1; lsValues[i + 2][SIZE] = y3; lsValues[i + 2][SCAN] = x3; // The following three values return the solution for each of the // constants using the three points. The solutions where obtained // by solving the three simultaneous equations: // y1 = C/(x1 - Mo) + Lo // y2 = C/(x2 - Mo) + Lo // y3 = C/(x3 - Mo) + Lo // Calculate the denominator here since it is used by all of the // mehtods. It will also allow us to catch divide by zero errors, // which means that the size standards are in a straight line. denom = -y1*x3 + y2*x3 - y2*x1 + x2*y1 - x2*y3 + x1*y3; if(denom != 0) { lsValues[i][Lo] = solveLo(x1, y1, x2, y2, x3, y3, denom); lsValues[i][Mo] = solveMo(x1, y1, x2, y2, x3, y3, denom); lsValues[i][C] = solveC(x1, y1, x2, y2, x3, y3, denom); lsValues[i][LINEAR] = 0; } else { // it's linear, so use, Mo as the slope and C as the y-intercept lsValues[i][Mo] = (y2 - y1)/(x2 - x1); lsValues[i][C] = y1 - lsValues[i][Mo]*x1; lsValues[i][LINEAR] = 1; } } // Make a line for the smaller sizes. Peak pk; if(standardPeaks.isEmpty()) pk = new Peak(100, 0, 100); // hack for an empty lane else pk = (Peak) standardPeaks.dataAt(0); double zeroX = 0; double zeroY = 0; smallM = (pk.getLocation() - zeroY)/(pk.getArea() - zeroX); smallB = zeroY - smallM * zeroX; /*========DEBUG= System.out.println("Size Scan Lo Mo C"); for(int i =0; i < numPoints; i++) { System.out.println(lsValues[i][SIZE] + " " + lsValues[i][SCAN] + " " + lsValues[i][Lo] + " " + lsValues[i][Mo] + " " + lsValues[i][C] + " " + lsValues[i][LINEAR]); } =DEBUG=======*/ } /** * Sets the maximum scan number to the specified value. This is useful * to the class since it cannot always tell which possible answer * for a scan number is correct. * * @param max the largest possible scan number that should be returned * by getScan(double). * * @see LocalSouthern */ public void setMaxScan(int max) { maxScan = (double) max; } /** * Gives the size in bp from the scan number, using the method described * in the class description. This method will probably get a lot of use, * so it needs to execute as quickly as possible. * * @param scan the scan number * * @return the size in bp. */ public final double getSize(int scan) { int index = 0; int lowerIndex = 0; // index to the group of points with 1 above and 2 // below the specified scan number. int upperIndex = 0; // index to the group of points with 2 above and 1 // below the specified scan number. // find the location of the scan number while( (index < numPoints) && (scan > lsValues[index][SCAN]) ) index++; // figure out what to do with the index, watching for cases where // we are close to the end, or off the end entirely switch(index) { case 0: return smallM*scan + smallB; // break; // Not needed because of return case 1: // We are at the top, so use only the first group of three with 2 // above and one below since the group with 1 above and 2 below // (less than) does not exist. lowerIndex = 0; upperIndex = 0; break; default: // set for the normal cases where we are in the middle lowerIndex = index - 2; upperIndex = index - 1; // look for ones that are on the very top (large) edge // if so, move the upper one so that we don't fall of the end of // the array. Remember that the index is the top of a group of THREE if(index == (numPoints -1)) upperIndex = index - 2; // watch out for points beyond the last standard point if(index == numPoints) { // just use the last group for now upperIndex = numPoints - 3; lowerIndex = numPoints - 3; } } // Now calculate the two values from lowerIndex and upperIndex, and // then return the average. // Use the formula L = c/(m-mo) +Lo, where L is the size and m is // the scan. Except, sometimes it's actually linear, so check first. // if it is, just use L = m*mo + c double valueL = 0; if(lsValues[lowerIndex][LINEAR] == 0) { valueL = (lsValues[lowerIndex][C]/(scan - lsValues[lowerIndex][Mo]) + lsValues[lowerIndex][Lo]); } else { valueL = lsValues[lowerIndex][Mo]*scan + lsValues[lowerIndex][C]; } double valueU = 0; if(lsValues[upperIndex][LINEAR] == 0) { valueU = (lsValues[upperIndex][C]/(scan - lsValues[upperIndex][Mo]) + lsValues[upperIndex][Lo]); } else { valueU = lsValues[upperIndex][Mo]*scan + lsValues[upperIndex][C]; } //DEBUG: System.out.println("Convert: scan: " + scan + " to " + //DEBUG: ((valueL +valueU)/2) + "bp"); if(( (valueL + valueU)/2) < 0.0) { System.out.println("Error on size, negative value for index " + scan); } return( (valueL + valueU)/2 ); } /** * Gives the scan number that will produce the specified size. This method * will perform the inverse operation compared to getSize. * Remember that due to rounding errors, these two methods may not be * exact inverses. Also, it is likely that this method will be called * frequently, so it should be as fast as possible. The * setMaxScan should be used to assist this method. It will * tell the method what the largest possible value is, which will help * it determine which possible scan number is the correct one. * * @param size the size in bp * * @return the scan number * * @exception ScanCalcError occurs when the method cannot determine * the correct scan number. Since the size is determined by an * average of two values, the scan number that produces it may * not be unique. Most of the time one can be discarded, but if not * this exception will occur. The class description contains more * info. Currently, this exception is not thrown and output is * dumped to standard out instead. The smaller of the two values * is taken so that the program can continue. * * @see LocalSouthern */ public final int getScan(double size) { int index = 0; int lowerIndex = 0; // index to the group of points with 1 above and 2 // below the specified scan number. int upperIndex = 0; // index to the group of points with 2 above and 1 // below the specified scan number. double lower; // bounds for the scan number. double upper; // find the location of the scan number while( (index < numPoints) && (size > lsValues[index][SIZE]) ) index++; // figure out what to do with the index, watching for cases where // we are close to the end, or off the end entirely switch(index) { case 0: //DEBUG: System.err.println("Size too small. size: " + size); //DEBUG: System.out.println("Convert: " + size + "bp to " + //DEBUG: ((size-smallB)/smallM)); return (int) Math.round((size - smallB)/smallM); // break; not needed because of return case 1: // We are at the top, so use only the first group of three with 2 // above and one below since the group with 1 above and 2 below // (less than) does not exist. lowerIndex = 0; upperIndex = 0; // set the bounds lower = lsValues[0][SCAN]; upper = lsValues[1][SCAN]; break; default: // look for ones that are on the very top (large) edge // if so, move the upper one so that we don't fall of the end of // the array. Remember that the index is the top of a group of THREE if(index == (numPoints -1)) { lowerIndex = index - 2; upperIndex = index - 2; lower = lsValues[index -1][SCAN]; upper = lsValues[index][SCAN]; } else if(index == numPoints) { // watch out for points beyond the last standard point //DEBUG: System.err.println("Size too large."); // just use the last group for now upperIndex = numPoints - 3; lowerIndex = numPoints - 3; lower = lsValues[index -1][SCAN]; upper = maxScan; } else { // set for the normal cases where we are in the middle lowerIndex = index - 2; upperIndex = index - 1; lower = lsValues[upperIndex][SCAN]; upper = lsValues[index][SCAN]; } } // Now that we have the constants, try to find the scan. See the class // description for the derivation of the formula, but note that it // does not provide a UNIQUE answer, so we'll have to guess at which // value is correct. For example, it had better be positive. // switch on the linearity of the two functions, make a number // from the two by 2*lower + upper. This means that the binary will // be the values of the two. EX both linear 11, lower linear: 10, upper // linear 01, neither 00. double a; // values used in the calculations double f; double sqrtValue; double denom; double scan1 = 0; double scan2 = 0; int switchValue = (int) (2*lsValues[lowerIndex][LINEAR] + lsValues[upperIndex][LINEAR]); switch(switchValue) { case 0: // 00 in binary, neither is linear denom = 2*size - lsValues[lowerIndex][Lo] - lsValues[upperIndex][Lo]; f = 0.5*(lsValues[lowerIndex][Mo] + lsValues[upperIndex][Mo] + (lsValues[lowerIndex][C] + lsValues[upperIndex][C])/denom); sqrtValue = Math.sqrt((-lsValues[lowerIndex][C]* lsValues[upperIndex][Mo] - lsValues[upperIndex][C]* lsValues[lowerIndex][Mo])/denom - lsValues[lowerIndex][Mo]* lsValues[upperIndex][Mo] + f*f); scan1 = f + sqrtValue; scan2 = f - sqrtValue; break; case 1: // 01 in binary, upper one is linear // see the class description for the formula a = 2*size - lsValues[lowerIndex][Lo] - lsValues[upperIndex][C]; f = lsValues[upperIndex][Mo]*lsValues[lowerIndex][Mo] + a; sqrtValue = Math.sqrt(f*f - 4*lsValues[upperIndex][Mo]* (lsValues[lowerIndex][Mo]*a + lsValues[lowerIndex][C])); scan1 = (f + sqrtValue)/(2*lsValues[upperIndex][Mo]); scan2 = (f - sqrtValue)/(2*lsValues[upperIndex][Mo]); break; case 2: // 10 in binary, lower index is linear // see the class description for the formula a = 2*size - lsValues[upperIndex][Lo] - lsValues[lowerIndex][C]; f = lsValues[lowerIndex][Mo]*lsValues[upperIndex][Mo] + a; sqrtValue = Math.sqrt(f*f - 4*lsValues[lowerIndex][Mo]* (lsValues[upperIndex][Mo]*a + lsValues[upperIndex][C])); scan1 = (f + sqrtValue)/(2*lsValues[lowerIndex][Mo]); scan2 = (f - sqrtValue)/(2*lsValues[lowerIndex][Mo]); break; case 3: // 11 in binary, both are linear // This case is really cool because it actually has a unique solution // To bad it almost never happens. // scan = (2*size - C1 - C2)/(m1 + m2); scan1 = ((2*size - lsValues[lowerIndex][C] - lsValues[upperIndex][C])/ (lsValues[lowerIndex][Mo] + lsValues[upperIndex][Mo])); // we have what we need, and we don't need to check to see which // one is real, so just return. return (int) Math.round(scan1); // break; // not needed because of return, or at all really } // Define an epsilon to get around some nasty floating point stuff // Sometimes things should be equal (ie scan1 == upper), but fp // screws up. double epsilon = 0.000000001; // Adjust the values by a small amount upper += epsilon; lower -= epsilon; // See if we can find one that is outside the valid regions and one // that is inside. if( (scan1 < lower) || (scan1 > upper)) { // one of them must be correct, and it isn't scan1, so it must // be scan 2 //DEBUG: System.out.println("Convert: " +size+"bp to " + //DEBUG: (int)Math.round(scan2)); //if(scan2 < 0.0) System.out.println("Error: returing negative size."); return (int) Math.round(scan2); } else if( (scan2 < lower) || (scan2 > upper)) { // Not scan2, so it must be scan1 //DEBUG: System.out.println("Convert: " +size+"bp to " + //DEBUG: (int)Math.round(scan1)); //if(scan1 < 0.0) System.out.println("Error: returing negative size."); return (int) Math.round(scan1); } else { // Just take the smaller one, but complain first System.out.println("WARNING: Local Southern could not determine if " + scan1 + " or " + scan2 + " was the correct " + "scan number for " + size + "bp. Using smaller" + " value."); if(scan1 < scan2) return (int) Math.round(scan1); else return (int) Math.round(scan2); // They are both valid. We're screwed, so throw an exception. //throw new ScanCalcError("Could not determine if " + scan1 + // " or " + scan2 + " was the correct " // + "scan number for " + size + "bp."); } } /** * Gives the value of Lo, based on the three points. Lo, is calculated * from a formula obtained by solving the following three simultaneous * equations: *

   *   y1 = C/(x1 - Mo) + Lo  (1)
   *   y2 = C/(x2 - Mo) + Lo  (2)
   *   y3 = C/(x3 - Mo) + Lo  (3)
   *

   *   y1 = C/(x1 - Mo) + Lo  (1)
   *   y2 = C/(x2 - Mo) + Lo  (2)
   *   y3 = C/(x3 - Mo) + Lo  (3)
   *

   *   y1 = C/(x1 - Mo) + Lo  (1)
   *   y2 = C/(x2 - Mo) + Lo  (2)
   *   y3 = C/(x3 - Mo) + Lo  (3)
   *

* * @param x1 the x-value of the first point * @param y1 the y-value of the first point * @param x2 the x-value of the second point * @param y2 the y-value of the second point * @param x3 the x-value of the third point * @param y3 the y-value of the third point * @param denom a value that is common to this method as well as * solveMo and solveLo. It is equal to * -y1*x3 + y2*x3 - y2*x1 + x2*y1 - x2*y3 + x1*y3 * * @return the value of C */ private double solveC(double x1, double y1, double x2, double y2, double x3, double y3, double denom) { double ans; ans = (x3 - x2)*(-x3 + x1)*(x1 - x2)*(-y3 + y2)*(y1 - y3)*(-y2 + y1)/ (denom*denom); return ans; } /** * Reads in the properties of this class from the specified input stream. * The stream data should have been created by write. This * will retrieve this classes state from the input stream. All current * data in the class will be replaced. * * @param in the input stream with the data for the class. * * @exception IOException occurs when a problem is encountered when * reading from the stream. */ public void read(DataInputStream in) throws IOException { maxScan = in.readDouble(); smallM = in.readDouble(); smallB = in.readDouble(); numPoints = in.readInt(); lsValues = new double[numPoints][6]; // read in the array for(int i=0; i < numPoints; i++) { for(int j=0; j < (LINEAR + 1); j++) lsValues[i][j] = in.readDouble(); } } /** * Writes all of the information this class needs to store in order * to be recreated. This will be used for things like storing the * class in a file. Therefore, the write should output enough information * so that read can recreate the essential properties of this * class. * * @param out the destination to write the data to. * * @exception IOException occurs when a problem is encountered when * writing to the stream. */ public void write(DataOutputStream out) throws IOException { // write out the values for some of the single value constants out.writeDouble(maxScan); out.writeDouble(smallM); out.writeDouble(smallB); out.writeInt(numPoints); // write out the array with all of the values for(int i=0; i < numPoints; i++) { for(int j=0; j < (LINEAR + 1); j++) out.writeDouble(lsValues[i][j]); } } }