October 25, 2018
MAPPING WITH
MOLECULAR MARKERS
Learning Checklist:
1. Be able to identify when two marker bands
cosegregate on a gel.
2. Be able to explain why phenotypic ratios in a dihybrid
cross (e.g. AB, Ab, aB, ab) cannot be used directly to
calculate linkage distance in F2 data.
3. Understand the basic strategy of mapping with molecular
markers.
4. Understand the distinction between genetic distance in
cM and physical distance in Mb.
5. Understand how microsatellite markers work.
6. Be able to work with the Clarke-Carbon equation that
tells how many markers must be screened to find at least
one marker linked to a phenotypic trait.
7. Understand the concept of a mapping kit.
C. Segregation in a two-point cross
In most cases, both loci have bands
from one parent or the other, or both loci are heterozygous.
These are parental genotypes. When one locus is
heterozygous and the other is homozygous, the genotype must be recombinant.
It is important to note that the
% recombinants in F_{2} analysis doesn't give you the
recombination frequency. Consider the case in which two
recombinant gametes join to form a zygote:
  A B        a b
  ---   X    ---
  A B        a b
         |
         v
  A B        a B
  ---        ---
  a b        A b
parental  recombinant

These progeny would be
heterozygous at both loci, and so might be
naively scored as nonrecombinant.
This is why geneticists have
traditionally gone to great lengths to construct
tester stocks for mapping by testcrosses. However, that is not
always possible, especially if you want to map hundreds
or thousands of markers to construct a genomic map.
Consequently, you have to use analytical methods appropriate for
F_{2} data.
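The gap between the % recombinant plants and the recombination frequency can be made concrete with a small calculation. The following is a minimal sketch, assuming an F1 in coupling phase, codominant markers scored A/H/B, and no selection or interference; the function names are illustrative and not part of any mapping package.

```python
def f2_class_probs(r):
    """Expected F2 frequencies of two-locus score classes for recombination fraction r.

    Scores per locus: 'A' and 'B' are the two homozygotes, 'H' the heterozygote.
    Assumption: the F1 carries one A-A haplotype and one B-B haplotype.
    """
    non = (1 - r) / 2   # frequency of each nonrecombinant gamete
    rec = r / 2         # frequency of each recombinant gamete
    gametes = {"AA": non, "BB": non, "AB": rec, "BA": rec}
    probs = {}
    for g1, p1 in gametes.items():
        for g2, p2 in gametes.items():
            # Score each locus from the two alleles the zygote received.
            score = tuple(a if a == b else "H" for a, b in zip(g1, g2))
            probs[score] = probs.get(score, 0.0) + p1 * p2
    return probs


def naive_recombinant_fraction(r):
    """Fraction of F2 plants whose two scores differ (apparent recombinants)."""
    return sum(p for (s1, s2), p in f2_class_probs(r).items() if s1 != s2)


def hidden_double_recombinants(r):
    """Fraction of F2 plants scored H/H that carry two recombinant gametes."""
    return r * r / 2


if __name__ == "__main__":
    for r in (0.01, 0.10, 0.30):
        print(r, naive_recombinant_fraction(r), hidden_double_recombinants(r))
```

Under these assumptions the apparent recombinant fraction is 2r(1-r) + r^2/2, so for r = 0.10 it is 0.185 rather than 0.10: most recombinant plants carry only one recombinant gamete out of two, and a few double recombinants hide in the H/H class.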
D. Calculating Linkage using molecular markers
Lander, E. S. et al. (1987) MAPMAKER: An
Interactive Computer Package for Constructing Primary Genetic
Linkage Maps of Experimental and Natural Populations. Genomics
1:174-181.
Lander, E. S. and Green, P. (1987)
Construction of multilocus linkage maps in humans. Proc.
Natl. Acad. Sci. USA 84:2363-2367.
To get an idea of how molecular
marker data can be used to build linkage maps, let's examine
some molecular markers from TAIR - The Arabidopsis
Information Resource (http://www.arabidopsis.org) at
Stanford University. One of the RFLP probes documented in AAtDB
is g4539.
(Note: In the current map, g4539 has been converted into a PCR
marker.)
The Southern blot for this probe,
shown at right, indicates two RFLP alleles, 2L and 1C, found in
ecotypes Landsberg erecta and Columbia, respectively. Note
that while several bands are detected with this probe, only two
are polymorphic between these two parents: 2L and 1C.
1. Inspection of mapping data illustrates
the principle of cosegregation of closely linked markers.
If you take marker g4539 as an
example, you will see that the neighboring markers most
closely linked to it are also the most alike in their phenotypic
scores, from plant to plant. The farther you go in either
direction from g4539, the more the scores differ.
Neighboring loci in any region will always have the most similar
segregation patterns. This is nothing new; it is simply a restatement of the idea of
cosegregation of closely linked loci. By
definition, the more closely linked two loci are, the more they
will cosegregate.
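One simple way to put a number on "how alike" two markers are is the fraction of plants given the same score at both. The sketch below uses made-up scores for 12 plants (not the Goodman Lab data) and assumes that both strings list the same plants in the same order, with "-" standing for an ambiguous score.

```python
def agreement(scores_a, scores_b, ambiguous="-"):
    """Fraction of plants with identical scores at two markers.

    Both strings must list the same plants in the same order; plants with an
    ambiguous score at either marker are skipped.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score strings must cover the same plants")
    pairs = [(a, b) for a, b in zip(scores_a, scores_b)
             if ambiguous not in (a, b)]
    if not pairs:
        raise ValueError("no plants scored at both markers")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical scores for 12 plants at three markers, listed in map order.
toy = {
    "m1": "HHAABHBAHHBA",
    "m2": "HHAABHHAHHBA",   # one plant differs from m1
    "m3": "HAAHBHHAHBBH",   # farther along the map: more differences
}

if __name__ == "__main__":
    for left, right in (("m1", "m2"), ("m2", "m3"), ("m1", "m3")):
        print(left, right, agreement(toy[left], toy[right]))
```

In this toy set, agreement falls from 11/12 for the adjacent pair m1-m2 to 7/12 for the flanking pair m1-m3, which is the pattern described above for markers on either side of g4539.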
F2_Locus Data from Goodman Lab
Mapping Population 4
Ecotype Columbia X Ecotype Landsberg erecta (143 plants)
Data listed in order of map position.

marker/map posn.   segregating progeny -->
g6844    HHAAAAABHHBAAAHBHHHHABHHHABBAHHBHAHHBAAHHAHHBBHHAHHAHHBHHHAAAHHHBBHHHHAHHBAAAAHHABHHHHBAHBABHBHAAHBHBHAHHBAHBAHHHAAHH
g3843    HHAAAAABHHBAAAHBHHHHABHHHAHBAHHBHAHHBAABAAHHBBHHAAHHHAHHBHHBAAAHHHBBHAAHAHHHBAAAAAHABHHHHBAHBABHBHAAHBHBHAHHBHHBAHHBAA
g2616    HHAAHHHBHHBAAAHBHHHABHHHHHHHBBHBHHAHHHHHHHHHAHHBHAAAAHABHHAABAHBABABHAAHHABHAHHBHABAHHBAHAH
m210     HHAHHBHHHHHAAAHHBHHHAHHAHAHHHAABHHAHBHABAAAAHBHAAHHHHAHBHHBBHAHHHAHHAHHBBHHBHHHHAHHBHHAHBAHHABHABAHBHAHHHHHHABAABHH
g6837    HHAABHHAHBHHBAAAHHHAAHHBHHHHAHHBHHAHBAHBABHABAHBHAHHHHHHABAHHAHHBA
g10086   AHHHAAHHHAHBHHBAHAHAHHHAHHBHHHHAHHBHHAHBAHBABHABAHBHAHHHHAHHBHHAH
g4564a   HAAHHBHHHHHAAAHHBHHHAHHAHAHHHAAHHHAHBHHBAHAHAHHHAAHHHHA HBHHBBHAHHHAAHAHHBBHHBHHHHABHBHHAHBAHBABHABAHBHAHHHHAHHABAHABHH
g3845    HAHHHBHHHAHAAAAHBHHAHBAHAHHHAAHHHAHBHHHAHAHAHAAHAHHHHAHBHHBBHAHHHAAHAHHBBHHHHBHABHBHHAHBABBHBHABAHBHAHHHHAHHABHHABHH
g4539    AHHHAAHHHAHBHHHAHAHAHAAHAHHHAHBHHBBHAHHHAAHAHHBBHHHBHHABHBHHAHBABBHHHABAHBHAHHHHAHHABHHABHHAH
m557     HAHHHBHHHAHAAAAHBHHAAHBAHAHHHHAHHBAHBHHHAHAHAHAAHAHHH AHBHHBBHAHAHAAHAHHBBHHH BHHABHBBBAHHABBHHHABAHBHAHHHHAHHAHHHABHHAH
g3883    HAHHHBHHHAHAAAAHBHHHAHBAHAHHHHAHHBAHBHHHAHAHAHAAHAHHHHAHBHHBBHAHAHAAHAHHBBHHHHBHHABHBBBAHHABBHBHABAHBHAHHHHAHHAHHHABHH
g19833   HAHAHBHHAHAAAAHBHHAHBAHHHHAHHHABAHHAHAHHAHAHHHAHBHHBBHAHAHAAHAHHBBHHHBHHABHBBBAHBABHBHABAHBHHHHHHHAHHABHHAHHA
g19838   HAHHAHBHHAHAHAHAAAHHHAHBHHBHAHAHAAHHBBHHHHBHHABHBBBABABHBHABAHBHAHHHHAAHHHBHHH
m272     HAHAHBHHHAHAAHBHAHHBAAHBHHAHHBAAHHHHABAHAAAHAHHAAHBHHHBHHHAHAAHAHHBBHHBBHHABHHHHAHHHBBHBHABAHBHAHHHHHHHAHAHHHBBHHA
g4513    HAHAHBHHHAHAAAAHBHAHHHBAAAHBHHAHBBAAHHHHABAHAAAAHAAHHAAHBHHHBHHHAHAAHAHHBBHHHBBHHABHHHHAHHHBBHBHABAHBHAHHHHHHHAAHAAHHH
A, B: homozygotes for alternative alleles; H: heterozygote; -: ambiguous result

2. Calculation of linkage for multiple loci
is done by iterative application of maximum likelihood
methods.
a) Make a best guess as to the order of
markers on the map.
The more frequently two loci cosegregate, the closer they are
likely to be on the map. By doing pairwise comparisons for all
possible pairs of loci, the order of loci on the map can be
guessed at based on which markers tend to cosegregate with each
other.
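The ordering step can be illustrated with a brute-force search: try every order and keep the one in which neighboring markers disagree least. This is only a sketch under simplifying assumptions (complete data, a handful of markers, mismatch fraction as a stand-in for distance). It is not the procedure MAPMAKER uses, and the marker names and scores are hypothetical.

```python
from itertools import permutations


def mismatch(a, b):
    """Fraction of plants scored differently at two markers (equal-length strings)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def guess_order(scores):
    """Return the marker order minimising the summed mismatch between neighbours.

    Brute force over all orders, so only practical for a handful of markers.
    """
    names = sorted(scores)
    best_order, best_cost = None, float("inf")
    for order in permutations(names):
        if order[0] > order[-1]:
            continue  # an order and its reverse describe the same map
        cost = sum(mismatch(scores[p], scores[q])
                   for p, q in zip(order, order[1:]))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order)


# Hypothetical scores for 12 plants; names deliberately not in map order.
toy_scores = {
    "x": "HHAABHHAHHBA",
    "y": "HAAHBHHAHBBH",
    "z": "HHAABHBAHHBA",
}

if __name__ == "__main__":
    print(guess_order(toy_scores))
```

Here "x" lands in the middle because it differs least from both "y" and "z". With hundreds of markers an exhaustive search is impossible, which is one reason the order is treated as a best guess that is then refined.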
b) Determining the spacing between each
pair of adjacent loci.
For any possible order and spacing of loci, it is possible to
calculate the likelihood, that is, the prior probability that "the given map
would exactly give rise to the
observed data". "Note that the likelihood is necessarily
very small, because it is the probability that each meiosis under study
would come out exactly the same if the experiment were repeated.
Thus, likelihoods are useful only for comparative purposes. For
example, if an alternative map had a 1000-fold lower chance of
giving rise to the data, one might choose to reject it."
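A small numerical sketch shows why likelihoods are tiny in absolute terms yet still useful for comparison. It assumes the simplest possible case, a single interval in which every meiosis can be scored directly as recombinant or parental (as in a testcross), and the counts are hypothetical.

```python
import math


def log10_likelihood(n_recombinant, n_total, r):
    """log10 probability of one exact sequence of meiotic outcomes, given r."""
    n_parental = n_total - n_recombinant
    return n_recombinant * math.log10(r) + n_parental * math.log10(1 - r)


# Hypothetical data: 10 recombinant gametes among 100 scored meioses.
ll_a = log10_likelihood(10, 100, 0.10)   # candidate map A: r = 0.10
ll_b = log10_likelihood(10, 100, 0.30)   # candidate map B: r = 0.30
log10_odds = ll_a - ll_b                 # > 0 means A explains the data better

if __name__ == "__main__":
    print(ll_a, ll_b, log10_odds)
```

Both log likelihoods are far below zero (roughly -14 and -19), but their difference of about 5 says that map A is on the order of 10^5 times more likely to have produced these data, well beyond the 1000-fold margin mentioned in the quotation.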
Figure 1. Example of multipoint linkage analysis using EM
algorithm, showing convergence to maximum-likelihood genetic map
for 16 RFLPs on human chromosome 7, studied in CEPH families
(see text). The initial assumption of 5% recombination between
consecutive RFLPs corresponded to a log_{10} likelihood of
-351.45. After 12 iterations, the recombination fractions
converged to a map that was about 10^{48} times more likely to have produced the observed data.
The analysis used the first genetic reconstruction algorithm
discussed in the text, involving only genotype-known data, and
it required ~9 sec on an HP9000 minicomputer. Analysis of the
full data set, using the hidden Markov-chain reconstruction
algorithm, required ~4 min and did not alter the recombination
fractions significantly.
In Fig. 1, we see that MAPMAKER
begins considering a particular map in the following way:
Arbitrarily space all loci under consideration at a distance of
5 cM. Calculate the likelihood of seeing the observed
data, given the current map. Next, try another guess for map
distances between markers. Calculate the likelihood of seeing the
data, given the new guess. If the new
guess has a higher likelihood, choose it as the current working
map. As the iterations progress, the changes to the map get
smaller and smaller. Repeat the process until the difference in
log likelihood between the current map and the previous map is
less than some threshold (e.g. 0.1).
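For a single pair of codominant markers in an F2, the loop just described can be sketched as an EM iteration. This is an illustrative two-locus version under stated assumptions (F1 in coupling phase, complete A/H/B scores, made-up counts), not MAPMAKER's multilocus code. The starting value of 0.05 and the stopping threshold of 0.1 follow the text; everything else is hypothetical.

```python
import math

# Recombinant gametes carried by each unambiguous two-locus F2 score class.
RECOMBINANT_GAMETES = {
    ("A", "A"): 0, ("B", "B"): 0,
    ("A", "H"): 1, ("H", "A"): 1, ("B", "H"): 1, ("H", "B"): 1,
    ("A", "B"): 2, ("B", "A"): 2,
}


def class_prob(score, r):
    """Expected F2 frequency of a score class for recombination fraction r."""
    if score == ("H", "H"):
        return ((1 - r) ** 2 + r ** 2) / 2
    k = RECOMBINANT_GAMETES[score]
    if k == 1:
        return r * (1 - r) / 2
    return ((1 - r) ** 2 if k == 0 else r ** 2) / 4


def log10_likelihood(counts, r):
    """log10 likelihood of the observed class counts, skipping empty classes."""
    return sum(n * math.log10(class_prob(s, r)) for s, n in counts.items() if n)


def em_recombination(counts, r=0.05, tol=0.1, max_iter=200):
    """Iterate E and M steps until log10 likelihood improves by less than tol."""
    n_plants = sum(counts.values())
    prev = log10_likelihood(counts, r)
    for _ in range(max_iter):
        # E step: expected recombinant gametes; H/H plants carry either 0 or 2.
        p_double = r ** 2 / ((1 - r) ** 2 + r ** 2)
        expected = sum(
            n * (2 * p_double if s == ("H", "H") else RECOMBINANT_GAMETES[s])
            for s, n in counts.items()
        )
        # M step: recombinant gametes divided by all gametes (two per plant).
        r = expected / (2 * n_plants)
        current = log10_likelihood(counts, r)
        if current - prev < tol:
            break
        prev = current
    return r


# Hypothetical counts for 1000 F2 plants, chosen to match r = 0.20 exactly.
toy_counts = {
    ("A", "A"): 160, ("B", "B"): 160,
    ("A", "H"): 80, ("H", "A"): 80, ("B", "H"): 80, ("H", "B"): 80,
    ("A", "B"): 10, ("B", "A"): 10,
    ("H", "H"): 340,
}

if __name__ == "__main__":
    print(em_recombination(toy_counts))
    print(em_recombination(toy_counts, tol=1e-9))
```

The E step handles the ambiguity discussed earlier: a plant scored H/H may carry zero or two recombinant gametes, so it contributes its expected number under the current estimate of r. A smaller `tol` gives an estimate closer to the maximum at the cost of more iterations.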
In this way, the program quickly
converges on the spacing that is most consistent with the data