MAPPING GENES TO CHROMOSOMES
Understand how the term "allele" can be applied to mutations at
the DNA level, including silent mutations, missense mutations,
nonsense mutations, insertions and deletions.
Understand why most mutations in eukaryotes are selectively
neutral.
Understand the methodology behind RFLPs.
Understand how molecular markers such as RFLPs segregate
according to the laws of Mendel. Be able to calculate linkage
distance between two loci in a two point test cross.
Understand why genetic linkage cannot be directly calculated
from progeny ratios in 2 point crosses.
I. MAPPING A PHENOTYPE
Why do we still need genetic mapping? Can't we
just sequence the genome?
No matter how sophisticated and rapid
DNA sequencing and other genomics technologies become, there is
still no getting around the fact that what we're really
interested in, in practical terms, is phenotypes. We want to
somehow get at traits that are responsible for diseases in
humans, or for agronomically useful characteristics in crop
plants and livestock. Examples of the latter would include
disease resistance, yield, protein quality, oil quality, or
uniform height or flowering time. Many of these traits are
quantitative traits, governed by multiple loci.
In many cases, the only thing we know
at the beginning is the phenotype.
Consequently, Mendelian genetics is
more relevant than ever, as the primary way to go
from a trait we can see or measure, to a specific
chromosomal locus and gene product.
The field of genetics was built upon the mapping of individual
genes to nearest neighbors using 3 point crosses. Until the advent of
molecular markers, there were no specific projects to map entire
genomes.
Genetics will never be the same:
- greater precision (< 0.5 cM)
- fewer crosses
- use of F2 data
- fewer breeding generations
- more reliance on maps, mapping kits
- "everything has already been mapped"
A. A molecular definition of "allele"
Modern genetic mapping using molecular markers still relies upon
the basic principles of Mendelian genetics. We observe genetic
segregation based on the segregation of alleles in genetic
crosses. But for molecular markers, we need a broader molecular
definition of what an allele is.
What is an allele?
Can be defined as:
- variant of a phenotype
- variant of a DNA sequence
Do silent mutations count as alleles?
Do they give us different phenotypes?
An allele defined by molecular
means should have exactly the same genetic properties as an
allele defined by phenotype.
Infinite alleles model
1. Mutations within protein coding sequences
define molecular allelism at many levels and degrees of
- Identical sequences - the
baseline against which mutations are measured
- Silent mutation - mutation in
"degenerate" position in a codon, so that no amino acid
replacement occurs (see Genetic Code below)
- Missense mutation - mutation
leading to an amino acid substitution. Many, and perhaps
most, missense mutations have little or no effect on the
function of the protein. Evidence for this comes from the
fact that comparison of amino acid sequences from
homologous proteins from many species often shows that
some regions of a protein are highly conserved, while
other regions appear to tolerate a great deal of amino
acid substitution.
- Nonsense mutation - mutation
which converts a codon into a stop codon. Occurrence of a
stop codon within a protein coding sequence results in a
truncated protein.
- Insertions and deletions within a protein coding
sequence can interrupt protein function. As well,
insertions or deletions whose length is not a multiple of
3 will cause a frameshift mutation, meaning that incorrect
amino acids will be inserted downstream from the mutation.
Rearrangements in a protein coding sequence will probably
result in loss of function.
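The categories in the list above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function names `classify_substitution` and `classify_indel` are hypothetical, the codon table is the standard genetic code, and the sketch treats a single codon in isolation (it does not handle splice sites, regulatory changes, or loss of a stop codon).

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_substitution(ref_codon, alt_codon):
    """Classify a base substitution by comparing one codon before and after.

    Hypothetical helper: returns 'identical', 'silent', 'nonsense' or 'missense'.
    """
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    if ref_codon == alt_codon:
        return "identical"          # the baseline: no change in DNA sequence
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"             # change at a degenerate position
    if alt_aa == "*":
        return "nonsense"           # codon converted into a stop codon
    return "missense"               # amino acid substitution


def classify_indel(length):
    """Insertions or deletions whose length is not a multiple of 3 shift the frame."""
    if length <= 0:
        raise ValueError("length must be a positive number of bases")
    return "in-frame" if length % 3 == 0 else "frameshift"


print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_substitution("TGG", "TGA"))  # Trp -> stop
print(classify_indel(4))
```

Each of these outcomes is a distinct allele at the DNA level, whether or not it changes the phenotype.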
Figure: [...] of protein sequences during plant evolution.
Amino acids from the N-terminal region of thionin proteins from
several plant species are shown. Where amino acids have
been inserted or deleted during the course of evolution,
gap (-) characters have been inserted to optimize the
alignment of homologous positions within the proteins.
-- but, mutations within genes are only a
small percentage of the total number of mutations
occurring in the genome!
2. Because only a small percentage of eukaryotic
genomes codes for proteins, most mutations will occur
within non-coding DNA, and will usually be selectively
neutral.
In most eukaryotic genomes, the
chromosome is a sea of non-coding DNA punctuated by genes.
This is illustrated in an approximately 72 kb region taken
from Arabidopsis thaliana.
Genes are composed of both exons and introns. In comparison to
exons, introns appear to mutate very rapidly, suggesting that
there is little selective pressure against intronic mutations.
In many higher eukaryotes, especially in animals, introns may
be very long compared to exons. (Note that this appears not to
be the case in plants, in which introns tend to be short.)
i) Molecular alleles should segregate by the
same Mendelian Principles as phenotypic alleles.
Because most mutations are selectively neutral and can occur
anywhere in the genome, we can use molecular alleles as
markers for chromosome mapping.
ii) In most cases, molecular alleles are
iii) All of the rules of population genetics
apply to molecular alleles.
iv) Practically speaking, there are an infinite
number of possible molecular alleles. A given phenotypic
allele observed in the population could, in principle,
consist of a set of different molecularly-defined alleles.
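As a minimal sketch of point i), the segregation of two molecular alleles can be enumerated in the same way as a Punnett square for phenotypic alleles. The allele labels `A1` and `A2` and the function name `cross` are hypothetical, chosen only for illustration; the sketch assumes a single locus and equal transmission of both alleles from each parent.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Enumerate offspring genotypes at one locus.

    Each parent is a 2-tuple of allele labels. Every combination of one
    allele from each parent is counted once, as in a Punnett square.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        genotype = tuple(sorted((allele1, allele2)))
        offspring[genotype] += 1
    return offspring


# Two homozygous parents carrying different molecular alleles (hypothetical labels).
f1 = cross(("A1", "A1"), ("A2", "A2"))
print(dict(f1))   # every F1 individual is heterozygous

# Selfing or intercrossing the F1 gives the familiar 1:2:1 genotype ratio.
f2 = cross(("A1", "A2"), ("A1", "A2"))
print(dict(f2))
```

Nothing in the enumeration depends on whether the alleles are defined by phenotype or by DNA sequence, which is the point of the principle.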
B. RFLPs (Restriction Fragment Length Polymorphisms)
are molecular alleles
1. Restriction sites can be assayed for
polymorphism within a population by Southern hybridization.
Assuming that individuals 1 & 2 are homozygous at the locus
detected by these probes:
2. Restriction sites can reveal polymorphism
DNA sequences are mutating all the
time. As two populations diverge over time, they each accumulate
different mutations at different sites. When restriction sites
in the vicinity of a given gene are compared from one genotype
to another, one genotype may have the site, and the other will
not. This is referred to as a polymorphism. High
polymorphism between two genotypes is evidence of genetic
divergence. Mike Freeling and coworkers have compared
restriction sites within the maize Alcohol dehydrogenase 1 gene
in several maize lines. They found that although the restriction
maps are identical within the Adh1 gene for all
varieties tested, very few of the restriction sites outside
of the coding region are conserved.
Johns, M.A., Strommer, J.N. and Freeling, M. (1983) Exceptionally high
levels of restriction site polymorphism in DNA near the
maize Adh1 gene. Genetics 105:733-743.
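To illustrate how the presence or absence of a single restriction site changes the fragments produced by a digest, here is a small sketch. The two toy sequences are invented for illustration and are not maize sequences; the function names are hypothetical. HindIII recognizes AAGCTT and cuts after the first A, which is what the `offset=1` default assumes. The sketch treats the DNA as a linear molecule and scans only one strand, which is sufficient because the HindIII site is palindromic.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence; cut is A^AGCTT


def cut_positions(seq, site=HINDIII_SITE, offset=1):
    """Return the positions at which the enzyme cuts a linear sequence."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site=HINDIII_SITE, offset=1):
    """Return the lengths of the fragments produced by a complete digest."""
    edges = [0] + cut_positions(seq, site, offset) + [len(seq)]
    return [right - left for left, right in zip(edges, edges[1:])]


# Toy sequences (hypothetical): line B differs from line A by one base,
# which destroys the second HindIII site.
line_a = "G" * 10 + "AAGCTT" + "C" * 20 + "AAGCTT" + "T" * 5
line_b = "G" * 10 + "AAGCTT" + "C" * 20 + "AAGCTA" + "T" * 5

print(fragment_lengths(line_a))  # three fragments
print(fragment_lengths(line_b))  # two fragments: the polymorphism
```

A single base change is enough to merge two fragments into one, so the two lines give different banding patterns even though their sequences are almost identical.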
Maize genomic DNA was digested using HindIII
and the bands analyzed by Southern blot hybridization.
DNA from seven maize lines was compared. The blot was
hybridized using the pZmL84 probe, whose location
with respect to the Adh1 gene is shown in the figure
below. Note that this probe overlaps the 3' half of the
Since the HindIII site is internal to the probe region,
the probe detects two bands in all digests. In all maize
lines, a conserved 2.5 kb fragment is seen, as well as a
second band. The size of the second band is determined
by how far away the next HindIII site is, downstream
from the conserved HindIII site. Because numerous
mutations have accumulated in the downstream region, in
the different maize lines over time, the location of the
downstream HindIII site differs between lines.
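The reasoning above can be sketched as a small calculation: a probe lights up every restriction fragment it overlaps, so a probe that spans an internal site detects the fragment on each side of it. All coordinates below are hypothetical and chosen only so that the upstream fragment is 2.5 kb, as in the text; they are not taken from the published restriction maps. The function name `detected_bands` is likewise an illustration.

```python
def detected_bands(sites, probe_start, probe_end):
    """Return sizes (bp) of fragments that overlap the probe region.

    `sites` are restriction site coordinates on one chromosome; a fragment
    is the interval between two consecutive sites.
    """
    sites = sorted(sites)
    bands = []
    for left, right in zip(sites, sites[1:]):
        if left < probe_end and right > probe_start:
            bands.append(right - left)
    return bands


# Hypothetical coordinates: two conserved sites at 0 and 2500 bp, a probe
# spanning the site at 2500, and a downstream site that differs between lines.
probe = (2000, 3200)
line_1_sites = [0, 2500, 6500]
line_2_sites = [0, 2500, 9000]

print(detected_bands(line_1_sites, *probe))  # conserved band plus a 4000 bp band
print(detected_bands(line_2_sites, *probe))  # conserved band plus a 6500 bp band
```

The 2500 bp band is shared because both sites that define it are present in each line, while the second band tracks the position of the next downstream site.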
A restriction map of the
corresponding region of the adh1 locus is shown below.
i) In all lines the 5'
fragment detected by the pZmL84 probe is 2.5 kb. The map shows
that the two HindIII sites that define this fragment are
conserved.
ii) The 3' fragment is different in
each line, because the next distal HindIII site occurs at a
different location in each line.
Figure: Restriction maps of the seven Adh1 chromosomal regions. The
boundaries of the transcription units are denoted by the
vertical dashed lines, and the region that hybridized with
the ADH1-cDNA probe, pZmL84, is also shown.
Now that hundreds of thousands of
genes have been sequenced from thousands of species, we can
generalize that protein coding regions are typically slow to
diverge, because there is usually selective pressure to retain
a given amino acid sequence. Outside of the protein coding
region, and in non-coding DNA flanking genes, there is little
selection against random mutations. Hence mutations are
allowed to accumulate quickly in non-coding regions.
3. RFLPs behave like any other Mendelian
Each band seen in a Southern blot
indicates the presence of one or more restriction sites in a
sequence. The sequence containing a restriction site is one
allele, while the corresponding sequence missing the restriction
site is the other allele. The "phenotypes" of these alleles are
the differences in banding patterns, due to presence or absence
of bands. The following exercises are designed to illustrate
that the rules of Mendelian genetics apply for RFLPs exactly as
they do for any conventional genetic trait.
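As a rough sketch of the two point test cross calculation, the recombination frequency between two RFLP loci is the fraction of progeny whose banding patterns show a recombinant combination of alleles, and the map distance in centimorgans is that fraction multiplied by 100. The progeny counts below are invented for illustration, and the function name `map_distance_cm` is hypothetical. The sketch applies no mapping function, so it is only a reasonable approximation for closely linked loci, where double crossovers are rare.

```python
def map_distance_cm(parental_counts, recombinant_counts):
    """Estimate map distance (cM) from two point test cross progeny counts.

    parental_counts:    counts of the two parental (non-recombinant) classes
    recombinant_counts: counts of the two recombinant classes
    """
    total = sum(parental_counts) + sum(recombinant_counts)
    if total == 0:
        raise ValueError("no progeny were scored")
    return 100.0 * sum(recombinant_counts) / total


# Hypothetical test cross: 100 progeny scored at two RFLP loci.
parental = [42, 44]      # banding patterns matching the parental combinations
recombinant = [8, 6]     # banding patterns showing new combinations

print(map_distance_cm(parental, recombinant))  # 14 recombinants out of 100
```

In a test cross each progeny class corresponds directly to one type of gamete from the heterozygous parent, which is why the counts can be used without further correction.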
EXERCISE 1 - Segregation in the F1 generation
EXERCISE 2 - Measuring Recombination in a Test Cross
EXERCISE 3 - F2