Ancestry Tracing With mtDNA
Organizational Diagram of Circular Mitochondrial DNA.
Note: every graphic in this paper is sourced from http://www.genebase.com/index.php, maintained by Genetrack.
Lou Sheehan, Biology 215
July 28, 2008
Professor Dee M. Walter
ABSTRACT
This paper discusses testing to determine the mtDNA haplogroup of one individual. The tests examined Hypervariable Region I, Hypervariable Region II, and the Coding Region (via an SNP Backbone Test). The paper also includes supporting discussions of the identified haplogroup, I, and of the nature of mitochondrial DNA.
INTRODUCTION
DNA provides a record of our biological past. The translation of this history book, if one will, advances daily. One chapter in this book details our lineage.
Earlier in 2008 the author’s mitochondrial DNA (henceforth “mtDNA”) was tested.[1]
The tests involved:
(i) Hypervariable Region I,
(ii) Hypervariable Region II, and
(iii) an mtDNA Single Nucleotide Polymorphism (“SNP”) Marker Backbone Test.
All testing was performed by Genetrack. This paper discusses these three tests in general terms.
The results showed the author’s mtDNA haplogroup to be “I”. There are eight[2] major haplogroups in Europe: H (47%), J (17%), U (11%), T (9%), K (6%), X (6%), V (5%), and I (2%), all of which are descendants of haplogroup N. Each haplogroup is associated with a different ancestral lineage. (Because of its small size, haplogroup I is sometimes not considered a major European group.) (Fitzpatrick & Yeiser, 2005).
DISCUSSION
Haplogroup I
About 2.2% of Europeans are descendants of mtDNA haplogroup I.
Haplogroup I is one of the oldest groups to inhabit Western Europe, arriving there during the Upper Paleolithic period, about 30,000 to 40,000 years ago, with the people whose way of life is today referred to as the Aurignacian culture; most other groups arrived in Europe tens of thousands of years later, as the Ice Age was in retreat.[3] The Aurignacians were the earliest known Cro-Magnon culture. This culture was marked by certain tools and a pronounced artistic tradition. Aurignacian culture was preceded by Mousterian culture, was contemporary with Perigordian culture, and was succeeded by the Solutrean culture.
The Aurignacian culture was marked by a great diversification and specialization of tools, including the invention of the “burin,” an engraving tool that made much of the art possible. The Aurignacian differs from other Upper Paleolithic cultures mainly in a preponderance of stone flake tools rather than blades. Flakes were retouched to make nosed scrapers, ridged scrapers, and end scrapers. Bones and antlers were made into points and awls by splitting, sawing, and smoothing.
Aurignacian art is said to
represent the first complete tradition in the history of art. Cave art was
produced almost exclusively in Western Europe, where, by the end of the
Aurignacian Period, hundreds of paintings, engravings, and reliefs had been
executed on the walls, the ceilings, and sometimes the floors of limestone
caves. The first paintings were probably stencils, outlined in color, of
actual hands held against the cave walls. The stencilings were followed by the
development of figural painting. A characteristic feature of these early
pictures, which persisted throughout the Aurignacian period, is their “twisted
perspective,” which shows, for example, the head of the animal in profile and
its horns twisted to a front view. Classic examples of Aurignacian art are the
paintings of animals, such as horses and bulls, on the walls and ceilings of
the cave at Lascaux in southwestern France. These figures were painted in vivid polychrome red,
yellow, brown, and black, with solid and closed outlines.
The earliest examples of small,
portable art objects produced during this period consist of pebbles with very
simple engravings of animal forms. Subsequently, animal figures were carved in
pieces of bone and ivory. In the later part of the Aurignacian Period the
carvings show increased naturalism with foreshortening and shading with
cross-hatched lines. (Aurignacian culture, 2008; Time and Space: The Archaeological Context, retrieved July 23, 2008.)
Mitochondrial DNA
Organizational Display of Regions of mtDNA.
It is thought that mitochondria were originally bacteria that were “engulfed” by other cells and developed a symbiotic relationship with them, living in their hosts’ cytoplasm. Mitochondria provide their hosts with ATP in exchange for protection and other benefits.
Significantly, mitochondria have their own DNA; that is, their DNA is distinct from the nuclear DNA (henceforth “nDNA”) of their hosts, and mutations in mtDNA occur separately from changes to the nDNA.[4] (Hart, 2002).
MtDNA mutates slowly on a human time scale, with changes accumulating only over tens of generations, and thus can be used to glimpse the past.
MtDNA is understood to be inherited only from the mother. At conception, essentially only the sperm’s nucleus enters the egg. Even when some paternal mtDNA does enter the egg, it is extremely diluted, since a human egg is estimated to carry as many as 100,000 mtDNA molecules. Moreover, the embryo’s cellular machinery is able to identify any paternal mtDNA that has entered the egg and destroy it. Thus, mothers pass along their mtDNA to their children, who share similar mtDNA with their siblings and maternal relatives; as such, barring a new mutation, an individual’s mtDNA is not unique to him or her.[5] (Gibson & Muse, 2002).
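The inheritance rule above can be sketched in Python as a walk up a family tree that follows only mothers. This is a minimal illustration, not how Genetrack or any laboratory works: the `Person` class, the family members and the haplogroup labels are hypothetical, and new mutations are ignored (the “barring a new mutation” case).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """One individual in a hypothetical pedigree (names and types are illustrative)."""
    name: str
    mother: Optional["Person"] = None
    father: Optional["Person"] = None
    founder_mtdna: Optional[str] = None  # recorded only for the earliest known maternal ancestor


def maternal_line(person: Person) -> List[str]:
    """Names along the strictly maternal line, from the person back to the earliest known mother."""
    line = [person.name]
    current = person
    while current.mother is not None:
        current = current.mother
        line.append(current.name)
    return line


def inherited_mtdna(person: Person) -> Optional[str]:
    """mtDNA type copied unchanged down the maternal line (no new mutations modelled).

    The father is never consulted: paternal mtDNA is treated as diluted or destroyed.
    """
    current = person
    while current.mother is not None:
        current = current.mother
    return current.founder_mtdna


# Hypothetical family: the father's own mother carried a different type ("H").
grandmother = Person("Grandmother", founder_mtdna="I")
paternal_grandmother = Person("Paternal grandmother", founder_mtdna="H")
mother = Person("Mother", mother=grandmother)
father = Person("Father", mother=paternal_grandmother)
child_a = Person("Child A", mother=mother, father=father)
child_b = Person("Child B", mother=mother, father=father)

print(maternal_line(child_a))    # ['Child A', 'Mother', 'Grandmother']
print(inherited_mtdna(child_a))  # I
print(inherited_mtdna(child_b))  # I
print(inherited_mtdna(father))   # H
```

Because `inherited_mtdna` never looks at the `father` field, both children end up with their maternal grandmother’s type, and the father’s type stops with him.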
When remains are old or badly degraded, it is often difficult or impossible to extract nDNA. However, mtDNA is present in many more copies than nDNA, so some of it is more likely to remain testable. In short, while nDNA contains much more information, there are only two copies of it (one paternal, one maternal) in each cell, whereas mtDNA carries a smaller amount of useful information but typically hundreds of copies of that information in each cell; the use of mtDNA in this context simply reflects its greater odds of surviving.[6],[7] (Goodwin, Linacre & Hadi, 2007).
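The “increased odds” argument can be made concrete with a simple probability sketch. Assuming, purely for illustration, that each DNA copy in a degraded sample survives independently with the same small probability p, the chance that at least one of n copies survives is 1 - (1 - p)^n. The value p = 0.001 and the figure of 500 mtDNA copies are hypothetical, and real degradation is not independent from copy to copy.

```python
def prob_at_least_one_copy_survives(copies: int, p_survive: float) -> float:
    """Chance that at least one of `copies` DNA copies survives.

    Assumes each copy survives independently with probability p_survive,
    which is a simplification of real degradation.
    """
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must be a probability")
    return 1.0 - (1.0 - p_survive) ** copies


P_SURVIVE = 0.001  # hypothetical per-copy survival probability in a degraded sample

nuclear = prob_at_least_one_copy_survives(2, P_SURVIVE)  # two nDNA copies per cell
mito = prob_at_least_one_copy_survives(500, P_SURVIVE)   # "hundreds" of mtDNA copies per cell
print(f"nDNA (2 copies):    {nuclear:.4f}")
print(f"mtDNA (500 copies): {mito:.4f}")
```

Under these assumptions the two-copy case barely improves on a single copy, while hundreds of copies raise the odds by a couple of orders of magnitude.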
Human mtDNA was first sequenced in 1981, from a European placenta; the resulting sequence is referred to as either the “Cambridge Reference Sequence” (CRS) or the “Anderson Sequence.”[8]
On average, there are 4 to 5 copies of mtDNA per mitochondrion, with a range of about 1 to 15 copies. Each cell typically contains hundreds of mitochondria, with an estimated average of about 500 per cell. (Gibson & Muse, 2002).
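Multiplying the two averages above gives a rough count of mtDNA copies per cell. The sketch below simply does that arithmetic with the figures quoted from Gibson & Muse (2002); real cells vary widely by tissue.

```python
# Figures from the paragraph above (Gibson & Muse, 2002).
COPIES_PER_MITOCHONDRION = (4, 5)  # typical average range
MITOCHONDRIA_PER_CELL = 500        # estimated average
NUCLEAR_COPIES = 2                 # one paternal, one maternal

low = COPIES_PER_MITOCHONDRION[0] * MITOCHONDRIA_PER_CELL
high = COPIES_PER_MITOCHONDRION[1] * MITOCHONDRIA_PER_CELL

print(f"mtDNA copies per cell: about {low:,} to {high:,}")
print(f"relative to nDNA: about {low // NUCLEAR_COPIES:,}x to {high // NUCLEAR_COPIES:,}x")
```

On these averages a cell would hold roughly 2,000 to 2,500 mtDNA copies, against the two nDNA copies mentioned earlier.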
MtDNA is a circular molecule. Typically, mtDNA has 16,569 base pairs[9] (some people carry mutations that change this length). MtDNA base pairs are numbered clockwise. Of these, 1,122 base pairs contain the origin of replication but do not code for any gene products. The remaining 15,447 of the 16,569 base pairs code for gene products; there are no introns and only a few noncoding base pairs. (Gibson & Muse, 2002).
Relative size of the D-Loop (HV1 & HV2) vis-à-vis the Coding Region.
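Because the molecule is circular, a region can run past position 16,569 and continue at position 1, which is what the D-loop does (HV1 sits near the end of the numbering and HV2 at the start). The sketch below handles that wrap-around. The D-loop boundaries 16024 to 576 are an assumption taken from commonly used CRS annotations, not from the tests in this paper; with them, the arithmetic reproduces the 1,122 and 15,447 figures above.

```python
GENOME_LENGTH = 16569  # CRS numbering, positions 1..16569


def region_length(start: int, end: int, genome_length: int = GENOME_LENGTH) -> int:
    """Length of a region on a circular genome, counting both endpoints.

    If start > end, the region wraps past position genome_length back to 1.
    """
    if start <= end:
        return end - start + 1
    return (genome_length - start + 1) + end


def in_region(pos: int, start: int, end: int) -> bool:
    """True if pos lies in [start, end], allowing the region to wrap around position 1."""
    if start <= end:
        return start <= pos <= end
    return pos >= start or pos <= end


# ASSUMPTION: D-loop (control region) boundaries in CRS numbering.
D_LOOP = (16024, 576)

d_loop_len = region_length(*D_LOOP)
print(d_loop_len)                  # 1122
print(GENOME_LENGTH - d_loop_len)  # 15447
print(in_region(16223, *D_LOOP), in_region(73, *D_LOOP), in_region(7028, *D_LOOP))  # True True False
```

An HV1 position such as 16223 and an HV2 position such as 73 both fall inside the same wrapped region, while a Coding Region position such as 7028 does not.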
MtDNA has 37 genes whose products are used in oxidative phosphorylation (cellular energy production) or in making the proteins that carry it out; 13 of these genes code for proteins, 22 for transfer RNAs (tRNAs), and 2 for ribosomal RNAs (rRNAs). (Gibson & Muse, 2002).
The two mtDNA strands are designated the “H” (heavy) and “L” (light) strands. The H strand has a greater number of guanine nucleotides (the heaviest of the four nucleotides) than does the L strand. Replication begins in a non-coding region of the H strand (the displacement loop, or D-loop).[10]
Close-up of the Hypervariable Regions.
The H strand encodes 28 gene products, and the L strand encodes 8 tRNAs and one protein, “ND6” (a subunit of the NADH dehydrogenase enzyme complex). (Gibson & Muse, 2002).
Because the D-loop region does not code for any gene products, there are significantly more polymorphisms between individuals in this region. For this reason, lineage tracing typically focuses on two noncoding regions within the D-loop known as HV1 and HV2[11] (“HV” stands for “hypervariable”). (Gibson & Muse, 2002).
MtDNA has fewer repair mechanisms than nDNA and, not surprisingly, has a higher rate of mutation. Moreover, mtDNA lacks proofreading mechanisms, which increases its mutation rate relative to nDNA during replication. Hence, mtDNA’s mutation rate is estimated to be 10 (some sources say 6 to 17) times higher than that of nDNA, which makes tracing descent via mtDNA that much easier (of course, each of mtDNA and nDNA has many advantages and disadvantages for these purposes). (Gibson & Muse, 2002).
At first glance, matters are complicated by the fact that, because of the relatively high rate of mtDNA mutation, an individual might (and probably does) have more than one mtDNA type, i.e., mtDNA with different sequences, a condition known as heteroplasmy; these different “types” can be located in different cells, in the same cell, and even in the same mitochondrion! However, these variants are often present in very low numbers; it is rare to find more than one such mutated position among the nucleotides sequenced at HV1 and HV2, and “hotspots” for such mutations at HV1 and HV2 have been identified. (Goodwin, et al., 2007).
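One way to picture how mixed mtDNA types show up in test data is to count, at a single position, how many sequencing reads carry each base. The sketch below flags a position as heteroplasmic when the second most common base exceeds a cutoff. The read counts and the 10% cutoff are hypothetical; actual laboratories apply their own quality filters and thresholds.

```python
from typing import Dict


def call_position(counts: Dict[str, int], threshold: float = 0.10) -> str:
    """Classify one position from per-base read counts (threshold is a hypothetical cutoff)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads at this position")
    # Most common base first, then the runner-up (if any).
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    major = ranked[0][0]
    if len(ranked) > 1:
        minor, minor_count = ranked[1]
        fraction = minor_count / total
        if fraction >= threshold:
            return f"heteroplasmic {major}/{minor} (minor fraction {fraction:.2f})"
    return f"homoplasmic {major}"


print(call_position({"T": 70, "C": 30}))  # both bases well represented
print(call_position({"C": 97, "T": 3}))   # minor base below the cutoff
print(call_position({"A": 50}))           # only one base seen
```

A low cutoff finds more mixed positions but also mistakes more sequencing noise for real variants, which is one reason the cutoff is left as a parameter.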
A standard mtDNA analysis involves polymerase chain reaction (PCR) amplification of nine overlapping fragments, followed by sequence analysis or digestion with 12 to 14 restriction enzymes, in the HV1 and HV2 regions. The results are then compared to the Cambridge Reference Sequence. The sequences in these two regions are estimated to vary by 1 to 2 percent between unrelated individuals; typically, these differences are concentrated in two particular “hotspots” in the two HV regions. (Goodwin, et al., 2007).
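The comparison against the Cambridge Reference Sequence amounts to lining the two sequences up and listing the positions where they differ. The sketch below does this for the 16051 to 16100 row of the author’s HV1 listing that appears later in this paper. It assumes the sequences are already aligned with no insertions or deletions, and the reference row is reconstructed by undoing the two substitutions the paper reports in that row (16074 A > g and 16093 T > y), not copied from an official CRS file.

```python
from typing import List, Tuple

# Positions 16051-16100 of the sample, as listed in this paper (lowercase = differs from the CRS).
SAMPLE = "ACCACCCAAG TATTGACTCA CCCgTCAACA ACCGCTATGT ATyTCGTACA"
# Same window of the reference, reconstructed by undoing the two reported substitutions.
REFERENCE = "ACCACCCAAG TATTGACTCA CCCATCAACA ACCGCTATGT ATTTCGTACA"
START = 16051


def differences(sample: str, reference: str, start: int) -> List[Tuple[int, str, str]]:
    """List (position, reference base, sample base) wherever two aligned sequences differ."""
    s = sample.replace(" ", "").upper()
    r = reference.replace(" ", "").upper()
    if len(s) != len(r):
        raise ValueError("sequences must be aligned and of equal length")
    return [(start + i, rb, sb) for i, (rb, sb) in enumerate(zip(r, s)) if rb != sb]


for pos, ref, alt in differences(SAMPLE, REFERENCE, START):
    print(f"{pos} {ref} > {alt}")
```

Real pipelines first align the sequences, since an insertion or deletion would shift every later position in a plain character-by-character comparison like this one.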
A weakness of using mtDNA is that some haplogroups[12] are common within certain populations. One example is an FBI study of 1,655 Caucasians, of whom 168 (a little over 10%) could not be “excluded” as possibly related! Hence, SNP Marker Haplogroup Backbone Tests are increasingly being used. It should be noted that sequencing all 16,569 base pairs provides a 12-fold increase in the ability to distinguish such otherwise common haplogroups. (Goodwin, et al., 2007).
Even given all of the above, it must be remembered that the further back one goes in any such tracing, the more extrapolations (educated guesses) about expected mutation rates are used to “connect the dots.”
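The role of assumed mutation rates can be seen in the simplest possible molecular-clock estimate: if mutations arise at an expected rate of μ per generation, k observed mutations suggest roughly k / μ generations. The mutation count and the rates below are hypothetical, chosen only to show that halving the assumed rate doubles the inferred age.

```python
def generations_to_common_ancestor(mutations: int, rate_per_generation: float) -> float:
    """Point estimate t = k / mu for one lineage: k observed mutations, mu expected per generation."""
    if rate_per_generation <= 0:
        raise ValueError("rate must be positive")
    return mutations / rate_per_generation


OBSERVED = 2  # hypothetical number of mutations separating a lineage from its ancestor
for rate in (0.05, 0.10, 0.20):  # hypothetical per-generation rates for the region tested
    gens = generations_to_common_ancestor(OBSERVED, rate)
    print(f"rate {rate:.2f} per generation -> about {gens:.0f} generations")
```

The observed data never change in this loop; only the assumed rate does, yet the estimate moves fourfold across the range.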
HV1 Test
The HV1 test[13] is the most informative mtDNA test because:
1. The HV1 region contains an abundance of markers.
2. The HV1 region is easy to test. All 520 nucleotides in the entire HV1
region can be read from a single test.
3. The HV1 region is well studied. Thus, there is more scientific data
available for markers in the HV1 region than any other region of the mtDNA.
All of the other test types serve to supplement the results of the HV1 test. (Smolenyak Smolenyak & Turner, 2004).
Each mutation is documented by its location and its nucleotide change. For example, a change from T to G at position 40 would be documented as follows:
* Location: 40
* Nucleotide Change: T>G.
HV1 Sequence of Louis Sheehan (Location 16001 to 16520) compared to the CRS (lowercase letters mark positions that differ from the CRS):
16001 ATTCTAATTT AAACTATTCT CTGTTCTTTC ATGGGGAAGC AGATTTGGGT
16051 ACCACCCAAG TATTGACTCA CCCgTCAACA ACCGCTATGT ATyTCGTACA
16101 TTACTGCCAG CCACCATGAA TATTGTACaG TACCATAAAT ACTTaACCAC
16151 CTGTAGTACA TAAAAACCCA ATCCACATCA AAACCCCCTC CCCATGCTTA
16201 CAAGCAAGTA CAGCAATCAA CCtTCAACTA TCACACATCA ACTGCAACTC
16251 CAAAGCCACC CCTCACCCAC TAGGATACCA ACAAACCTAC CCACCCTTAA
16301 CAGTACATAG TACATAAAGC CATTTACCGT ACATAGCACA TTACAGTCAA
16351 ATCCCTTCTt GTCCCCATGG ATGACCCCCC TCAGATAGGG aTCCCTTGAC
16401 CACCATCCTC CGTGAAATCA ATATCCCGCA CAAGAGTGCT ACTCTCCTCG
16451 CTCCGGGCCC ATAACACTTG GGGGTAGCTA AAGTGAACTG TATCCGACAT
16501 CTGGTTCCTA CTTCAGGGcC
Location  Region  Mutation Type[14]  Nucleotide
16074     HV1     Substitution       A > g
16093     HV1     Substitution       T > y
16129     HV1     Substitution       G > a
16145     HV1     Substitution       G > a
16223     HV1     Substitution       C > t
16360     HV1     Substitution       C > t
16391     HV1     Substitution       G > a
16519     HV1     Substitution       T > c
A total of 8 mutations.
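The “y” reported at 16093 is not one of the four bases. Under the standard IUPAC nucleotide codes, Y means “C or T,” the usual way of writing a position where both bases were read; this would fit the mixed mtDNA types discussed earlier, although that reading is an interpretation rather than something stated in the test results. The sketch below decodes the calls in the table above; the same codes explain the N (any base, i.e., not read) entries in the HV2 listing.

```python
# Standard IUPAC nucleotide codes, mapping each code to the set of bases it allows.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"}, "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def decode(call: str) -> str:
    """Explain one base call, e.g. 'y' -> 'C or T (mixed call)'."""
    bases = IUPAC[call.upper()]
    if len(bases) == 1:
        return next(iter(bases))
    label = "unread" if len(bases) == 4 else "mixed call"
    return f"{' or '.join(sorted(bases))} ({label})"


# Calls from the HV1 table above.
HV1_CALLS = {16074: "g", 16093: "y", 16129: "a", 16145: "a",
             16223: "t", 16360: "t", 16391: "a", 16519: "c"}
for pos, call in HV1_CALLS.items():
    print(pos, decode(call))
```
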
HV2 Test
The HV2 region spans positions 1 to 400. Like HV1, HV2 contains an abundance of markers that are useful for tracing maternal ancestry.
The HV2 test[15] supplements the HV1 results by allowing more stringent and precise comparisons. For example, where people match exactly at the HV1 region, further comparison of the HV2 region provides greater resolution. (Rodden Robinson, 2005).
HV2 Sequence of Louis Sheehan (Location 1 to 400) compared to the CRS (lowercase letters mark positions that differ from the CRS; N marks a position that could not be read):
1   GATCACAGGT CTATCACCCT ATTAACCACT CACGGGAGCT CTCCATGCAT
51  TTGGTATTTT CGTCTGGGGG GTgTGCACGC GATAGCATTG CGAGACGCTG
101 GAGCCGGAGC ACCCTATGTC GCAGTATCTG TCTTTGATTC CTGCCTCATC
151 CcATTATTTA TCGCACCTAC GTTCAATATT ACAGGCGAAC ATACTTACcA
201 AAGcGTaTTA ATTAATTAAT GCTTGTAGGA CATAATAATA ACAATTGAAc
251 GTCTGCACAG CCgCTTTCCA CACAGACATC ATAACAAAAA ATTTCCACCA
301 AACCCCCCNc CCCCCNCTTN TGGNCNCANC ACTTAAANNC ATNTNTGCCA
351 AACCCCAAAA ANAAANAANC CTAANACCNG CCTAACCANA TTTNAAATTT
Location  Region  Mutation Type  Nucleotide
73        HV2     Substitution   A > g
152       HV2     Substitution   T > c
199       HV2     Substitution   T > c
204       HV2     Substitution   T > c
207       HV2     Substitution   G > a
250       HV2     Substitution   T > c
263       HV2     Substitution   A > g
310       HV2     Substitution   T > c
A total of 8 mutations.
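As a rough check on the 1 to 2 percent figure quoted earlier, the counted differences can be divided by the number of positions in each region. This treats the CRS as one unrelated individual and counts every listed position as read, which overstates the HV2 denominator because many positions above 300 are shown as N.

```python
# Region boundaries and difference counts taken from the two tables above.
regions = {
    "HV1": {"first": 16001, "last": 16520, "differences": 8},
    "HV2": {"first": 1, "last": 400, "differences": 8},
}

total_sites = 0
total_diffs = 0
for name, r in regions.items():
    sites = r["last"] - r["first"] + 1  # inclusive count of listed positions
    total_sites += sites
    total_diffs += r["differences"]
    print(f"{name}: {r['differences']} of {sites} positions = {100 * r['differences'] / sites:.2f}%")

print(f"Combined: {total_diffs} of {total_sites} positions = {100 * total_diffs / total_sites:.2f}%")
```

On these counts HV1 differs from the CRS at about 1.5% of positions and HV2 at 2%, in line with the quoted range.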
While the markers in the HV1 and HV2 regions can easily be detected by sequencing, the Coding Region is large, making sequencing it (currently) impractical. The SNP Haplogroup Backbone Test is a panel of 20 markers in the Coding Region that are specific for haplogroup determination. (Fitzpatrick, 2005).
SNP Backbone Test of Louis Sheehan:
SNP Location  Mutations[20]  SNP Identity  Mutation
2352          T > C          T             Negative
3594          C > T          C             Negative
3693          G > A          G             Negative
4312          C > T          C             Negative
4580          G > A          G             Negative
4833          A > G          A             Negative
5178          C > A, C > T   C             Negative
7028          C > T          T             Positive
7055          A > C, A > G   A             Negative
7598          G > A          G             Negative
8618          T > C          T             Negative
10086         A > G          A             Negative
10310         G > A          G             Negative
10400         C > T          C             Negative
10873         T > C          T             Negative
11251         A > G          A             Negative
11719         G > A          A             Positive
12308         A > G          A             Negative
12705         C > T          T             Positive
14766         C > T          T             Positive
A total of 4 mutations.
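The Positive and Negative entries in the table above follow a simple pattern: a marker is Positive when the SNP Identity matches the changed base and Negative when it matches the original base. The sketch below re-derives the four positives from that rule. The rule is read off this table only; it is not Genetrack’s procedure, and it stops short of assigning a haplogroup, which depends on how each marker relates to the haplogroup tree.

```python
# (position, tested change(s), observed base) from the SNP Backbone table above.
PANEL = [
    (2352, "T>C", "T"), (3594, "C>T", "C"), (3693, "G>A", "G"), (4312, "C>T", "C"),
    (4580, "G>A", "G"), (4833, "A>G", "A"), (5178, "C>A,C>T", "C"), (7028, "C>T", "T"),
    (7055, "A>C,A>G", "A"), (7598, "G>A", "G"), (8618, "T>C", "T"), (10086, "A>G", "A"),
    (10310, "G>A", "G"), (10400, "C>T", "C"), (10873, "T>C", "T"), (11251, "A>G", "A"),
    (11719, "G>A", "A"), (12308, "A>G", "A"), (12705, "C>T", "T"), (14766, "C>T", "T"),
]


def is_positive(tested: str, observed: str) -> bool:
    """Positive if the observed base is one of the changed bases, negative if it is the original."""
    changes = [change.split(">") for change in tested.split(",")]
    original = changes[0][0]
    changed = {new for _, new in changes}
    if observed == original:
        return False
    if observed in changed:
        return True
    raise ValueError(f"unexpected base {observed!r} for {tested}")


positives = [pos for pos, tested, observed in PANEL if is_positive(tested, observed)]
print(positives)       # [7028, 11719, 12705, 14766]
print(len(positives))  # 4
```
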
CONCLUSION
Maternal Markers
mtDNA HV1: 8 Mutations
mtDNA HV2: 8 Mutations
mtDNA SNP: 4 Mutations
This specific pattern of mutations confirms Lou Sheehan’s mtDNA haplogroup to be I.
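A simple way to see how such a pattern points to a haplogroup is to check whether the reported control-region variants contain a haplogroup’s characteristic motif. The motif used below (16129A, 16223T and 16391A in HV1, plus 199C, 204C and 250C in HV2) is an assumption: an often-quoted profile for haplogroup I that should be confirmed against a current mtDNA phylogeny such as PhyloTree. It is not taken from the Genetrack results.

```python
from typing import Dict, List, Tuple

# Variants from the HV1 and HV2 tables above, as position -> reported base (uppercased).
OBSERVED = {
    16074: "G", 16093: "Y", 16129: "A", 16145: "A",
    16223: "T", 16360: "T", 16391: "A", 16519: "C",
    73: "G", 152: "C", 199: "C", 204: "C",
    207: "A", 250: "C", 263: "G", 310: "C",
}

# ASSUMPTION: an often-quoted control-region motif for haplogroup I (not from the test report).
ASSUMED_I_MOTIF = {16129: "A", 16223: "T", 16391: "A", 199: "C", 204: "C", 250: "C"}


def motif_report(variants: Dict[int, str], motif: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Split the motif positions into those the sample carries and those it lacks."""
    matched = sorted(pos for pos, base in motif.items() if variants.get(pos) == base)
    missing = sorted(pos for pos, base in motif.items() if variants.get(pos) != base)
    return matched, missing


matched, missing = motif_report(OBSERVED, ASSUMED_I_MOTIF)
print("motif positions matched:", matched)
print("motif positions missing:", missing)
```

A full assignment would also weigh the coding-region markers and the variants outside the motif, so a match like this is supporting evidence rather than proof.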
Geographic migration of Haplogroup I over time.
Works Cited
Aurignacian culture. (2008). In Encyclopædia Britannica. Retrieved July 23, 2008, from Encyclopædia Britannica Online: http://www.britannica.com/EBchecked/topic/43368/Aurignacian-culture
Fitzpatrick, C. & Yeiser, A. (2005). DNA & Genealogy. Fountain Valley, California: Rice Book Press.
Fitzpatrick, C. (2005). Forensic Genealogy. Fountain Valley, California: Rice Book Press.
Genetic Genealogy. (2008). Retrieved July 15, 2008, from http://www.dnaancestryproject.com/ and http://www.genebase.com/index.php (maintained by Genetrack).
Gibson, G. & Muse, S.V. (2002). Primer of Genome Science (1st ed.). Sunderland, Massachusetts: Sinauer Associates, Inc.
Goodwin, W., Linacre, A. & Hadi, S. (2007). An Introduction to Forensic Genetics. West Sussex, England: John Wiley & Sons, Ltd.
Hart, A. (2002). How to Interpret Your DNA Test Results for Family History and Ancestry: Scientists Speak Out on Genealogy Joining Genetics. Lincoln, Nebraska: iUniverse.
Nicholl, D. (2002). An Introduction to Genetic Engineering (2nd ed.). New York: Cambridge University Press.
Olson, S. (2002). Mapping Human History. New York: Mariner Books.
Ridley, M. (2000). Genome: The Autobiography of a Species in 23 Chapters. New York: HarperCollins.
Rodden Robinson, T. (2005). Genetics for Dummies. Hoboken, New Jersey: Wiley, Inc.
Smolenyak Smolenyak, M. & Turner, A. (2004). Trace Your Roots with DNA. Rodale.
Time and Space: The Archaeological Context. Retrieved July 23, 2008 from:
Wells, S. (2007). Deep Ancestry: Inside the Genographic Project. Washington, D.C.: National Geographic Society.
Wells, S. (2003). The Journey of Man: A Genetic Odyssey. New York: Random House Trade Paperback.
[1] Extensive testing was also done on the author’s Y chromosome, which is part of the nDNA, but a paper discussing both the mtDNA and the nDNA results would far exceed the assigned length.
[2] These numbers vary by
source.
[3] During this time, approximately 70,000 to 10,000 BCE, a mile-deep ice sheet covered the northern part of the continent, from just north of London in England across to the mainland, covering all of Scandinavia and Poland. The glaciers in the Alps would have grown tremendously as well, fanning out in all directions to form an ice cap covering the top of the entire boot of Italy. Sea levels were approximately 130 yards lower than today, meaning that some Mediterranean islands (as well as some Atlantic islands such as the Azores and the Canaries) would have had much greater landmass than today; even islands such as Malta were connected to the mainland, just as Britain and Ireland would have been.
Ice Age Eurasia was cold, with average temperatures 8 to 12 degrees Centigrade lower than today, and much drier. The European treeline ran from the Bay of Biscay off northern Spain, across the very north of Italy, and over to the Sea of Azov, where it dipped southeast toward the border between Iran and Turkmenistan. Between the treeline and the glacial line, the landscape would have been dry, covered with sand dunes and tough, wiry grass, perhaps somewhat similar to terrain associated with Iceland and Northern Canada today.
This grassy intermediate zone, known as the Great Eurasian Plain, would have served as a natural game corridor running all the way from eastern France to Korea. Great herds of game roamed this grassy thoroughfare: woolly mammoths, antelopes, and, in particular, a number of species of bison.
[4] MtDNA uses a slightly different genetic code than nDNA. For example:
(i) UGA in mtDNA codes for tryptophan but is a stop codon in nDNA;
(ii) AUA in mtDNA codes for methionine, whereas in nDNA it codes for isoleucine;
(iii) AGA and AGG both code for stops in mtDNA but code for arginine in nDNA.
(Goodwin, et al., 2007).
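The codon differences listed in this footnote can be expressed as two small lookup tables. The sketch includes only the four codons the footnote names, which are also the differences between the standard code and what NCBI calls the vertebrate mitochondrial code (translation table 2); every other codon reads the same in both. The example RNA string is made up for illustration.

```python
from typing import List, Tuple

# Only the codons where the two codes disagree; every other codon reads the same in both.
STANDARD = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
VERTEBRATE_MITOCHONDRIAL = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def compare_reading(rna: str) -> List[Tuple[str, str, str]]:
    """Read codon by codon and report (codon, nuclear meaning, mitochondrial meaning) where they differ."""
    rna = rna.upper().replace("T", "U")  # accept DNA letters as well as RNA
    rows = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[i:i + 3]
        if codon in STANDARD:
            rows.append((codon, STANDARD[codon], VERTEBRATE_MITOCHONDRIAL[codon]))
    return rows


for codon, nuclear, mito in compare_reading("AUGUGAAUAAGG"):  # hypothetical example string
    print(f"{codon}: nuclear code {nuclear}, mitochondrial code {mito}")
```
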
[5] MtDNA has been one of the
tools used (i) to identify the Vietnam Unknown Soldier, (ii) as part of the process of identifying
the Romanov family, (iii) to
establish that Neanderthals were not direct ancestors of modern humans, (iv) to discredit the
claim of Anna Anderson Manahan to be the Russian Princess Anastasia, and (v) to
establish that Jesse James was indeed the person buried in his tomb. (Wells,
2003).
[6] Both mtDNA’s circular structure and its encapsulation behind two walls further enhance the survivability of mtDNA. (Gibson & Muse, 2002).
[7] Interestingly, part of chromosome 11 in at least some humans carries a portion of an mtDNA “control region,” reflecting an ancient transposition between mtDNA and nDNA. (Goodwin, et al., 2007).
[8] Today the processes used in 1981, including the use of bovine sequences to fill in gaps, are regarded as “rudimentary.” In 1999 the same placenta was resequenced, with only 11 changes made to the 1981 standard (plus an additional 7 nucleotide positions actually related to rare polymorphisms); most importantly, the two areas of primary focus, the HV1 and HV2 regions, were not adjusted. The resequencing also established that there were only 16,568 base pairs, but the numbering system keyed to 16,569 base pairs was retained for consistency of reference. However, some institutions use different reference systems (one example is based on an African who had 16,571 base pairs), so it is necessary to be aware of which reference system is being used. (Goodwin, et al., 2007).
[9] By way of contrast, human nuclear chromosomes each consist of somewhere between 49,530,000 and 247,200,000 bases, for roughly 3 billion bases in all. (Wells, 2007.)
[10] The D-Loop (HV1 & HV2) is considered a non-vital part of the mtDNA because it does not have an immediately useful biological function. Thus, a mutation in this region does not kill the individual, who may survive and pass the mutation along to the next generation. (Ridley, 2000).
[11] Sometimes written as HVR1 & HVR2, HVS1 & HVS2, or HVI & HVII.
[13] The
HV1 test uses “DNA sequencing” technology to read all of the nucleotides from
locations 16,001 to 16,520 (the entire HV1 region).
[14] Substitution - occurs when one nucleotide is replaced by another.
Deletion - occurs when a nucleotide is removed from the sequence, thereby changing the overall sequence.
Insertion - occurs when a nucleotide is added to the sequence, thereby changing the overall sequence.
[15] The HV2 test also
uses “DNA sequencing” technology.
This test reads all of the nucleotides from locations 1 to 400 (the entire region) of the mtDNA.
[16] There is also a SNP Subclade Test which examines a special panel of markers
in the Coding Region of the mtDNA allowing determination of one’s “sub-clade.”
However, at the moment, SNP Subclade tests are available only for the following
mtDNA Haplogroups: R, M, H.
[17] The 20 markers that are included in this test and the haplogroups they define are:
SNP Location  Mutations      Haplogroups
2352          T > C          L1b, L3e, U6b1
3594          C > T          L0, L1, L2, L5
3693          G > A          L1b, L2d
4312          C > T          L0
4580          G > A          V
4833          A > G          G
5178          C > A, C > T   D
7028          C > T          H
7055          A > C, A > G   L1
7598          G > A          E
8618          T > C          L3d
10086         A > G          L3b
10310         G > A          F
10400         C > T          C, D, E, G, M, Q, Z
10873         T > C          C, D, E, G, L, M, Q, Z
11251         A > G          JT, J, T
11719         G > A          Pre-HV, HV
12308         A > G          K, U
12705         C > T          B, F, H, J, K, P, T, R, U, V
14766         C > T          HV
[18] As each
sequencing reaction can only test approximately 500 nucleotides at a time, many
reactions would be required in order to sequence (“DNA Sequencing”) the entire
Coding Region, thus making it costly.
Also, there are very few mutations in the Coding Region, making it
unnecessary to sequence every single nucleotide in the Coding Region. As such, a process other than
“DNA Sequencing” is used wherein 20 specific Coding Region markers are tested.
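The cost argument in this footnote can be put in numbers. The sketch below estimates how many reactions of about 500 nucleotides would be needed to cover the 15,447 base pairs outside the D-loop. The 50-nucleotide overlap between neighboring reads is a hypothetical allowance for stitching reads together, and reading both strands would roughly double the count.

```python
import math

CODING_REGION_BP = 15_447  # base pairs outside the D-loop, from the main text
READ_LENGTH = 500          # approximate nucleotides per sequencing reaction, per this footnote


def reactions_needed(length: int, read_length: int, overlap: int = 0) -> int:
    """Fewest reads of `read_length` that tile `length` bases, each overlapping the previous one."""
    if not 0 <= overlap < read_length:
        raise ValueError("overlap must be non-negative and shorter than a read")
    if length <= read_length:
        return 1
    step = read_length - overlap  # new bases contributed by each additional read
    return 1 + math.ceil((length - read_length) / step)


print(reactions_needed(CODING_REGION_BP, READ_LENGTH))              # 31
print(reactions_needed(CODING_REGION_BP, READ_LENGTH, overlap=50))  # 35
```

Even this idealized count of roughly 31 to 35 reactions, against one for HV1 or HV2, illustrates why a targeted 20-marker panel is used instead.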
[19] The Coding Region of the mtDNA is considered essential for the survival of the individual, so a mutation occurring in this region is often lethal. Thus, mutations that occur in the Coding Region are usually not passed down to future generations. For this reason, over a period of thousands of years, many mutations accumulate in the D-Loop (HV1 & HV2), but few are found in the Coding Region. (Olson, 2002).
[20] The presence of
a mutation is as informative as is the absence of a mutation. For example,
mtDNA haplogroup H is the most common haplogroup among Europeans, and is
confirmed in part by the absence of all SNP markers tested in the mtDNA SNP
Backbone test. (Nicholl,
2002).