Tuesday, July 28, 2015

c13





Ancestry Tracing With mtDNA



Organizational Diagram of Circular Mitochondrial DNA.
Note: every graphic in this paper is sourced from http://www.genebase.com/index.php maintained by Genetrack

Lou Sheehan, Biology 215
July 28, 2008

Professor Dee M. Walter


ABSTRACT

This paper discusses the testing for the mtDNA haplogroup of one individual.  The tests were of Hypervariable Region I, Hypervariable Region II and of the Coding Region via an SNP Backbone Test.  The paper includes supporting discussions of the identified haplogroup – I – and the nature of mitochondrial DNA.


INTRODUCTION


DNA provides a record of our biological past.  The translation of this – if one will – history book advances daily.  One chapter in this book details our lineage.


Earlier in 2008 the author’s mitochondrial DNA (henceforth “mtDNA”) was tested.[1] The tests involved
(i)            Hypervariable Region I,
(ii)          Hypervariable Region II, and
(iii)         an mtDNA Single Nucleotide Polymorphism (“SNP”) Marker Backbone Test.

All testing was performed by Genetrack.

This paper discusses these three tests in general terms.  The results showed the author’s mtDNA haplogroup to be “I”.  There are 8[2] major haplogroups in Europe: H (47%), J (17%), U (11%), T (9%), K (6%), X (6%), V (5%), and I (2%), all of which are descendants of haplogroup N.  Each haplogroup is associated with a different ancestral lineage.  (Because of its small size, haplogroup I is sometimes not considered a major European group.) (Fitzpatrick & Yeiser, 2005).


DISCUSSION


Haplogroup I

About 2.2% of Europeans are descendants of mtDNA haplogroup I.  Haplogroup I is one of the oldest groups to inhabit Western Europe, arriving there during the Upper Paleolithic period, about 30,000 to 40,000 years ago; its culture is today referred to as the Aurignacian culture.  Most others arrived in Europe tens of thousands of years later as the Ice Age was in retreat.[3]  The Aurignacians were the earliest known Cro-Magnon culture.  This culture was marked by certain tools and a pronounced artistic tradition.  Aurignacian culture was preceded by Mousterian culture, was contemporary with Perigordian culture, and was succeeded by the Solutrean culture.

The Aurignacian culture was marked by a great diversification and specialization of tools, including the invention of the “burin” (an engraving tool) that made much of the art possible.  The Aurignacian differs from other Upper Paleolithic cultures mainly in a preponderance of stone flake tools rather than blades.  Flakes were retouched to make nosed scrapers, ridged scrapers, and end scrapers.  Bones and antlers were made into points and awls by splitting, sawing, and smoothing.

Aurignacian art is said to represent the first complete tradition in the history of art. Cave art was produced almost exclusively in Western Europe, where, by the end of the Aurignacian Period, hundreds of paintings, engravings, and reliefs had been executed on the walls, the ceilings, and sometimes the floors of limestone caves. Probably the first paintings were stencilings outlined in color of actual hands held against the cave walls. The stencilings were followed by the development of figural painting. A characteristic feature of these early pictures, which persisted throughout the Aurignacian period, is their “twisted perspective,” which shows, for example, the head of the animal in profile and its horns twisted to a front view. Classic examples of Aurignacian art are the paintings of animals, such as horses and bulls, on the walls and ceilings of the cave at Lascaux in southwestern France. These figures were  painted in vivid polychrome red, yellow, brown, and black, with solid and closed outlines.

The earliest examples of small, portable art objects produced during this period consist of pebbles with very simple engravings of animal forms. Subsequently, animal figures were carved in pieces of bone and ivory. In the later part of the Aurignacian Period the carvings show increased naturalism with foreshortening and shading with cross-hatched lines. (Aurignacian culture. (2008). In Encyclopædia Britannica. Retrieved July 23, 2008, from Encyclopædia Britannica Online: http://www.britannica.com/EBchecked/topic/43368/Aurignacian-culture; and
Time and Space The Archaeological Context.  Retrieved July 23, 2008 from:


Mitochondrial DNA

Organizational Display of Regions of MtDNA.

It is speculated that mitochondria were originally bacteria that were “engulfed” by other life forms and developed a symbiotic relationship with them, living in their cellular cytoplasm.  Mitochondria provide their hosts with ATP in exchange for, among other things, protection.  Significantly, mitochondria have their own DNA, i.e., DNA distinct from the nuclear DNA (henceforth “nDNA”) of their hosts.  Mutations in mtDNA are distinct from changes to the nDNA.[4] (Hart, 2002).

MtDNA mutates at a very slow rate -- over tens of generations  --  and thus can be used to glimpse the past.

MtDNA is understood to be inherited only from the mother.  At conception, essentially only the nucleus of the sperm enters the egg.  Even if paternal mtDNA does enter the egg, it is extremely diluted, since human eggs are estimated to have as many as 100,000 mtDNA molecules.  Moreover, the embryo’s cellular machinery is able to identify any paternal mtDNA that has entered the egg and destroy it.  Thus, mothers pass along their mtDNA to their children, who share similar mtDNA with their siblings and maternal relatives; as such, barring a contemporaneous mutation, an individual’s mtDNA is not unique to him or her.[5] (Gibson & Muse, 2002).

When remains are old or badly degraded it is often difficult or impossible to extract nDNA.  However, mtDNA is present in much higher numbers than nDNA, so some of it is more likely to remain testable.  In short, nDNA contains much more information but has only two copies (one paternal, one maternal) in each cell, while mtDNA carries less useful information but typically has hundreds of copies in each cell; the use of mtDNA in this context simply reflects its better odds of not being destroyed.[6],[7] (Goodwin, Linacre & Hadi, 2007).

MtDNA was first sequenced in 1981 from a European placenta; the resulting sequence is referred to as either the “Cambridge Reference Sequence” or the “Anderson Sequence.” [8]

On average, there are 4-5 copies of mtDNA per mitochondrion, with a range of about 1 to 15 copies.  Each cell typically contains hundreds of mitochondria, with an estimated average of about 500 per cell. (Gibson & Muse, 2002).
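
The averages quoted above imply a rough per-cell copy count.  The following Python lines are a back-of-the-envelope sketch only: the variable names are illustrative, and the value of 4.5 copies per mitochondrion is an assumed midpoint of the quoted 4-5 range rather than a figure from the cited source.

```python
# Rough estimate of mtDNA copies per cell from the averages quoted in the text.
copies_per_mitochondrion = 4.5   # ASSUMPTION: midpoint of the quoted 4-5 copies
mitochondria_per_cell = 500      # quoted estimated average per cell
ndna_copies_per_cell = 2         # one paternal and one maternal copy

mtdna_copies_per_cell = copies_per_mitochondrion * mitochondria_per_cell
ratio = mtdna_copies_per_cell / ndna_copies_per_cell

print(f"mtDNA copies per cell: about {mtdna_copies_per_cell:.0f}")
print(f"mtDNA copies for each nDNA copy: about {ratio:.0f}")
```

Under these assumed averages a cell carries on the order of a couple of thousand mtDNA copies against two nDNA copies, which is the imbalance that makes mtDNA more likely to survive in degraded remains.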

MtDNA exists as a circular structure.  Typically, mtDNA has 16,569 [9]  base pairs (some people have mutations).


                          
MtDNA base pairs are counted clockwise.

Of these, 1,122 base pairs contain the origin of replication but do not code for any other gene products.  The remaining 15,447 of the 16,569 base pairs code for proteins; there are no introns and only a few noncoding base pairs. (Gibson & Muse, 2002).

Relative size of the D-Loop (HV1 & HV2) vis-à-vis the Coding Region.

MtDNA has 37 genes that code for the products used in the oxidative phosphorylation process, a.k.a. cellular energy production; 13 of these code for proteins, 22 for transfer RNAs (tRNA) and 2 for ribosomal RNAs (rRNA). (Gibson & Muse, 2002).

The two mtDNA strands are designated as the “H” (heavy) and “L” (light) strands.  The H strand has a greater number of guanine nucleotides (the heaviest of the four different nucleotides) than does the L strand.  Replication begins in a non-coding region of the H strand (the displacement loop/D-loop).  [10]


Close up of the Hypervariable Regions.

The H strand encodes 28 gene products and the L strand encodes 8 tRNAs and an enzyme known as “ND6.” (Gibson & Muse, 2002).

Because the D-loop region does not code for any functional cell processes, there are significantly more polymorphisms between individuals in this region.  Due to the larger number of polymorphisms, lineage tracing typically focuses on two regions in the D-loop known as HV1 [11] and HV2 (HV is for “hypervariable”). (Gibson & Muse, 2002).

MtDNA has fewer repair mechanisms than nDNA and, not surprisingly, higher rates of mutation.  Moreover, mtDNA lacks proof-reading mechanisms, which further increases its mutation rate during replication relative to nDNA.  Hence, it is estimated that mtDNA’s mutation rate is 10 (some sources say 6 to 17) times higher than nDNA’s, which makes the tracing of descent via mtDNA that much easier (of course, mtDNA and nDNA each have many advantages and disadvantages for these purposes). (Gibson & Muse, 2002).

At first glance, complicating matters is the fact that because of the relatively high rate of mtDNA mutation an individual might – probably does – have more than one mtDNA type, i.e., with different mtDNA sequences; these different “types” can be located in different cells, the same cell, and even in the same mitochondrion!  However, often these mutations are found in very low numbers; it is rare to find more than one position so mutated out of the nucleotides sequenced at HV1 and HV2 and, even more, “hotspots” for such mutations at HV1 and HV2 have been identified. (Goodwin, et al., 2007).

A standard mtDNA analysis involves polymerase chain reaction (PCR) amplification of nine overlapping fragments followed by sequence analysis/digestion with 12-14 restriction enzymes in the HV1 and HV2 regions.  The results are then compared to the Cambridge Reference Sequence.  The estimate is that the sequences in these two areas vary by 1 to 2 percent between unrelated individuals; typically, these mutations are found in two particular “hotspots” in the two HV regions. (Goodwin, et al., 2007).

A weakness with the use of mtDNA is that some haplogroups [12] are common within some populations.  An example is an FBI study of 1655 Caucasians of whom 168 (a little over 10%) would not be “excluded” as possibly related!  Hence, increasingly, SNP Marker Haplogroup Backbone Tests are being used. It should be noted that to sequence all 16,569 base pairs provides a 12-fold increase in the ability to differentiate such otherwise common haplogroups.  (Goodwin, et al., 2007).

Even given all of the above, it must be remembered that the further one goes back in any such tracings, ever more extrapolations (educated guesses) of expected rates of mutations are used to “connect the dots.”
























HV1 Test


The HV1 Test is the most informative mtDNA test [13]  as:

   1. The HV1 region contains an abundance of markers.
   2. The HV1 region is easy to test.  All 520 nucleotides in the entire HV1 region can be read from a single test.
   3. The HV1 region is well studied.  Thus, there is more scientific data available for markers in the HV1 region than any other region of the mtDNA.

All of the other test types serve to supplement the results of the HV1 test.  (Smolenyak Smolenyak & Turner, 2004).




*****     *****     *****     *****     *****     *****     *****     *****     *****     ****


This mutation is documented as follows:

* Location:  40
* Nucleotide Change:  T>G.



*****     *****     *****     *****     *****     *****     *****     *****     *****     ****


HV1 Sequence of Louis Sheehan (Location 16001 to 16520) compared to the CRS:

16001
ATTCTAATTT AAACTATTCT CTGTTCTTTC ATGGGGAAGC AGATTTGGGT
16051
ACCACCCAAG TATTGACTCA CCCgTCAACA ACCGCTATGT ATyTCGTACA
16101
TTACTGCCAG CCACCATGAA TATTGTACaG TACCATAAAT ACTTaACCAC
16151
CTGTAGTACA TAAAAACCCA ATCCACATCA AAACCCCCTC CCCATGCTTA
16201
CAAGCAAGTA CAGCAATCAA CCtTCAACTA TCACACATCA ACTGCAACTC
16251
CAAAGCCACC CCTCACCCAC TAGGATACCA ACAAACCTAC CCACCCTTAA
16301
CAGTACATAG TACATAAAGC CATTTACCGT ACATAGCACA TTACAGTCAA
16351
ATCCCTTCTt GTCCCCATGG ATGACCCCCC TCAGATAGGG aTCCCTTGAC
16401
CACCATCCTC CGTGAAATCA ATATCCCGCA CAAGAGTGCT ACTCTCCTCG
16451
CTCCGGGCCC ATAACACTTG GGGGTAGCTA AAGTGAACTG TATCCGACAT
16501
CTGGTTCCTA CTTCAGGGcC



Location    Region    Mutation Type [14]    Nucleotide
16074       HV1       Substitution          A > g
16093       HV1       Substitution          T > y
16129       HV1       Substitution          G > a
16145       HV1       Substitution          G > a
16223       HV1       Substitution          C > t
16360       HV1       Substitution          C > t
16391       HV1       Substitution          G > a
16519       HV1       Substitution          T > c

A total of 8 mutations.
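
In the HV1 listing above, the tabulated mutations coincide with the lowercase letters.  The Python sketch below assumes that lowercase marks a difference from the CRS, which is an inference from the listing and not a documented Genetrack convention; the function name `lowercase_positions` is purely illustrative.  It walks the listing as printed and reports each lowercase base with its position.

```python
# HV1 listing as printed above: a start position, then blocks of ten bases.
HV1_LISTING = """
16001
ATTCTAATTT AAACTATTCT CTGTTCTTTC ATGGGGAAGC AGATTTGGGT
16051
ACCACCCAAG TATTGACTCA CCCgTCAACA ACCGCTATGT ATyTCGTACA
16101
TTACTGCCAG CCACCATGAA TATTGTACaG TACCATAAAT ACTTaACCAC
16151
CTGTAGTACA TAAAAACCCA ATCCACATCA AAACCCCCTC CCCATGCTTA
16201
CAAGCAAGTA CAGCAATCAA CCtTCAACTA TCACACATCA ACTGCAACTC
16251
CAAAGCCACC CCTCACCCAC TAGGATACCA ACAAACCTAC CCACCCTTAA
16301
CAGTACATAG TACATAAAGC CATTTACCGT ACATAGCACA TTACAGTCAA
16351
ATCCCTTCTt GTCCCCATGG ATGACCCCCC TCAGATAGGG aTCCCTTGAC
16401
CACCATCCTC CGTGAAATCA ATATCCCGCA CAAGAGTGCT ACTCTCCTCG
16451
CTCCGGGCCC ATAACACTTG GGGGTAGCTA AAGTGAACTG TATCCGACAT
16501
CTGGTTCCTA CTTCAGGGcC
"""


def lowercase_positions(listing):
    """Return (position, base) for every lowercase base in a numbered listing."""
    position = None
    found = []
    for token in listing.split():
        if token.isdigit():
            position = int(token)   # a numbering line resets the counter
            continue
        for base in token:
            if base.islower():      # ASSUMPTION: lowercase = differs from the CRS
                found.append((position, base))
            position += 1
    return found


for position, base in lowercase_positions(HV1_LISTING):
    print(position, base)
```

The positions printed are 16074, 16093, 16129, 16145, 16223, 16360, 16391 and 16519, matching the eight rows of the table.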

HV2 Test

The HV2 region spans positions 1 to 400. Like HV1, HV2 contains an abundance of markers that are useful for tracing maternal ancestry.

The HV2 test [15] supplements the HV1 results as it allows more stringent and precise results for any comparison.  For example, in cases where people are matching exactly at the HV1 region, further comparison of the HV2 region provides greater resolution. (Rodden Robinson, 2005).

HV2 Sequence of Louis Sheehan (Location 1 to 400) compared to the CRS:

1
GATCACAGGT CTATCACCCT ATTAACCACT CACGGGAGCT CTCCATGCAT
51
TTGGTATTTT CGTCTGGGGG GTgTGCACGC GATAGCATTG CGAGACGCTG
101
GAGCCGGAGC ACCCTATGTC GCAGTATCTG TCTTTGATTC CTGCCTCATC
151
CcATTATTTA TCGCACCTAC GTTCAATATT ACAGGCGAAC ATACTTACcA
201
AAGcGTaTTA ATTAATTAAT GCTTGTAGGA CATAATAATA ACAATTGAAc
251
GTCTGCACAG CCgCTTTCCA CACAGACATC ATAACAAAAA ATTTCCACCA
301
AACCCCCCNc CCCCCNCTTN TGGNCNCANC ACTTAAANNC ATNTNTGCCA
351
AACCCCAAAA ANAAANAANC CTAANACCNG CCTAACCANA TTTNAAATTT

Location    Region    Mutation Type    Nucleotide
73          HV2       Substitution     A > g
152         HV2       Substitution     T > c
199         HV2       Substitution     T > c
204         HV2       Substitution     T > c
207         HV2       Substitution     G > a
250         HV2       Substitution     T > c
263         HV2       Substitution     A > g
310         HV2       Substitution     T > c

A total of 8 mutations.


Single Nucleotide Polymorphism Marker (“SNP”) Haplogroup Backbone Test [16], [17] , [18], [19]

While the markers in the HV1 and HV2 Regions can be easily detected by sequencing, the Coding Region is large, making sequencing (currently) impractical.  The SNP Haplogroup Backbone Test is a panel of  20 markers in the Coding Region which are specific for haplogroup determination. (Fitzpatrick, 2005).






SNP Backbone Test of Louis Sheehan:

SNP Location    Mutations [20]    SNP Identity    Mutation
2352            T > C             T               Negative
3594            C > T             C               Negative
3693            G > A             G               Negative
4312            C > T             C               Negative
4580            G > A             G               Negative
4833            A > G             A               Negative
5178            C > A, C > T      C               Negative
7028            C > T             T               Positive
7055            A > C, A > G      A               Negative
7598            G > A             G               Negative
8618            T > C             T               Negative
10086           A > G             A               Negative
10310           G > A             G               Negative
10400           C > T             C               Negative
10873           T > C             T               Negative
11251           A > G             A               Negative
11719           G > A             A               Positive
12308           A > G             A               Negative
12705           C > T             T               Positive
14766           C > T             T               Positive
A total of 4 mutations.
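
The Positive/Negative column above can be reproduced by comparing each observed base (“SNP Identity”) with the base to the left of the “>” sign.  The Python sketch below assumes that the left-hand base is the reference state; the data are transcribed from the table, and the names `SNP_RESULTS` and `call_snp` are illustrative only.

```python
# (location, reference base, observed base) transcribed from the table above.
# ASSUMPTION: the base to the left of ">" in the table is the reference state.
SNP_RESULTS = [
    (2352, "T", "T"), (3594, "C", "C"), (3693, "G", "G"), (4312, "C", "C"),
    (4580, "G", "G"), (4833, "A", "A"), (5178, "C", "C"), (7028, "C", "T"),
    (7055, "A", "A"), (7598, "G", "G"), (8618, "T", "T"), (10086, "A", "A"),
    (10310, "G", "G"), (10400, "C", "C"), (10873, "T", "T"), (11251, "A", "A"),
    (11719, "G", "A"), (12308, "A", "A"), (12705, "C", "T"), (14766, "C", "T"),
]


def call_snp(reference, observed):
    """Label a marker Positive when the observed base differs from the reference."""
    return "Positive" if observed != reference else "Negative"


positives = [
    location
    for location, reference, observed in SNP_RESULTS
    if call_snp(reference, observed) == "Positive"
]

print(f"{len(SNP_RESULTS)} markers tested, {len(positives)} positive: {positives}")
```

This yields four positive markers (7028, 11719, 12705 and 14766), in agreement with the total stated above.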







CONCLUSION

     Maternal Markers

mtDNA HV1: 8 Mutations
mtDNA HV2: 8 Mutations
mtDNA SNP:   4 Mutations

This specific pattern of mutations confirms Lou Sheehan’s mtDNA Haplogroup to be: I.












































Geographic migration of Haplogroup I over time.












Works Cited
Aurignacian culture. (2008). In Encyclopædia Britannica. Retrieved July 23, 2008, from Encyclopædia Britannica Online: http://www.britannica.com/EBchecked/topic/43368/Aurignacian-culture
Fitzpatrick, C. & Yeiser, A. (2005). DNA & Genealogy. Fountain Valley, California: Rice Book Press.
Fitzpatrick, C. (2005). Forensic Genealogy. Fountain Valley, California: Rice Book Press.
Genetic Genealogy (2008). http://www.genebase.com/index.php (maintained by Genetrack). Retrieved July 15, 2008 from: http://www.dnaancestryproject.com/
Gibson, G. & Muse, S.V. (2002). Primer of Genome Science. (1st ed.). Sunderland, Massachusetts: Sinauer Associates, Inc.
Goodwin, W., Linacre, A. & Hadi, S. (2007). An Introduction to Forensic Genetics. West Sussex, England: John Wiley & Sons, Ltd.
Hart, A. (2002). How to Interpret Your DNA Test Results for Family History and Ancestry: Scientists Speak Out on Genealogy Joining Genetics. Lincoln, Nebraska: iUniverse.
Nicholl, D. (2002).  An Introduction to Genetic Engineering. (2nd ed.). New York: Cambridge University Press.
Olson, S. (2002). Mapping Human History. New York: Mariner Books.
Ridley, M. (2000). Genome: The Autobiography of a Species in 23 Chapters. New York: HarperCollins.
Rodden Robinson, T. (2005). Genetics for Dummies. Hoboken, New Jersey: Wiley, Inc. 
Smolenyak Smolenyak, M. & Turner, A. (2004).  Trace Your Roots with DNA. Rodale.
Time and Space The Archaeological Context.  Retrieved July 23, 2008 from:

Wells, S. (2007). Deep Ancestry: Inside the Genographic Project. Washington, D.C.: National Geographic Society.
Wells, S. (2003). The Journey of Man: A Genetic Odyssey. New York: Random House Trade Paperbacks.
















[1] Extensive testing was also done relating to the author’s Y chromosome – referred to herein as nDNA – but the length of a paper discussing both the mtDNA and the nDNA would be far in excess of what was assigned.
[2] These numbers vary by source.
[3] During this time, approximately 70,000 BCE to 10,000 BCE, a mile-deep ice sheet covered the northern part of the continent from just north of London in England and west to the mainland covering all of Scandinavia and Poland. The glaciers in the Alps would have grown tremendously as well, fanning out in all directions to provide an ice cap covering the top of the entire boot of Italy. Sea levels were approximately 130 yards lower than today, meaning some Mediterranean islands (as well as some Atlantic islands such as the Azores and the Canaries) would have had much greater landmass than today, and even islands such as Malta were connected to the mainland, just as Britain and Ireland would have been.

Ice Age Eurasia was cold, with temperatures on average 8 to 12 degrees Centigrade cooler than today, and much drier. The European treeline ran from the Bay of Biscay in Spain, across the very north of Italy and over to the Sea of Azov, where it dipped southeast toward the border between Iran and Turkmenistan. Between the tree line and the glacial line, the landscape would have been dry, covered with sand dunes and tough, wiry grass, perhaps similar to some extent to terrain associated with Iceland and Northern Canada today.

This grassy intermediate zone, known as the Great Eurasian Plain, would have served as a natural game corridor, running all the way from eastern France to Korea. Great herds of game roamed this grassy thoroughfare: woolly mammoths, antelopes, and, in particular, a number of species of bison.
[4] MtDNA uses a slightly different code than nDNA.  As examples:

(i)             UGA in mtDNA codes for tryptophan but is a stop codon in nDNA;
(ii)           AUA in mtDNA codes for methionine whereas in nDNA it codes for isoleucine;
(iii)          AGA and AGG both code for stops in mtDNA but code for arginine in nDNA.
(Goodwin, et al., 2007).
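
The codon differences listed in this note can be written as two small lookup tables.  The Python sketch below covers only the four codons named here and is not a complete codon table; the names `NUCLEAR`, `MITOCHONDRIAL` and `translate_codon` are illustrative.

```python
# Only the four codons named in this note; NOT a full genetic code table.
NUCLEAR = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}
MITOCHONDRIAL = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def translate_codon(codon, mitochondrial=False):
    """Look up one RNA codon in the nuclear or the mitochondrial table."""
    table = MITOCHONDRIAL if mitochondrial else NUCLEAR
    return table[codon.upper()]


for codon in NUCLEAR:
    nuclear = translate_codon(codon)
    mito = translate_codon(codon, mitochondrial=True)
    print(f"{codon}: nDNA = {nuclear}, mtDNA = {mito}")
```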


[5] MtDNA has been one of the tools used (i) to identify the Vietnam Unknown Soldier, (ii)  as part of the process of identifying the Romanov  family, (iii) to establish that Neanderthals were not direct ancestors of  modern humans, (iv) to discredit the claim of Anna Anderson Manahan to be the Russian Princess Anastasia, and (v) to establish that Jesse James was indeed the person buried in his tomb. (Wells, 2003).
[6] Both mtDNA’s circular structure and its encapsulation behind two walls further enhance the survivability of mtDNA. (Gibson & Muse, 2002).


[7] Interestingly, a portion of at least some humans’ chromosome 11 carries a portion of an mtDNA “control region” reflecting an ancient transposition between mtDNA and nDNA. (Goodwin, et al., 2007).

[8] Today the processes used in 1981, including the use of bovine sequences to fill in gaps, are referred to as “rudimentary.”  In 1999 the same placenta was resequenced with only 11 changes made to the 1981 standard (plus an additional 7 nucleotide positions actually related to rare polymorphisms), but most importantly the two areas of primary focus, the HV1 and HV2 regions, were not adjusted.  Actually, the resequencing established there were only 16,568 base pairs but the numbering system keyed to 16,569 base pairs was retained for the sake of reference consistency.  However, it must be noted that some institutions use different reference systems (one example is based on an African who had 16,571 base pairs), so it is necessary to be aware of the reference system being used. (Goodwin, et al., 2007).
[9] By way of contrast, a single human nuclear chromosome consists of somewhere between 49,530,000 and 247,200,000 bases. (Wells, 2007).


[10] The D-Loop (HV1 & HV2) is considered a non-vital part of the mtDNA because it does not have an immediately useful biological function.  Thus, whenever a mutation occurs in this region the individual does not die from the mutation and might survive and pass the mutation along to the next generation. (Ridley, 2000).
[11] Sometimes as HVR1 & HVR2; HVS1  & HVS2; or HVI & HVII.


[13] The HV1 test uses “DNA sequencing” technology to read all of the nucleotides from locations 16,001 to 16,520 (the entire HV1 region).
[14] Substitution - occurs when a nucleotide changes.
     Deletion - occurs when a nucleotide is removed from the sequence, therefore changing the overall sequence.
     Insertion - occurs when a nucleotide is added to the sequence, therefore changing the overall sequence.


[15] The HV2 test also uses “DNA sequencing” technology.  This test reads all of the nucleotides from locations 1 to 400  (the entire region) of the mtDNA.  

[16] There is also a SNP Subclade Test which examines a special panel of markers in the Coding Region of the mtDNA allowing determination of one’s “sub-clade.” However, at the moment, SNP Subclade tests are available only for the following mtDNA Haplogroups: R, M, H.



[17] The 20 markers that are included in this test and the haplogroups they define are:

SNP Location    Mutations       Haplogroups
2352            T > C           L1b, L3e, U6b1
3594            C > T           L0, L1, L2, L5
3693            G > A           L1b, L2d
4312            C > T           L0
4580            G > A           V
4833            A > G           G
5178            C > A, C > T    D
7028            C > T           H
7055            A > C, A > G    L1
7598            G > A           E
8618            T > C           L3d
10086           A > G           L3b
10310           G > A           F
10400           C > T           C, D, E, G, M, Q, Z
10873           T > C           C, D, E, G, L, M, Q, Z
11251           A > G           JT, J, T
11719           G > A           Pre-HV, HV
12308           A > G           K, U
12705           C > T           B, F, H, J, K, P, T, R, U, V
14766           C > T           HV





[18] As each sequencing reaction can only test approximately 500 nucleotides at a time, many reactions would be required in order to sequence (“DNA Sequencing”) the entire Coding Region, thus making it costly.  Also, there are very few mutations in the Coding Region, making it unnecessary to sequence every single nucleotide in the Coding Region.   As such, a process other than “DNA Sequencing” is used wherein 20 specific Coding Region markers are tested.



[19] The Coding Region of the mtDNA is considered essential for the survival of the individual, so a mutation in this region is often lethal.  Thus, mutations which occur in the Coding Region are usually not passed down to future generations.  For this reason, over a period of thousands of years, many mutations accumulate in the D-Loop (HV1 & HV2), but few are found in the Coding Region. (Olson, 2002).


[20] The presence of a mutation is as informative as is the absence of a mutation. For example, mtDNA haplogroup H is the most common haplogroup among Europeans, and is confirmed in part by the absence of all SNP markers tested in the mtDNA SNP Backbone test. (Nicholl, 2002).





