Addendum — African Genetic Diversity and the Science Behind the Results
Addendum — The Return Path


Cross-referenced with Sibomana (2024) · Pharmgenomics Pers Med

Language note: Percentages throughout this series describe DNA sequence similarity to reference populations, not fixed ancestry fractions. Following Kampourakis & Peterson (Genetics, 2023), the term “admixture” has been avoided. All calculator outputs are statistical similarity scores relative to defined reference groups.

Sources: Kampourakis K & Peterson EL. Genetics 223(3), iyad002 (2023) · Sibomana O. Pharmgenomics Pers Med. 2024;17:487–496. doi:10.2147/PGPM.S485452 · Fortes-Lima CA et al. Am J Hum Genet. 2025;112(2):261–275

This addendum cross-references the DNA similarity analysis documented in this series with a 2024 systematic review of African genetic diversity published in Pharmacogenomics and Personalized Medicine. It does not revise the original findings but places them in a broader scientific context — explaining why the results are as rich and complex as they are, why four calculators were necessary, and what the improving landscape of African genomics may reveal in the future.

Sibomana O. “Genetic Diversity Landscape in African Population: A Review of Implications for Personalized and Precision Medicine.” Pharmacogenomics and Personalized Medicine. 2024;17:487–496. doi:10.2147/PGPM.S485452. PMC11566596.

Section 1 · African Genetic Diversity: The Scientific Foundation

The DNA similarity analysis documented in this series produced unusually rich results: West African, North African, East African, Sahelian, Sephardic, and ancient hunter-gatherer signals all appear across four independent calculators. To a reader unfamiliar with population genetics, this complexity might seem surprising. The Sibomana (2024) review provides the scientific explanation: it is what the known depth of African genetic diversity would lead one to expect.

Africa as the origin of human genetic diversity

Africa is the cradle of modern human evolution and the origin of our species’ global spread. Paleontological and genetic evidence indicates that modern humans originated in Africa within the past 300 thousand years and spread across the globe within the last 100 thousand years. The consequence of this deep history is profound: the average African genome carries nearly a million more genetic variants than the average non-African genome.

Cross-reference — Your Results

The ancient lineage signals in the researcher's DNA similarity profile (Biaka-Pygmy 2.67%, Hadza 1.59%, Khoi-San 0.58%, Mbuti-Pygmy 0.79%, Nilo-Saharan 1.30%) are expressions of this extraordinary depth. Values this small are imprecise, and any single figure should be read cautiously; taken together, however, they are echoes of lineages that predate the peopling of every other continent. The researcher's genome carries traces of some of the oldest surviving human genetic lineages on Earth.

Diversity within Africa exceeds diversity between Africans and Eurasians

One of the most counterintuitive findings in population genetics is that genetic differences among African populations are larger than those between Africans and Eurasians. This supports the Out of Africa model of human evolution: the genetic diversity found in Eurasians is largely a subset of that already present in Africa.

Cross-reference — Your Results

This finding reframes the North African / Western Semitic signal (~24–33% across all four calculators). The Eurasian component in the researcher's DNA similarity profile is not a foreign intrusion into an otherwise African genome. Eurasian genetic diversity is itself a subset of African diversity — a branch that left the continent and diversified in isolation. When the researcher's DNA shows similarity to North African Berber and ancient Eurasian reference populations, it is expressing deep African genetic history that predates the separation of those populations from their African origins. The Fortes-Lima et al. (2025) confirmation that the Fulani carry ancient Iberomaurusian and Green Sahara ancestry makes this explicit: these signals trace back to African populations of the Saharan period, not to any recent non-African source.

All non-African lineages derive from African ancestors

Phylogenetic analyses confirm that most ancestral lineages are African-specific and that all non-African lineages can be derived from a single ancestral African haplogroup, consistent with the Out of Africa model.

In the context of this research, this means that the totality of the researcher's DNA similarity profile — including the North African, Eurasian, and Sephardic signals — ultimately traces to African origins. The complexity of the researcher's results is not a sign that the ancestry is scattered across multiple continents. It is a sign that the researcher's genome carries the deep, branching structure of African genetic history itself.

Section 2 · Why Four Calculators Were Necessary: The Reference Panel Problem

The most practically significant intersection between Sibomana (2024) and this research concerns reference panels — the population databases against which the researcher's DNA is compared. The paper identifies a specific and well-documented problem: African populations are severely underrepresented in the genomic reference panels used by most testing platforms.

The underrepresentation problem

Global genomic research is hindered by the underrepresentation of non-European populations in reference panels. The lack of diversity in current genomic reference data may cause bias in subsequent analyses. Sibomana explicitly identifies underrepresentation in studies, scarcity of African reference genomes, inaccuracy of genetic testing and interpretation, and ancestry misclassification as direct consequences of this gap.
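To make the misclassification mechanism concrete, here is a minimal toy sketch. It assumes a simplified model in which a sample's allele frequencies are approximated as a weighted mix of reference-population frequencies, fitted by least squares over a coarse grid of weights that sum to 1. The GEDmatch calculators themselves use ADMIXTURE-style likelihood models, not this code. The function names, population labels, and allele frequencies are hypothetical and invented for illustration; they are not real reference data.

```python
def simplex_grid(k: int, steps: int):
    """Yield every way to split `steps` units among k references (weights sum to 1)."""
    if k == 1:
        yield (steps,)
        return
    for i in range(steps + 1):
        for rest in simplex_grid(k - 1, steps - i):
            yield (i,) + rest


def fit_similarity(sample: list[float], panel: dict[str, list[float]], steps: int = 100) -> dict[str, float]:
    """Toy grid-search fit: choose reference weights (summing to 1) whose mixed
    allele frequencies are closest, in squared error, to the sample's."""
    names = list(panel)
    best_weights, best_err = None, float("inf")
    for combo in simplex_grid(len(names), steps):
        weights = [c / steps for c in combo]
        err = 0.0
        for locus, observed in enumerate(sample):
            predicted = sum(w * panel[name][locus] for w, name in zip(weights, names))
            err += (observed - predicted) ** 2
        if err < best_err:
            best_weights, best_err = weights, err
    return {name: round(w, 2) for name, w in zip(names, best_weights)}


# Invented allele frequencies at four toy loci (NOT real reference data).
WEST = [0.9, 0.1, 0.8, 0.2]
NORTH = [0.3, 0.6, 0.4, 0.7]
EUROPE = [0.2, 0.7, 0.3, 0.8]

# A toy sample that is exactly half WEST and half NORTH.
sample = [0.5 * w + 0.5 * n for w, n in zip(WEST, NORTH)]

full = fit_similarity(sample, {"West_African": WEST, "North_African": NORTH, "Europe": EUROPE})
thin = fit_similarity(sample, {"West_African": WEST, "Europe": EUROPE})
print(full)  # {'West_African': 0.5, 'North_African': 0.5, 'Europe': 0.0}
print(thin)  # {'West_African': 0.58, 'Europe': 0.42}
```

With the North African reference present, the toy fit recovers the 50/50 mix. Remove that reference and its share does not disappear: the fit reassigns it to the nearest available population, here labelled "Europe", and slightly inflates the West African weight. That is the general shape of the misclassification Sibomana describes, in miniature.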

Cross-reference — Your Results

This is the scientific explanation for why the four calculators used in this research returned different labels and different percentages for the same underlying signals. Each calculator’s reference panel has different coverage of African populations. EthioHelix K10, with its Africa-only panel, has better African resolution and correctly identifies the North African signal as “North Africa.” Africa9 and MDLP K23b, with global panels built primarily on European reference populations, misclassify the same signal as “Europe,” “European_Hunters_Gatherers,” or “Western_Semitic” because their panels lack the African reference diversity to distinguish these signals accurately. This is not a user error. It is a documented limitation of the tools.
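One way to compare outputs across calculators is to map each calculator's labels onto shared categories before comparing them. The sketch below assumes a hypothetical mapping that follows this series' reading of the labels, treating "North Africa", "Western_Semitic", "Europe", and "European_Hunters_Gatherers" as one signal. The mapping, the function name harmonize, and the example percentages are illustrative assumptions, not outputs of the calculators.

```python
# Hypothetical mapping reflecting this series' interpretation that these
# calculator-specific labels describe the same underlying signal.
LABEL_MAP = {
    "North Africa": "North_African_Western_Semitic",
    "Western_Semitic": "North_African_Western_Semitic",
    "Europe": "North_African_Western_Semitic",
    "European_Hunters_Gatherers": "North_African_Western_Semitic",
}


def harmonize(scores: dict[str, float], label_map: dict[str, str] = LABEL_MAP) -> dict[str, float]:
    """Sum one calculator's percentages under shared category names.

    Labels missing from `label_map` pass through unchanged.
    """
    merged: dict[str, float] = {}
    for label, pct in scores.items():
        key = label_map.get(label, label)
        merged[key] = round(merged.get(key, 0.0) + pct, 2)
    return merged


# Invented example values, not the researcher's actual results.
print(harmonize({"West_African": 60.0, "Western_Semitic": 20.0, "European_Hunters_Gatherers": 8.0}))
# {'West_African': 60.0, 'North_African_Western_Semitic': 28.0}
```

The mapping is where the interpretive judgment lives: whether a "Europe" component reflects a misclassified North African signal is a claim about the reference panel, so it belongs in an explicit, reviewable table rather than buried in the comparison step.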

Running multiple calculators was the correct methodological response

Because no single reference panel adequately represents African genetic diversity, relying on any one calculator would have produced a partial and potentially misleading picture. The decision to run four independent calculators — EthioHelix K10 Africa Only, Dodecad Africa9, puntDNAL K8, and MDLP K23b — and to treat consistent signals across all four as more reliable than signals appearing in only one, is precisely the methodological response the reference panel problem demands.

The convergence test documented in the methodology guide — treating a signal confirmed by 4 of 4 calculators as definitive, and a signal appearing in 1 of 4 as weak — is a practical workaround for the reference panel limitations Sibomana identifies. By triangulating across multiple imperfect instruments, the analysis extracts more reliable signals than any single instrument could provide.
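The convergence test can be expressed as a small counting routine. This is a sketch under stated assumptions: the 0.5% reporting threshold, the middle tier name "moderate", the function name convergence, and the calculator names and percentages in the example are hypothetical, and the input is assumed to be already mapped to shared signal names.

```python
from collections import Counter


def convergence(results: dict[str, dict[str, float]], threshold: float = 0.5) -> dict[str, str]:
    """Count how many calculators report each signal at or above `threshold` percent.

    `results` maps a calculator name to {signal name: percentage}, with signal
    names already harmonized across calculators.
    """
    n = len(results)
    hits = Counter(
        signal
        for scores in results.values()
        for signal, pct in scores.items()
        if pct >= threshold
    )
    tiers = {}
    for signal, k in sorted(hits.items()):
        if k == n:
            tier = "definitive"  # confirmed by every calculator
        elif k == 1:
            tier = "weak"        # appears in only one calculator
        else:
            tier = "moderate"
        tiers[signal] = f"{k} of {n}: {tier}"
    return tiers


# Hypothetical calculators and invented percentages, for illustration only.
example = {
    "calc_A": {"West_African": 60.0, "North_African": 30.0, "Hadza": 1.2},
    "calc_B": {"West_African": 55.0, "North_African": 25.0},
    "calc_C": {"West_African": 58.0, "North_African": 28.0, "Hadza": 0.3},
    "calc_D": {"West_African": 62.0, "North_African": 24.0},
}
print(convergence(example))
# {'Hadza': '1 of 4: weak', 'North_African': '4 of 4: definitive', 'West_African': '4 of 4: definitive'}
```

Signals that fall below the threshold in every calculator do not appear at all, so the threshold choice directly controls which sub-1% traces are counted.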

The Kampourakis and Peterson framework is validated by the same problem

The reference panel underrepresentation problem also reinforces the language framework adopted throughout this series. If reference panels are built on incomplete and biased data — skewed toward European populations with African diversity underrepresented — then describing results as fixed ancestry fractions overstates the precision of the measurement and obscures the limitations of the reference data. Describing results as DNA sequence similarity to defined reference populations is not only more epistemically honest — it is more accurate given what the science tells us about how those reference populations were constructed.

Sibomana (2024) and Kampourakis & Peterson (2023) arrive at the same practical conclusion from different directions. Sibomana shows that African genetic diversity is underrepresented in the tools. Kampourakis and Peterson show that the conceptual framework of the tools carries problematic assumptions. Together they make the case that results should be held lightly, compared across multiple instruments, and described in similarity terms rather than ancestry fractions — which is exactly the approach taken in this research.

Section 3 · What This Means Going Forward

The reference panels are improving

Sibomana (2024) calls for the establishment of genomic research centres, increased funding, biobanks and repositories, and international cooperation to address the African representation gap. This is an active area of work. Initiatives such as the H3Africa Consortium, the African Genome Variation Project, and the 54gene biobank were established specifically to close the gap in African genomic reference data.

Implication for This Research

The DNA similarity analysis in this series was conducted on reference panels available in 2025. As African genomic representation improves, with more Fulani, Yoruba, Igbo, Hausa, Bambara, and Mandinka reference genomes entering the databases, the resolution of calculators built on those panels should increase. The broad signals identified here (West African, Sahelian, North African) are likely to sharpen into more specific geographic and ethnic resolution. Running updated versions of these calculators in three to five years on improved reference panels may reveal finer detail than is currently visible.

A note on the Cherokee signal

The absence of a detectable Native American signal in the initial Ancestry.com test, despite a 3x great-grandmother listed in the US Indian Census as Cherokee, is consistent with the reference panel problem. Native American populations are among the most underrepresented in consumer DNA testing reference panels. A 3x great-grandparent sits five generations back, so the expected genetic contribution, assuming that ancestor was of fully Native American descent, is (1/2)^5, or approximately 3%; because recombination makes inheritance uneven, the actual share can be higher or lower. A signal of that size is difficult to detect even with adequate reference data, and essentially undetectable when the reference panel for that population is thin. The trace signals detected across multiple GEDmatch calculators when explicitly searching for Native American components are consistent with the genealogical record, even if they fall below the threshold of confident detection on standard ethnicity estimates.
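The approximately 3% figure follows from halving per generation. A minimal sketch of the arithmetic, assuming the ancestor's own ancestry fraction is supplied as a parameter (1.0 meaning fully Native American descent, an assumption the census record alone does not establish); the function names are hypothetical.

```python
def generations_back(great_count: int) -> int:
    """Generations to an ancestor described as an Nx great-grandparent.

    parent = 1, grandparent = 2, great-grandparent = 3, so Nx great = N + 2.
    """
    return great_count + 2


def expected_share(generations: int, ancestor_fraction: float = 1.0) -> float:
    """Average autosomal share expected from one ancestor `generations` back.

    Each generation passes on half of the autosomal DNA on average; real
    inheritance varies around this mean because of recombination.
    """
    return ancestor_fraction * 0.5 ** generations


g = generations_back(3)
print(g)                           # 5
print(expected_share(g))           # 0.03125, i.e. about 3.1%
print(expected_share(g, 0.5))      # 0.015625 if the ancestor was half Native American
```

The second parameter matters: if the 3x great-grandmother was herself of mixed descent, the expected share falls proportionally, pushing the signal even further below what a thin reference panel can resolve.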

The broader significance of this research

The genetic diversity that the Atlantic Slave Trade (AST) tried to render invisible, through deliberate erasure of names, languages, families, and origins, is the same diversity that population genomics is now working to restore to scientific visibility. The Sibomana review is a reminder that African genetic diversity is not merely a genealogical curiosity but one of the most important and underexplored frontiers in the biological sciences. AST descendants who undertake this kind of research are contributing, in a small but meaningful way, to the broader effort to make that diversity visible and understood.

This research began as a personal attempt to recover what the paper trail could not tell. The science reviewed here suggests that the tools available for that recovery are still improving — and that the story the researcher's DNA has to tell will only become clearer as the reference data catches up with the full depth of African genetic history.

Summary · Cross-Reference at a Glance

Finding in This Research | Sibomana (2024) Context
Four calculators returned different labels for the same signals | Reference panel underrepresentation of African populations causes inaccuracy and ancestry misclassification; a documented limitation, not a user error
Ancient lineage signals (Biaka, Hadza, Khoi-San, Mbuti) | Africa has the deepest and most diverse genetic history on Earth; these signals reflect lineages predating the peopling of all other continents
North African / Eurasian signal (~24–33%) | Eurasian genetic diversity is a subset of African diversity; signals of this type ultimately trace to African origins via the Out of Africa dispersal
Cherokee signal absent from Ancestry.com but detected as trace in GEDmatch | Native American populations are among the most underrepresented in reference panels; thin reference data makes small signals undetectable on standard estimates
Similarity framing adopted throughout (Kampourakis & Peterson 2023) | Reference panel bias means percentage figures overstate precision; similarity framing is more epistemically honest given known panel limitations
Fulani as closest reference population match across multiple calculators | Africa's immense genetic diversity and complex population history produce populations like the Fulani whose genetic profile crosses multiple regional boundaries
Results will improve over time | Initiatives (H3Africa, African Genome Variation Project, 54gene) have expanded African genomic reference data; better panels will yield sharper resolution

Sibomana O. Pharmgenomics Pers Med. 2024;17:487–496. doi:10.2147/PGPM.S485452 · Kampourakis K & Peterson EL. Genetics 223(3), iyad002 (2023) · Fortes-Lima CA et al. Am J Hum Genet. 2025;112(2):261–275 · GEDmatch Kit NJ7476284

The Return Path · Genetic Genealogy for African Diaspora Reconstruction · Addendum: African Genetic Diversity and the Science Behind the Results
