African Ancestry DNA Analysis
Language note: Throughout this guide, the term admixture has been replaced with DNA sequence similarity. Following Kampourakis & Peterson (Genetics, 2023), what genomic calculators measure is how closely segments of your DNA resemble segments found in defined reference population databases — a statistical similarity score, not a fixed fraction of ancestry from pure sources. All percentage figures should be read as similarity scores relative to specific reference populations, not as genealogical ancestry fractions. Kampourakis K & Peterson EL. “The racist origins, racialist connotations, and purity assumptions of the concept of ‘admixture’ in human evolutionary genetics.” Genetics 223(3), iyad002 (2023). doi:10.1093/genetics/iyad002
Overview: Introduction
This guide is written for descendants of the Atlantic Slave Trade (AST) and the broader African diaspora who have taken a consumer DNA test and want to extract a detailed, scientifically grounded picture of their African origins. The paper trail for AST descendants typically ends at emancipation or earlier — making genomic analysis one of the few available tools for tracing ancestral connections.
Throughout this guide, calculator outputs are described as DNA sequence similarity scores relative to defined reference populations, not as fixed proportions of ancestry. This distinction follows Kampourakis & Peterson (2023) and is both scientifically more accurate and historically more honest.
The role of AI in this research. This guide was developed alongside a complete DNA similarity analysis conducted by Bill Gray, Tāmaki Makaurau, Aotearoa New Zealand, using Claude Sonnet, developed by Anthropic, as an AI research partner. Throughout Phases 4, 5, and 6 you will find notes explaining how AI was used to perform tasks that might otherwise require specialist expertise. Any currently available large language model, including ChatGPT, Gemini, and others, is well suited to this kind of work. AST descendants undertaking similar research would be well advised to use one.
What you will need
- A completed consumer DNA test from Ancestry.com, 23andMe, MyHeritage, or FamilyTreeDNA
- A free GEDmatch account at gedmatch.com
- Your raw DNA data file (downloadable from your testing company)
- Access to an AI assistant (Claude, ChatGPT, Gemini, or similar)
- Time — running multiple calculators takes several hours
- A basic spreadsheet to record results as you go
DNA similarity scores are statistical approximations, not genealogical proof. Results vary across calculators due to differing reference populations. They are most meaningful when consistent patterns emerge across multiple independent analyses.
Phase 1: Initial DNA Testing
Step 1.1 — Choose a Testing Company
| Company | Strengths for African Ancestry Research |
|---|---|
| Ancestry.com | Largest database of DNA matches; strongest Nigerian and West African subregion breakdowns; best for finding living relatives |
| 23andMe | Good African subregion resolution; active Nigerian diaspora user base; strong relative-matching tools |
| FamilyTreeDNA | Best for haplogroup testing (mtDNA and Y-DNA); Big Y-700 test gives deep paternal lineage detail |
| MyHeritage | Growing West African user base; useful secondary upload platform |
Step 1.2 — Download Your Raw DNA Data
- Log in to your testing company's website
- Ancestry.com: Settings > DNA > Download Raw DNA Data
- 23andMe: Browse Raw Data > Download
- FamilyTreeDNA: myDNA > Download Raw Data
- Save the file. It will be a .zip or .txt file, typically 5–30 MB
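Before uploading, it can help to confirm that the download is intact. The sketch below counts data rows in an unzipped raw data file. It assumes a tab-separated text layout in which comment lines start with `#` and an optional column header row starts with `rsid`; the function name and the sample rows are illustrative, so check the layout of your own file before relying on the result.

```python
def count_snp_rows(lines):
    """Count data rows in a raw DNA text export.

    Assumption: tab-separated rows, '#' marks comment lines,
    and an optional header row begins with 'rsid'.
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if fields[0].lower() == "rsid":
            continue  # skip the column header row
        if len(fields) >= 4:
            count += 1  # rsid, chromosome, position, genotype/alleles
    return count


# Made-up sample rows, not real genotype data.
sample = [
    "# hypothetical export header",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0000001\t1\t12345\tA\tG",
    "rs0000002\t1\t67890\tC\tC",
]
print(count_snp_rows(sample))
```

With a real file, pass the open file handle instead of the sample list, for example `with open("raw_dna.txt") as handle: print(count_snp_rows(handle))`, where `raw_dna.txt` is a placeholder name. A count far below what you expect suggests a truncated download.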
Step 1.3 — Note Your Ancestry.com Regional Signals
Before moving to GEDmatch, record the regional signals from your Ancestry.com ethnicity estimate. Read percentages as similarity scores, not ancestry fractions. Pay particular attention to:
- Specific African country or regional signals (Nigeria, Cameroon, Senegal, Mali, etc.)
- Any unexpected signals — Mali, North Africa, or Mediterranean are common and historically meaningful
- Any non-African signals such as European, Indigenous Americas, or Middle Eastern
Phase 2: GEDmatch Setup and Upload
Step 2.1 — Create a GEDmatch Account
- Go to gedmatch.com and click Register
- Create a free account and verify your email
- Log in — you will see your dashboard
Step 2.2 — Upload Your Raw DNA File
- From your dashboard, click Upload your DNA files
- Select your testing company from the dropdown
- Upload your raw data file
- Wait for processing — typically 30 minutes to several hours
- You will receive an email when complete
- Write down your kit number (e.g., NJ7476284) — you will need it for every calculator run
Phase 3: Running the African Genomic Calculators
Why multiple calculators matter. Each calculator uses different reference populations and different mathematics. Because percentages are similarity scores relative to those panels — not objective ancestry measurements — consistency across multiple independent analyses is the strongest evidence that a signal is real.
How to access the calculators. From your GEDmatch dashboard, scroll to Admixture (heritage) and click the link. Select the calculator, enter your kit number, and click Continue.
Step 3.1 — EthioHelix K10 Africa Only (Run This First)
The most African-specific calculator on GEDmatch. Uses ten reference populations drawn entirely from African genetic data, giving it the best resolution for distinguishing West African subpopulations.
What to record
- All similarity percentages (West_Africa, North-Africa, East_Africa1, East_Africa2, Biaka-Pygmy, Hadza, Nilo-Saharan, Khoi-San, Mbuti-Pygmy, Omotic)
- Oracle — Single Population Similarity: all 20 populations and their distances
- Oracle — Mixed Mode Population Similarity: all 20 paired results and distances
- Oracle-4 — the 2-population, 3-population, and 4-population best-fit models
How to interpret EthioHelix K10
| Component | What it means |
|---|---|
| West_Africa | Your strongest similarity signal. Reference populations of Nigeria, Ghana, Cameroon, and the Sahelian belt |
| North-Africa | Similarity to Berber, Amazigh, or Arab-adjacent reference populations. Likely reflects Fulani prehistoric connections rather than a discrete recent North African line |
| East_Africa2 | Similarity to Ethiopia and Horn of Africa reference populations. Often embedded in the Fulani reference genome, not necessarily a separate line |
| Nilo-Saharan | Nilotic and Sahelian peoples of the Nile corridor and Chad basin |
| Biaka-Pygmy / Mbuti-Pygmy | Ancient Central African forest lineages — deep prehistory signals |
| Hadza | Ancient East African hunter-gatherers — one of the oldest human lineages |
| Khoi-San | Southern African — among the earliest diverging human lineages |
The Oracle distance score. The lower the distance, the closer the similarity match. A score below 15 is extraordinary. When one population scores dramatically lower than all others, that is the most important finding in your analysis.
Step 3.2 — Dodecad Africa9
Uses a global reference panel. Will often assign African DNA similarity to categories labelled 'Europe' or 'SW_Asia' — this reflects the same ancient Eurasian component that EthioHelix calls 'North Africa,' expressed through different reference populations.
If Africa9 shows 15–25% 'Europe,' do not assume this represents recent European ancestry. For West African individuals, this almost always represents ancient North African Berber similarity that Africa9's reference panel cannot distinguish from European reference populations.
Step 3.3 — puntDNAL K8 African Only
Uses country-specific African reference populations (Nigeria_Fulani, Nigeria_Igbo, Mali_Mandinka, etc.) giving more geographic resolution. Pay particular attention to whether Fulani appears in the single-population list and how closely it ranks.
Step 3.4 — MDLP K23b
Uses ancient DNA reference populations. Decomposes the Eurasian similarity component into prehistoric streams — European_Hunters_Gatherers, European_Early_Farmers, Caucasian — validated by ancient DNA research.
| MDLP K23b label | What it actually represents |
|---|---|
| Subsaharan | Core West and Central African similarity |
| Archaic_African | Ancient pre-agricultural African hunter-gatherer lineages combined |
| European_Hunters_Gatherers | Ancient Western Eurasian similarity via the Green Sahara — NOT recent European ancestry |
| European_Early_Farmers | Neolithic Near Eastern/Anatolian similarity — same stream as Berber/North African ancestry |
| Caucasian | Caucasus hunter-gatherer component embedded in Fulani and North African reference populations |
| North_African | Residual specific North African similarity after the above are accounted for |
| East_African | Ethiopia and Horn region similarity signal |
Phase 4: Interpreting the Results
Step 4.1 — Build a Cross-Calculator Comparison Table
After running all calculators, build a table placing all results side by side. Consistent signals across independent analyses are reliable. Signals appearing in only one calculator may reflect that calculator's reference panel.
This step is well suited to AI assistance. Upload your calculator screenshots or results text and direct your AI assistant to build the cross-calculator comparison table, identify consistent signals, note discrepancies, and explain what different calculator labels for the same underlying signal mean.
In this research, I uploaded GEDmatch screenshots and directed Claude Sonnet to build the comparison framework, identify consistent signals across all four calculators, and account for the different terminology each reference panel used. Claude Sonnet executed the analysis. I directed it, reviewed the output, and made all interpretive judgements.
| Signal Category | What to record | Why it matters |
|---|---|---|
| West African similarity | Percentage from each calculator | Should be consistent — your dominant signal |
| North African / Semitic similarity | Percentage from each calculator (sum if split) | Key secondary signal — compare across panels |
| East African similarity | Percentage from each calculator | Smaller but meaningful — note variation |
| Fulani rank and distance | Single-pop rank and distance from each calculator | Most important diagnostic signal |
| Specific West African populations | All populations in top 20 across all calculators | Identifies specific ethnic reference groups |
| 4-population best-fit models | Top 3 from each calculator | Reveals the structural architecture of similarity |
| Ancient lineages | All ancient/HG components with percentages | Reflects deep African prehistoric similarity |
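If you prefer to assemble the comparison table yourself rather than through an AI assistant, a small script is enough. The following is a minimal sketch, assuming you have typed your recorded percentages into a dictionary. The scores shown are placeholders and the function name `build_table` is illustrative.

```python
import csv
import io

CALCULATORS = ["EthioHelix K10", "Dodecad Africa9", "puntDNAL K8", "MDLP K23b"]

# Placeholder scores (percent). Replace with your own recorded values.
results = {
    "West African similarity": {
        "EthioHelix K10": 60.0,
        "Dodecad Africa9": 58.0,
        "puntDNAL K8": 61.0,
        "MDLP K23b": 62.0,
    },
    "East African similarity": {
        "EthioHelix K10": 5.0,
        "puntDNAL K8": 4.0,
    },
}


def build_table(results, calculators):
    """Return rows with one column per calculator; gaps become ''."""
    rows = [["Signal category"] + calculators]
    for signal, scores in results.items():
        rows.append([signal] + [scores.get(name, "") for name in calculators])
    return rows


table = build_table(results, CALCULATORS)

# Write the table as CSV text that any spreadsheet can open.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(buffer.getvalue())
```

Empty cells make it obvious where a calculator did not report a signal, which is itself worth noting when you judge consistency.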
Step 4.2 — Identify the Fulani Signal
In the Oracle single-population list, check whether Fulani (labelled Fulani, Nigeria_Fulani, or similar) appears dramatically closer than all other populations. If the next nearest population's distance is at least twice the Fulani distance (a ratio of 2x or more), the gap is highly significant.
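The closest-versus-next comparison is simple arithmetic and can be sketched as follows. The population names and distances are hypothetical, not real calculator output, and `closest_match_ratio` is an illustrative helper name.

```python
def closest_match_ratio(distances):
    """Return (closest, runner_up, runner_up_distance / closest_distance)."""
    ranked = sorted(distances.items(), key=lambda item: item[1])
    if len(ranked) < 2:
        raise ValueError("need at least two populations to compare")
    (best_name, best_dist), (next_name, next_dist) = ranked[0], ranked[1]
    if best_dist <= 0:
        raise ValueError("distances must be positive")
    return best_name, next_name, next_dist / best_dist


# Hypothetical Oracle single-population distances.
oracle = {"Population_A": 14.0, "Population_B": 32.0, "Population_C": 35.5}

best, runner_up, ratio = closest_match_ratio(oracle)
print(f"{best} is {ratio:.1f}x closer than {runner_up}")
print("Passes the 2x threshold:", ratio >= 2.0)
```

The same function works for any closest match, not only Fulani, so it can be run on each calculator's Oracle list in turn.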
Why the Fulani signal matters. The Fulani are genetically unique among West African peoples. They carry approximately 20% ancient North African and Western Eurasian DNA similarity from the Green Sahara period (12,000–5,000 years BP). A strong Fulani Oracle match simultaneously explains the West African core, the North African signal, the East African signal, and the Mali regional signal — as components of a single unified similarity pattern. This was independently confirmed by Fortes-Lima et al. (2025) in the American Journal of Human Genetics.
If Fulani appears as your closest single-population Oracle match across multiple calculators, it is the single most important finding in your analysis and should anchor all further interpretation.
Important caveat: The strength of the Fulani proximity signal documented in the research that produced this guide — distance 9.48 in puntDNAL and 14.18 in EthioHelix, more than twice as close as the next population in both cases — is an unusually strong and distinctive result. It may reflect ancestral connections specific to that individual's lineage rather than a pattern typical of all AST descendants. Readers should not expect the Fulani to appear as a dominant proximity match in their own results. Your closest single-population reference match may differ significantly, and that difference is itself meaningful and worth investigating on its own terms. The methodology documented here applies regardless of which reference populations emerge as closest matches — the Fulani finding is one outcome of the process, not its predetermined destination.
Step 4.3 — Interpret the North African / Semitic Signal
A North African similarity component of 15–35% is common and historically coherent for West African ancestry, but labelled differently across calculators:
| Calculator | Label used | Actual similarity |
|---|---|---|
| EthioHelix K10 | North-Africa (23–25%) | Berber / Amazigh / Iberomaurusian reference populations |
| Dodecad Africa9 | Europe + NW_Africa + SW_Asia (~33%) | Same Eurasian Berber component, split across European reference populations |
| puntDNAL K8 | Western_Semitic (32–33%) | Same component, framed as Semitic |
| MDLP K23b | EHG + EEF + Caucasian (~25%) | Ancient DNA decomposition of same component into prehistoric streams |
Key rule: Within each calculator, sum all North African / Eurasian / European components, then compare those per-calculator totals. If the totals agree (within ~10 percentage points), you are seeing the same similarity pattern through different reference panel lenses, not multiple separate sources.
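The key rule can be expressed as a short calculation. In this sketch the component labels follow the table above, but the individual percentages are illustrative values chosen to land near the totals quoted there. Reading "within ~10 percentage points" as the gap between the largest and smallest total is an assumption of this example.

```python
# Illustrative component scores (percent) per calculator.
components = {
    "EthioHelix K10": {"North-Africa": 24.0},
    "Dodecad Africa9": {"Europe": 18.0, "NW_Africa": 10.0, "SW_Asia": 5.0},
    "puntDNAL K8": {"Western_Semitic": 32.5},
    "MDLP K23b": {
        "European_Hunters_Gatherers": 8.0,
        "European_Early_Farmers": 10.0,
        "Caucasian": 7.0,
    },
}


def eurasian_totals(components):
    """Sum the listed components separately for each calculator."""
    return {calc: sum(parts.values()) for calc, parts in components.items()}


def is_consistent(totals, tolerance=10.0):
    """True when the spread between largest and smallest total is small."""
    values = list(totals.values())
    return max(values) - min(values) <= tolerance


totals = eurasian_totals(components)
for calc, total in totals.items():
    print(f"{calc}: {total:.1f}%")
print("Consistent across calculators:", is_consistent(totals))
```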
Step 4.4 — Interpret the Multi-Population Oracle Models
- Consistency: A population appearing in all 20 top models is a genuine signal
- The West African / North African pairing: If every Mixed Mode result pairs a West African population with a North African one, this confirms the two-component structure of your similarity profile
- The Fulani in 4-population models: If Fulani appears in nearly every 4-population Oracle model, it is a required component that cannot be explained without it
- The ratio: Note the typical West African / North African split — e.g., 63% / 37%. This ratio is meaningful and consistent
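Checking how often a population recurs across the Oracle models is a counting exercise. The sketch below assumes each model has been typed in as a list of population names. The names are hypothetical, and three models stand in for the twenty you would normally record.

```python
from collections import Counter

# Hypothetical 4-population models, one list per model.
models = [
    ["Pop_A", "Pop_B", "Pop_C", "Pop_D"],
    ["Pop_A", "Pop_B", "Pop_E", "Pop_F"],
    ["Pop_A", "Pop_C", "Pop_E", "Pop_G"],
]


def appearance_share(models):
    """Fraction of models in which each population appears at least once."""
    counts = Counter()
    for model in models:
        counts.update(set(model))  # count each population once per model
    return {pop: n / len(models) for pop, n in counts.items()}


shares = appearance_share(models)
# Most frequent first; ties broken alphabetically.
for pop, share in sorted(shares.items(), key=lambda item: (-item[1], item[0])):
    print(f"{pop}: {share:.0%}")
```

A population with a share of 100% appears in every model, which is the consistency pattern described in the first bullet.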
Phase 5: Scientific Validation
Step 5.1 — Understanding What the Calculators Measure
Four foundational papers frame the correct interpretation of these results, with David (2024) situating the entire enterprise within the anthropological literature on reparative genealogy:
Kampourakis K & Peterson EL. “The racist origins, racialist connotations, and purity assumptions of the concept of 'admixture' in human evolutionary genetics.” Genetics 223(3), iyad002 (2023). doi:10.1093/genetics/iyad002 — Establishes that calculator outputs are best understood as DNA sequence similarity scores relative to reference populations, not fixed fractions of ancestry from pure sources. Essential reading before interpreting any calculator result.
Fortes-Lima CA et al. “Population History and Admixture of the Fulani People from the Sahel.” Am J Hum Genet. 2025;112(2):261-275. doi:10.1016/j.ajhg.2024.12.015 — The definitive Fulani population genomics study. Confirms ~20% North African similarity, Iberomaurusian ancient DNA, and Green Sahara origins in the Fulani. Directly validates the Oracle Fulani proximity finding documented in the research that produced this guide.
David LT. “Supporting the use of genetic genealogy in restoring family narratives following the transatlantic slave trade.” Am Anthropologist. 2024;126(1):153–157. doi:10.1111/aman.13939 — Establishes genetic genealogy as a reparative anthropological practice for AST descendants. Frames the pursuit of African ancestry through genomics as an act of historical truth-telling and diasporic community reconstruction. The paper trail for AST descendants ends where it always ends; genomics picks up where the archive fails.
Sibomana O. “African Genetic Diversity and Its Implications for Precision Medicine and Pharmacogenomics.” Pharmgenomics Pers Med. 2024;17:487–496. doi:10.2147/PGPM.S485452 · PMC11566596 — Documents the extraordinary genetic diversity within African populations and the limitations of current reference panels in capturing it. Contextualises why calculator outputs for AST descendants should be read as approximations within a continent of unparalleled genomic complexity.
Phase 5 is highly suited to AI assistance. Share your Oracle results and direct your AI assistant to examine papers you have identified, explain how they validate your findings, and summarise the scientific literature in plain language.
In this research, I identified the relevant papers and provided their URLs directly, directing Claude Sonnet to examine and cross-reference them against my Oracle findings. Claude Sonnet performed the analysis — explaining how Fortes-Lima et al. (2025) validated the Oracle Fulani result and how the Kampourakis and Peterson framework applied to the interpretation. Which papers were examined and what questions were asked of them were my decisions. Claude Sonnet executed the cross-referencing and produced the explanatory text.
Step 5.2 — The Convergence Test
| Calculators confirming | Confidence level | Interpretation |
|---|---|---|
| 1 of 4 | Low — possible artifact | Note but do not over-interpret. May reflect reference panel bias. |
| 2 of 4 | Moderate | Worth investigating. Run additional calculators. |
| 3 of 4 | High | Likely a genuine signal. |
| 4 of 4 | Very high | Genuine signal confirmed. |
| 4 of 4 + peer-reviewed science | Definitive | The finding is scientifically validated. |
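The convergence table can be applied mechanically once you have counted how many calculators show a signal. This is a minimal sketch of that lookup. The function name is illustrative, and the "Not observed" label for a count of zero is an addition that does not appear in the table.

```python
LEVELS = {1: "Low", 2: "Moderate", 3: "High", 4: "Very high"}


def convergence_level(confirming, peer_reviewed=False):
    """Map the number of confirming calculators (0-4) to a confidence label."""
    if confirming not in range(0, 5):
        raise ValueError("confirming must be a whole number from 0 to 4")
    if confirming == 0:
        return "Not observed"  # not part of the table; added for completeness
    if confirming == 4 and peer_reviewed:
        return "Definitive"
    return LEVELS[confirming]


print(convergence_level(2))
print(convergence_level(4, peer_reviewed=True))
```

Note that peer-reviewed support only upgrades the label when all four calculators already agree, matching the last row of the table.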
Phase 6: Creating the Similarity Maps
Step 6.1 — Organise Your Findings by Region
- Overall map: All signals combined, with node sizes reflecting similarity strength
- West African origins map: Focus on specific West African populations and Oracle rankings
- North African origins map: Focus on Morocco / Algeria / Libya signals and the Fulani corridor
- East African origins map: Focus on the East African component and its likely Fulani origin
- Multi-calculator summary map: Cross-calculator comparison showing convergence across all four calculators
Step 6.2 — Using AI to Create the Maps
The maps in this guide were produced by Claude Sonnet working from my specifications and data. I directed every aspect — specifying the visual style, colour scheme, content, and layout, uploading the GEDmatch results, and guiding multiple rounds of revision. Claude Sonnet generated the HTML code. The collaboration required no coding on my part — though the engineering instinct informing every decision was very much present throughout.
Phase 6 is the phase most transformed by AI assistance. Share your GEDmatch results — as screenshots, text, or descriptions — and direct your AI assistant to produce HTML visualisation maps. Specify your preferred style, and it will generate complete self-contained HTML files from your specifications.
In this research, I directed the creation of five published maps — North African, East African, West African, scientific validation, and four-calculator comparison — plus an unpublished overall summary that was superseded during revision. These maps are published across the blog under the navigation labels West, North, East, Convergence, and Integration. Claude Sonnet generated the HTML from those specifications and the uploaded GEDmatch data. I reviewed, critiqued, and made all editorial decisions at every stage. Any currently available large language model is capable of this work. Ask for static HTML versions (no JavaScript) if publishing to Blogspot or similar platforms.
Step 6.3 — Design Principles
- Node sizing: Reflect similarity strength proportionally
- Colour coding: Use consistent colours across all maps (e.g., white = West Africa, blue = North Africa, green = East Africa, purple = Fulani Corridor)
- The Fulani Corridor: Represent as a dashed line across the Sahel from Senegal through Mali, Niger, Nigeria, and Sudan
- Methodology banner: Every map should carry a note explaining that percentages represent DNA sequence similarity scores, with the Kampourakis & Peterson (2023) citation
- Schematic geography: Always label maps as schematic — the Africa outline should be simplified for clarity
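For readers who want to see what proportional node sizing involves, here is a small sketch that produces static SVG circle markup. It is not the code used for the published maps. Scaling the circle's area (rather than its radius) with the similarity score is a design choice made for this example, and the coordinates, colour, and function names are illustrative.

```python
import math


def node_radius(score, max_score, max_radius=40.0):
    """Radius such that circle area is proportional to the score."""
    if max_score <= 0 or score < 0:
        raise ValueError("scores must be non-negative and max_score positive")
    return max_radius * math.sqrt(score / max_score)


def svg_circle(x, y, score, max_score, colour):
    """Return one static SVG <circle> element as a string."""
    r = node_radius(score, max_score)
    return f'<circle cx="{x}" cy="{y}" r="{r:.1f}" fill="{colour}" />'


# Illustrative node: a 25% signal on a map where the largest signal is 100%.
print(svg_circle(100, 120, 25.0, 100.0, "blue"))
```

Area scaling keeps a 25% signal from looking as dominant as it would if the radius itself were set to a quarter of the maximum. Because the output is plain markup with no JavaScript, it suits platforms that restrict scripts.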
Phase 7: Finding Living Relatives
Step 7.1 — Work Your DNA Matches on Ancestry.com
- Check your Ancestry DNA matches list. Sort by Shared DNA (cM) to see closest matches first
- Look for African matches. These are users who list their residence or ancestry as Nigerian, Ghanaian, or from other West African countries
- Use the Leeds Method. Cluster your DNA matches into groups representing different family lines. Free tools include the Leeds Method spreadsheet and DNA Painter
- Reach out thoughtfully. African diaspora users in the US and UK who appear in your match list are often actively researching and receptive to contact
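The clustering idea behind the Leeds Method can be sketched in a few lines. This is a simplified illustration of the approach as it is commonly described, not a replacement for the spreadsheet workflow: take each match that has no cluster yet, give it a new cluster number, and give the same number to everyone on its shared-match list. The match names and shared-match lists below are hypothetical.

```python
def leeds_clusters(matches, shared):
    """Assign cluster numbers to matches.

    matches: names ordered from most to least shared DNA.
    shared:  dict mapping a name to the set of names on its shared-match list.
    Returns a dict of name -> set of cluster numbers.
    """
    clusters = {name: set() for name in matches}
    next_cluster = 1
    for name in matches:
        if clusters[name]:
            continue  # already placed by an earlier match
        clusters[name].add(next_cluster)
        for other in shared.get(name, set()):
            if other in clusters:
                clusters[other].add(next_cluster)
        next_cluster += 1
    return clusters


# Hypothetical matches and shared-match lists.
matches = ["M1", "M2", "M3", "M4", "M5"]
shared = {"M1": {"M3"}, "M2": {"M4", "M5"}}

result = leeds_clusters(matches, shared)
for name in matches:
    print(name, sorted(result[name]))
```

A match that ends up with more than one cluster number is related through more than one family line and deserves a closer look.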
Step 7.2 — Upload to Additional Platforms
| Platform | Notes |
|---|---|
| GEDmatch (gedmatch.com) | Already done in Phase 2. Important for calculator analysis. |
| MyHeritage (myheritage.com) | Free raw data upload. Growing Nigerian and West African user base. |
| FamilyTreeDNA (familytreedna.com) | Free upload for autosomal matching. Best for haplogroup testing. |
| African Ancestry (africanancestry.com) | Paid service. Matches mtDNA and Y-DNA to present-day African ethnic groups. |
Phase 8: Documentation and Ongoing Research
Step 8.1 — Document Everything
- Save screenshots or PDFs of every calculator result
- Record your kit number and the date each calculator was run
- Note the calculator version where available
- Save your raw data file in at least two locations
- Keep notes on interpretive conclusions and supporting evidence
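A plain CSV log covers most of the record-keeping listed above. The sketch below assumes one row per calculator run. The column names and the date are illustrative choices, and the kit number is the example used earlier in this guide.

```python
import csv
import io
from datetime import date

FIELDS = ["run_date", "kit_number", "calculator", "version", "notes"]


def write_log(rows, handle):
    """Write a header plus one line per calculator run."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {
        "run_date": date(2025, 1, 15).isoformat(),  # illustrative date
        "kit_number": "NJ7476284",
        "calculator": "EthioHelix K10 Africa Only",
        "version": "",  # fill in where the calculator shows one
        "notes": "screenshot saved",
    },
]

buffer = io.StringIO()
write_log(rows, buffer)
print(buffer.getvalue())
```

To keep the log on disk, pass a file opened with `open("calculator_log.csv", "w", newline="")` in place of the in-memory buffer; the filename is a placeholder.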
Step 8.2 — Stay Current with Research
- Google Scholar alerts: Set up alerts for 'Fulani genetics,' 'West African population history,' 'Yoruba genomics,' 'genetic genealogy African diaspora,' etc.
- ISOGG Wiki (isogg.org): International Society of Genetic Genealogy — updates on best practices and new tools
- The DNA Geek blog (thednageek.com): Excellent explanatory writing on similarity estimates and calculator interpretation
- David LT. Am Anthropologist. 2024;126(1):153–157: Essential anthropological framing of genetic genealogy as reparative practice for AST descendants. Situates this kind of research within the scholarly literature on historical truth-telling and diasporic community reconstruction. Open access via PMC10836826.
Step 8.3 — Haplogroup Testing
- mtDNA haplogroup: Traces your direct maternal line. West African haplogroups include L1, L2, L3
- Y-DNA haplogroup (men only): For West African similarity, Haplogroup E1b1a is common. For North African / Berber similarity, E1b1b is common
- Recommended: FamilyTreeDNA for both tests. African Ancestry for results matched to specific African ethnic groups
Appendix A: Calculator Reference
| Calculator | Best for | GEDmatch location |
|---|---|---|
| EthioHelix K10 Africa Only | First-pass African similarity analysis; best North vs West vs East Africa resolution | Admixture > EthioHelix |
| Dodecad Africa9 | Comparison; detecting split between European-labelled and African-labelled Eurasian components | Admixture > Dodecad |
| puntDNAL K8 African Only | Country-level African population specificity; Fulani detection | Admixture > puntDNAL |
| MDLP K23b | Ancient DNA decomposition of Eurasian component; Afroamerican reference confirmation | Admixture > MDLP |
Appendix B: Interpreting Oracle Distance Scores
| Distance range | Interpretation |
|---|---|
| Under 5 | Extremely close — essentially the same reference population |
| 5–15 | Very close — strong similarity to this reference population |
| 15–25 | Close — meaningful similarity; worth investigating |
| 25–35 | Moderate — present in similarity profile but not dominant |
| 35–50 | Distant — this reference population appears but is not primary |
| Over 50 | Very distant — weak or indirect connection at best |
The ratio matters more than the absolute score. A Fulani score of 14 is only meaningful because the next population scores 32 — a ratio of 2.3x. Always compare the closest match to the second closest.
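The distance bands in the table above can be turned into a lookup for labelling a full Oracle list. In this sketch a distance that falls exactly on a boundary is placed in the higher band (so 15 counts as "Close"). The table does not specify this, so treat it as an assumption.

```python
# Upper bound (exclusive) and label for each band in the table.
BANDS = [
    (5, "Extremely close"),
    (15, "Very close"),
    (25, "Close"),
    (35, "Moderate"),
    (50, "Distant"),
]


def distance_band(distance):
    """Return the interpretation label for an Oracle distance score."""
    if distance < 0:
        raise ValueError("distance cannot be negative")
    for upper, label in BANDS:
        if distance < upper:
            return label
    return "Very distant"


# The two Fulani distances quoted in Step 4.2, plus the 32 used above.
for value in (9.48, 14.18, 32):
    print(value, "->", distance_band(value))
```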
Appendix C: Key Reference Populations
| Population | Geographic and genetic context |
|---|---|
| Igbo | Nigeria — southeastern Nigeria. Major AST source from Bight of Biafra |
| Yoruba | Nigeria and Benin — southwestern Nigeria. Major AST source from Bight of Benin |
| Hausa | Northern Nigeria and Niger — Sahelian people at the trans-Saharan trade intersection |
| Brong / Akan | Ghana — Akan-speaking people of central Ghana |
| Bambaran / Bambara | Mali — largest ethnic group in Mali |
| Mandenka / Mandinka | Senegambia — historically associated with the Mali Empire |
| Fulani | Pan-Sahelian — Senegal through Mali, Niger, Nigeria, Cameroon to Sudan. Genetically unique, carrying ~20% North African similarity. See Fortes-Lima et al. 2025. |
| Bantu Kenya | Kenya and East Africa — general Bantu-speaking East African reference population |
| Hadza | Tanzania — ancient hunter-gatherers; genetically among most divergent human populations |
| Biaka-Pygmy | Central African Republic — ancient forest hunter-gatherer people |
| Khoi-San | Southern Africa — among the earliest diverging lineages in the human family tree |
This guide was developed alongside a complete DNA similarity analysis conducted by Bill Gray, Tāmaki Makaurau, Aotearoa New Zealand, using publicly available consumer DNA tools, free genomic calculators, and AI assistance (Claude Sonnet, Anthropic). It demonstrates what is possible without institutional access or formal genomics training — only a disciplined analytical mind, patience, and a willingness to follow the evidence honestly.
A final note on time. When the AI assistant was asked how long this project might have taken using traditional methods — given approximately 80 hours invested — the response was instructive: conservatively, six to eighteen months, assuming the right connections were in place.
The DNA analysis itself — running four calculators, interpreting Oracle results, cross-referencing against reference populations — would have required weeks of self-directed learning without an AI partner able to explain outputs in real time. The scientific literature presented a steeper challenge still: identifying and reading four peer-reviewed papers spanning population genomics, genetic philosophy, reparative genealogy, and African pharmacogenomics — and understanding their specific relevance to these Oracle findings — is graduate-level work that might have taken months without a research partner able to cross-reference on demand. (Fortes-Lima et al., 2025; Kampourakis & Peterson, 2023; David, 2024; Sibomana, 2024.)
The five published HTML visualisation maps — produced to precise specifications without writing a line of code — would have required either hiring a developer or acquiring the skills to do it independently. Both cost time and money. The writing across nine posts and this guide, sustained to a consistent standard of coherence, is a significant editorial undertaking on its own.
And the conceptual framework itself — the decision to replace “admixture” with “DNA sequence similarity” and build the entire project on that foundation — is the kind of clarity that typically emerges late in a research process, if at all. Here it was established at the outset.
Eighty hours to produce what is documented here is what AI-assisted citizen science actually enables.
Kampourakis K & Peterson EL. Genetics 223(3), iyad002 (2023) ·
Fortes-Lima CA et al. Am J Hum Genet. 2025;112(2):261–275 ·
David LT. Am Anthropologist. 2024;126(1):153–157 ·
GEDmatch Kit NJ7476284