Thursday, November 21, 2013
The old Eurogenes K13 has been replaced by a new model with different, and hopefully more robust, ancestral clusters. The new version also includes Oracles as well as 2D and 3D Principal Component Analyses (PCA). The K13 population averages and genetic (Fst) distances between the inferred ancestral clusters are available here and here, respectively.
GEDmatch > Ad-Mix Utilities > Eurogenes > K13
Below is a 2D PCA based on the average K13 results of the European and Asian reference populations, courtesy of project member PL16.
Thus, Eurogenes now has four tests at GEDmatch with Oracles: the Jtest, EUtest, EUtest V2 and the K13. It's useful to keep in mind that these tests will differ in their interpretation of the data, and perhaps accuracy, depending on the ancestry of the user. For instance, the new K13 should be more useful for Central and South Asians than any of the others, because it features new reference samples from these regions.
Monday, October 7, 2013
This new test is essentially an upgraded version of the EUtest. Unlike the original, it includes an Amerindian component and five native reference populations from North and Central America, so it should be a lot more useful for users from the New World who are wondering about Amerindian admixture.
GEDmatch > Ad-Mix Utilities > Eurogenes > Eurogenes EUtestV2 K15
I just tried it myself, and have to say that the 4-Ancestors Oracle results were impressive. In other words, they were very accurate based on what I know about my recent ancestry. On the other hand, I'd say the default Oracle was picking up more ancient gene flows. However, this might not be the case for everyone, so let's hear some feedback, discuss the outcomes, and perhaps tweak the settings if necessary.
One of the most important things to keep in mind is to ignore all results under 1%. These are likely to be noise. Also, please note that this test was designed for 23andMe and FTDNA files (not Geno 2.0 or Ancestry).
Here are the population averages and Fst distances between the components. Below are gradient maps of the main West Eurasian components, courtesy of Gui (FR7): Baltic, North Sea, Atlantic, East Euro, West Med, East Med, West Asian.
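The advice above to ignore results under 1% can be applied mechanically. Here's a minimal Python sketch; the component scores are made up for illustration, and the function name `drop_noise` is hypothetical rather than part of any GEDmatch tool.

```python
# Hypothetical EUtest V2 K15 scores in percent (illustrative values only,
# and only a subset of the components).
scores = {
    "Baltic": 30.5,
    "North_Sea": 24.0,
    "Atlantic": 18.2,
    "East_Euro": 15.1,
    "West_Med": 6.0,
    "East_Med": 3.4,
    "West_Asian": 1.6,
    "Amerindian": 0.7,
}

NOISE_THRESHOLD = 1.0  # percent; results under this are treated as noise


def drop_noise(component_scores, threshold=NOISE_THRESHOLD):
    """Return only the components scoring at or above the threshold."""
    return {
        name: pct
        for name, pct in component_scores.items()
        if pct >= threshold
    }


kept = drop_noise(scores)
for name, pct in sorted(kept.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {pct:5.1f}%")
```

In this made-up case the 0.7% Amerindian score is dropped, so it wouldn't be read as evidence of Amerindian admixture.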
Sunday, August 25, 2013
I've recently been looking at ways to incorporate X chromosome data into my ancestry tests and experiments. Below I describe an analysis of the X chromosomes of two samples from the 1000 Genomes Project.
The X chromosome is not like the 22 autosomal chromosomes. For instance, males only carry one copy, which means it can be a challenging source of markers for some analyses. I overcame this problem by using only male or female X chromosomes, and by creating more female-like X chromosomes by combining two male chromosomes from the same population into one.
I ran a diagnostic global PCA (see here) to find outlier, and thus presumably admixed, X chromosomes in order to test them further. There were quite a few interesting results, including two from Kent, England. They're shown on the global PCA linked to above, and the one below, as Kent_HG00141 and Kent_HG01791. On both plots they're drifting slightly towards East Asia.
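The post doesn't spell out how two male X chromosomes were combined into a female-like one, so the following Python sketch is only one plausible reading: pair the haploid calls of two males SNP by SNP to get unordered diploid genotypes. The function name, the missing-call symbol and the sample calls are all assumptions made for illustration.

```python
def combine_male_x(hap_a, hap_b, missing="-"):
    """Pair two haploid X calls per SNP into unordered diploid genotypes.

    hap_a and hap_b are equal-length lists of single-letter allele calls
    for the same SNPs in the same order. A SNP that is missing in either
    male is marked missing in the combined genotype.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("both males must be typed on the same SNP list")
    diploid = []
    for a, b in zip(hap_a, hap_b):
        if a == missing or b == missing:
            diploid.append(missing * 2)
        else:
            # Sort the alleles so that "GA" and "AG" are the same genotype.
            diploid.append("".join(sorted((a, b))))
    return diploid


# Two hypothetical males from the same population, five SNPs each.
male_1 = ["A", "G", "C", "-", "T"]
male_2 = ["A", "A", "T", "G", "T"]
pseudo_female = combine_male_x(male_1, male_2)
print(pseudo_female)  # ['AA', 'AG', 'CT', '--', 'TT']
```

Pairing males from the same population keeps the combined genotype representative of that population, which is the point of the constraint mentioned above.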
Friday, May 3, 2013
In a new article at my main blog I describe how Europeans became more hunter-gatherer-like genetically after the Neolithic. Yep, after the Neolithic. It's a theory that was entertained by some archaeologists and linguists in the past, but we're now seeing strong signals in ancient and modern DNA data to back it up.
Modern European ADMIXTURE components = Neolithic ecological zones (+ post-Neolithic in-situ expansions)
Update 05/05/2013: I was about to make a GEDmatch ancestry test based on the three Neolithic components described in my post, but then I realized that such a test was already available there, and it's called the Eurogenes K9. All you have to do is re-name several of the components in the following way...
North European > Corded Ware, Single Grave, Battle Axe, Unetice and/or Bohemian Bell Beaker
Mediterranean > Impressed Ware, Cardial Ware, Maritime Bell Beaker and/or Funnelbeaker (TRB)
Caucasus & Southwest Asian > Linearbandkeramik (LBK)
Ancient DNA will tell us one day whether these assumptions are correct. However, please note that the new component names won't be relevant for everyone. For instance, if you're from East Africa or Turkey, then the Caucasus-like admixture you score won't be due to LBK farmers, because they were based in Western and Central Europe.
Saturday, March 9, 2013
I've just put together a new test for GEDmatch called the Eurogenes K36. As the name suggests, it features thirty-six ancestral clusters. It probably won't include any Oracles, mostly because the Calculator Effect would render these useless if they were based on the average results of the reference samples (see the sheet here for details), and it'd be very time-consuming for me to test a wide variety of other samples in supervised mode using thirty-six sets of allele frequencies.
The main purpose of the Eurogenes K36 is to help users unravel the ethnic origins of local areas of their genomes (aka. half-segments), hence the high number of ancestral categories, some of which are very specific. In other words, the test is mainly a chromosome painting utility. It's accessible via the GEDmatch Ad-Mix link below:
GEDmatch > Ad-Mix page > Eurogenes > Eurogenes K36
An important point to keep in mind is not to take the ancestry proportions too literally. If you're, say, English, and you get an Iberian score of 12%, this doesn't actually mean you have recent ancestry from Spain or Portugal. What it means is that 12% of your alleles look typical of the reference samples classified as Iberian, and this figure might only indicate recent Iberian admixture if it's clearly higher than those of other English users.
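One way to judge whether a score is "clearly higher" than those of other users from the same background is to express it in standard deviations from the group mean. The Python sketch below does this with made-up Iberian scores for English users; the numbers, the function name and the idea of using a z-score are all assumptions for illustration, not part of the K36 test.

```python
from statistics import mean, stdev


def standing_in_group(user_score, group_scores):
    """Return (group mean, group standard deviation, user's z-score)."""
    mu = mean(group_scores)
    sigma = stdev(group_scores)
    z = (user_score - mu) / sigma if sigma > 0 else 0.0
    return mu, sigma, z


# Hypothetical Iberian scores (percent) posted by other English users.
english_iberian = [10.5, 12.3, 11.8, 13.0, 11.1, 12.6, 10.9, 12.2]

for my_score in (12.0, 20.0):
    mu, sigma, z = standing_in_group(my_score, english_iberian)
    print(f"score {my_score:.1f}%: group mean {mu:.1f}%, z = {z:+.1f}")
```

With these made-up figures a 12% score sits well within the English range, while 20% would stand out by several standard deviations and might be worth a closer look.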
Another way to look at it is that the ancestry proportions are like map coordinates, and they'll place you with a very high degree of accuracy on a genetic map featuring other users. Indeed, please feel free to post your scores and ancestry details in the comments below to help others get an idea of what their results might represent. My results are listed below. The scores put me squarely in Poland relative to those of other European samples I've run, which is correct.
Also worth mentioning is that this test focuses on much deeper ancestry than the Ancestry Composition at 23andMe. Hence, I expect that many Europeans will score a few percent in non-European clusters. However, like many ADMIXTURE results, this could give us strong hints about population movements into Europe during prehistory and early history, so it's worth keeping an eye on.
Monday, December 3, 2012
The Jtest and EUtest at GEDmatch now include a new tool called the 4-Ancestors Oracle (aka. Oracle-4), as well as the 3D PCAs I promised earlier. Oracle-4 will attempt to pinpoint your ethnic group of origin, and then also work out the most likely combinations of two, three and four ancestral populations which make up your genome. However, this doesn't mean the results will actually show your ethnic group, or those of your parents (in dual mode) or grandparents (4-way mode). They might for many people, but for others they'll reflect the best possible outcomes from the reference samples available.
GEDmatch Ad-Mix Utilities
Enjoy, and feel free to give feedback to John at GEDmatch if you think it might be useful (but please don't spam his account).
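To give a rough idea of what searching for "the most likely combination of two ancestral populations" can look like, here's a toy Python sketch. It is not the Oracle-4 code, whose internals aren't described here. It assumes that each reference population is summarized by its average component scores, and that the best two-way mix is the one with the smallest Euclidean distance to the user's scores. All names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def best_two_way_mix(user, references):
    """Find the pair of populations and mixing weight closest to the user.

    Returns (distance, name_a, weight_of_a, name_b), or None if fewer
    than two distinct reference populations are supplied.
    """
    best = None
    for (name_a, a), (name_b, b) in combinations(references.items(), 2):
        diff = [x - y for x, y in zip(a, b)]
        denom = sum(d * d for d in diff)
        if denom == 0:
            continue  # identical averages, so no meaningful mix
        # Least-squares weight of population A, clipped to the 0..1 range.
        w = sum((u - y) * d for u, y, d in zip(user, b, diff)) / denom
        w = min(1.0, max(0.0, w))
        mix = [w * x + (1 - w) * y for x, y in zip(a, b)]
        dist = sqrt(sum((u - m) ** 2 for u, m in zip(user, mix)))
        if best is None or dist < best[0]:
            best = (dist, name_a, w, name_b)
    return best


# Hypothetical population averages over three components (percent).
references = {
    "Pop_A": [70.0, 20.0, 10.0],
    "Pop_B": [20.0, 60.0, 20.0],
    "Pop_C": [10.0, 10.0, 80.0],
}
user = [45.0, 40.0, 15.0]

dist, name_a, w, name_b = best_two_way_mix(user, references)
print(f"{w:.0%} {name_a} + {1 - w:.0%} {name_b} @ distance {dist:.2f}")
```

Even in this toy version the caveat above holds: the search can only return the best fit among the reference samples it's given, which need not match anyone's actual parents or grandparents.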
Saturday, November 10, 2012
Following a recent update, the SPatial Ancestry analysis (SPA) software now allows individual users to pinpoint their genetic origins on a Google map (see here). Unfortunately, the allele "model" files used for this purpose only seem to produce accurate results for people of British ancestry.
I suspect the problem with the European model file (europe.model) is that it's based on the POPRES dataset, which includes hundreds of samples from the British Isles and surrounds, but very few or none at all from many other parts of Europe. I'm not sure what the issue is with the world model file, based on the HGDP dataset, but there's definitely an issue, because apparently most people end up in the Atlantic.
As a result, I thought I'd help out and put together a few SPA model files of my own, using my Eurogenes Project dataset, which is now more comprehensive than most academic datasets. But please, in order to get sensible results, only use the files that are at least broadly relevant to your ancestry.
Eurogenes_Eurasian_cline.model - as per the name, this file will plot you along the genetic cline that runs from Europe to the Far East. Most Europeans will land somewhere in Central or Eastern Europe, while most unadmixed East Asians will land in China or Korea. South Asians will plot somewhere between Europe and South India, depending on the level of the so-called Ancestral North Indian (ANI) component in their genomes.
Eurogenes_Finland.model - Finns are difficult to plot correctly on genetic maps when lots of other groups are included, due to their demographic history and eastern admixture. This Finland-specific file attempts to get around such problems. West Finns will land on the Baltic coast, or in the Baltic Sea, while East Finns will plot near Russia, or just over the Russian border.
More model files will appear here in due time. Below are some basic instructions on how to run SPA with my model files. There are different ways of doing this, but this is what I do.
Download spa.exe from the SPA website and place it into your WINDOWS folder.
Download the Eurogenes_Eurasian_cline.model zip file, unzip it, and place it into your C: folder.
Download your raw data from 23andMe, unzip it, rename the text file Test.txt, and place it into your C: folder.
Call up the Command Prompt window (cmd). For instance, go to the Windows menu in the bottom left of the screen, type cmd into the search field, and double click on cmd.exe.
In your cmd window type the following command: spa --mfile c:\Test.txt --model-input c:\Eurogenes_Eurasian_cline.model --location-output EUC.loc
To split your genome into two, type: spa --mfile c:\Test.txt --model-input c:\Eurogenes_Eurasian_cline.model --location-output EUC.loc -n 2
Go to your user directory to collect the results (C: > Users > Whatever your username is).
Please let me know how it went, so I can make adjustments to the files if needed. However, it's important to understand that placing many people within the borders of their countries of origin won't be possible, because modern political borders often don't reflect substructures created by ancient gene flows. This is especially true for groups that don't fall within the normal clines of genetic diversity, like Sardinians and Basques.
Indeed, based on my experiments with this software, I have to conclude that the correlation between genes and geography isn't as tight as has been claimed recently (see here). It's actually very difficult to manipulate the analysis to make sure the vast majority of ethnic groups match their geography.
Therefore, as always, the best way to interpret the results is to compare them with those of other users. For instance, if you're Korean and find yourself just outside of Korea when using the Eurasian cline file, this need not mean you have recent foreign ancestry. In order to test this, check where other Korean samples land, specifically those from the same region as yourself. You're likely to find that they all plot near your pin (or pins in dual mode).
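The comparison suggested above can be made a little more concrete by measuring how far your pin lands from those of other users with the same background. The Python sketch below uses the standard haversine formula for great-circle distance. The coordinates are made up, the helper names are hypothetical, and it assumes you've already read the latitude and longitude of each pin off the map yourself.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_to_group(pin, group_pins):
    """Distance in km from one pin to the simple centroid of other pins.

    Averaging latitudes and longitudes is only a rough centroid, but it's
    adequate when the pins are close together.
    """
    lat_c = sum(lat for lat, _ in group_pins) / len(group_pins)
    lon_c = sum(lon for _, lon in group_pins) / len(group_pins)
    return haversine_km(pin[0], pin[1], lat_c, lon_c)


# Hypothetical pins (latitude, longitude) for one user and four others
# from the same region.
my_pin = (38.9, 125.1)
others = [(38.7, 124.8), (39.1, 125.4), (38.8, 125.0), (39.0, 125.3)]

print(f"{distance_to_group(my_pin, others):.0f} km from the group centroid")
```

If your pin is no further from the centroid than the other pins are from each other, then landing just outside the expected borders probably reflects the model rather than recent foreign ancestry.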
SPatial Ancestry analysis (SPA) for 23andMe clients (at my other blog)
SPatial Ancestry analysis (SPA) website
The main orientations of human genetic differentiation