UPDATE 10/16/2016: The metrics below are becoming dated. To see how current NGS offerings compare for Y-DNA coverage, please visit: haplogroup-r.org. Statistics are updated on a weekly basis.
The advent of low-cost Next Generation Sequencing (NGS) has revolutionized our understanding of the Y-DNA tree. Rather than looking at a few hundred bases or a fixed panel of probes, these tests read millions of base pairs of the subject’s genome. By discovering patterns across many such results and building on ISOGG’s 2014 tree, the R1b group has seen an explosion of new branches.
One of the most frequent questions I’m asked as a project administrator and data analyst is “Which NGS test should I take?” All too often, the common refrain from conference speakers has been Family Tree DNA’s Big Y, because FTDNA has a huge database. This made sense when the cost differential with their chief competitor, Full Genomes Corporation’s Full Y (aka Y Elite 1.0), was a factor of two. Now that only $200 separates the two tests, is that still the best advice?
Having run my analysis tool chain on over 300 NGS tests, perhaps I can let the numbers steer us to a better conclusion. Both tests are primarily designed to capture the Y-DNA sequence. Adamov et al. (2015) combine the regions from Poznik et al. (2013) with the generalized Big Y BED file from the FTDNA (2014) Big Y white paper. The intersection of these regions, referred to below as combBED, spans 8,473,821 base pairs of the Y chromosome. These coordinates are the most suitable for mapping short reads like those in Big Y and Y Elite, which gives higher confidence that variants detected in these regions are not false positives. The test that maximizes coverage of this region is the most effective at finding useful branch points for the Y tree.
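Building a region set like combBED comes down to intersecting two lists of intervals. The following is a minimal Python sketch of that idea, not the procedure Adamov et al. actually used; in practice `bedtools intersect` is the usual command-line tool. It assumes BED-style half-open coordinates, and the function names (`read_bed`, `merge`, `intersect`, `total_bp`) are hypothetical.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # BED-style half-open [start, end)


def read_bed(lines: Iterable[str], chrom: str = "chrY") -> List[Interval]:
    """Collect (start, end) pairs for one chromosome from BED-formatted lines."""
    out: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blanks, comments and UCSC header lines
        fields = line.split()
        if fields[0] == chrom:
            out.append((int(fields[1]), int(fields[2])))
    return out


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two interval sets with a two-pointer sweep over merged lists."""
    a, b = merge(a), merge(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def total_bp(intervals: List[Interval]) -> int:
    """Total number of bases covered by non-overlapping half-open intervals."""
    return sum(end - start for start, end in intervals)


# Toy intervals, not real Y coordinates.
print(total_bp(intersect([(100, 500), (800, 1200)], [(300, 900), (1000, 1100)])))  # 400
```

With real data, the two inputs would be `read_bed()` applied to the Poznik et al. regions and the Big Y BED file; merging first keeps overlapping input rows from being double-counted.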
The resulting BAMs from the Y-DNA Variant Discovery Workflow – Pt.1 are processed with bedtools genomecov to produce a BED coverage file. A custom script totals the covered ranges for each chromosome, and the results are then tabulated for minimum, maximum, average, and median values. The current data set includes all positions with at least one read; future analysis may look only at regions with five or more reads.
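The custom script itself is not published here, so what follows is only a minimal Python sketch of the same idea. It assumes the coverage file was written in bedGraph form, for example with `bedtools genomecov -ibam sample.bam -bg`, which emits four columns (chromosome, start, end, depth) and omits zero-depth ranges. The function names `covered_bp` and `summarize` are hypothetical, and chromosome names (`chrY` versus `Y`) depend on the reference used.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List


def covered_bp(bedgraph_lines: Iterable[str], min_depth: int = 1) -> Dict[str, int]:
    """Sum the lengths of bedGraph ranges with depth >= min_depth, per chromosome."""
    totals: Dict[str, int] = defaultdict(int)
    for line in bedgraph_lines:
        if not line.strip() or line.startswith(("track", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        # -bg output already drops zero-depth ranges, so min_depth=1 counts
        # every position with at least one read; min_depth=5 is the stricter cut.
        if int(depth) >= min_depth:
            totals[chrom] += int(end) - int(start)
    return dict(totals)


def summarize(values: List[int]) -> Dict[str, float]:
    """Minimum, maximum, mean and median of per-sample totals."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Toy bedGraph rows (chrom, start, end, depth), not real sample data.
sample = [
    "chrY\t100\t150\t3\n",
    "chrY\t150\t160\t7\n",
    "chrM\t0\t16569\t40\n",
]
print(covered_bp(sample))               # {'chrY': 60, 'chrM': 16569}
print(covered_bp(sample, min_depth=5))  # {'chrY': 10, 'chrM': 16569}
```

Collecting the `chrY` total from each BAM and passing the list to `summarize()` yields the minimum, maximum, average, and median columns.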
Family Tree DNA – Big Y ($575 regular price)
Figure 1 includes the summary of 277 Big Y BAMs sourced from haplogroup R1b. FTDNA has begun actively filtering out reads that their aligner does not assign to the Y chromosome. Compared with earlier results, this reduces the average total length of reads that are not considered on target. Some off-target results still appear in my calling process because BWA-MEM aligns them differently.
Full Genomes Corp – Y Elite 2.0 ($775 regular price)
Figure 2 includes the summary of 27 Y Elite 1.0 BAMs sourced from haplogroup R1b. Based on an examination of the BAM headers, Full Genomes uses a methodology quite similar to my own, so the results in the table come directly from the original files. FGC has recently updated Y Elite to use longer 250 base pair reads; however, no 2.0 results were available at the time of writing.
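Checking a BAM’s provenance usually means reading its `@PG` header lines, which per the SAM specification record the program name (`PN`), version (`VN`), and command line (`CL`) used to produce the file. Here is a rough Python sketch, assuming the header has already been dumped to text with `samtools view -H sample.bam`. The function `program_records` and the example header are hypothetical and not taken from an FGC file.

```python
from typing import Dict, List


def program_records(header_text: str) -> List[Dict[str, str]]:
    """Return each @PG line of a SAM header as a {tag: value} dict."""
    records: List[Dict[str, str]] = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@PG":
            continue
        record: Dict[str, str] = {}
        # Remaining fields are TAB-separated TAG:VALUE pairs; partition()
        # splits on the first colon only, so colons inside CL are preserved.
        for field in fields[1:]:
            tag, _, value = field.partition(":")
            record[tag] = value
        records.append(record)
    return records


# Hypothetical header text, e.g. captured from `samtools view -H sample.bam`.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.12\tCL:bwa mem ref.fa r1.fq r2.fq\n"
)

for pg in program_records(example_header):
    print(pg.get("PN"), pg.get("VN"), pg.get("CL"))
```

Comparing the aligner name, version, and command line across files is a quick way to judge whether two pipelines are comparable.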
From a pure coverage perspective, Y Elite clearly provides more data, and does so consistently. There is a 1,447,472 base pair difference between the minimum and maximum Y chromosome results. On average, 22,772,412 base pairs are read, a total of 38.35% of the chromosome. The combBED regions see 99.78% coverage on average. The test returns 29,625 base pairs per dollar.
The less costly Big Y test shows more variability. There is an 8,769,725 base pair difference between the minimum and maximum Y chromosome results. On average, 16,086,565 base pairs are returned for the entire chromosome. The combBED regions see 96.07% coverage on average. At regular price, 27,968 base pairs are sampled per dollar.
In total, Y Elite covers 6,685,847 additional base pairs for $200 more. This extra 11% of the chromosome offers additional opportunities to discover branches that would otherwise go unnoticed in a field currently dominated by Big Y.
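The headline arithmetic can be reproduced in a few lines of Python. This sketch assumes the hg19 chrY length of 59,373,566 bp as the denominator, which reproduces the 38.35% figure; the reference build is not stated above, so treat that as an assumption. Dividing the stated averages by the list prices gives roughly 29,384 and 27,977 base pairs per dollar, close to the per-dollar figures quoted above. `coverage_summary` is a hypothetical helper.

```python
CHRY_HG19_BP = 59_373_566  # assumed denominator: chrY length in hg19 (GRCh37)


def coverage_summary(avg_bp: int, price: float, chrom_len: int = CHRY_HG19_BP):
    """Return (percent of the chromosome covered, base pairs per dollar)."""
    return 100 * avg_bp / chrom_len, avg_bp / price


# Average Y chromosome totals and regular prices as given in the text.
tests = {"Y Elite": (22_772_412, 775), "Big Y": (16_086_565, 575)}

for name, (avg_bp, price) in tests.items():
    pct, per_dollar = coverage_summary(avg_bp, price)
    print(f"{name}: {pct:.2f}% of chrY, {per_dollar:,.0f} bp per dollar")

extra = tests["Y Elite"][0] - tests["Big Y"][0]
print(f"Y Elite extra: {extra:,} bp ({100 * extra / CHRY_HG19_BP:.1f}% of chrY)")
```

The difference of 6,685,847 bp works out to about 11.3% of chrY under this assumed length.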
An example of this effect is R-A542, below CTS4466. The branch has four Big Y results and a sample from PGP: Harvard. With roughly the same coverage area as Y Elite, the PGP sample retains 39 ‘private’ variants, while the Big Y testers have an average of 5.66 ‘private’ variants. To determine whether any of the PGP sample’s variants might further divide the group, the four Big Y testers would need to test the sites individually at $17.50 per site at a lab such as yseq.net. The other option is to fully retest with Y Elite 2.0. R-Z16251 (formerly L270.2) is another large branch resisting subdivision within the regions tested by Big Y.
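To put the follow-up route in numbers, here is a back-of-the-envelope sketch of the worst case, assuming every one of the 39 sites must be tested in each of the four Big Y testers at the $17.50 per-site price above. In practice fewer sites may need testing, for example if some already fall inside Big Y’s coverage. `sanger_followup_cost` is a hypothetical helper.

```python
def sanger_followup_cost(sites: int, testers: int, price_per_site: float = 17.50) -> float:
    """Cost of genotyping every candidate site in every tester, one test per site."""
    return sites * testers * price_per_site


# Worst case for R-A542: all 39 PGP 'private' variants checked in all four Big Y testers.
print(sanger_followup_cost(39, 4))  # 2730.0
```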
From a reporting standpoint, neither test is particularly friendly to the tester. FTDNA has made strides here, with SNP Packs helping to refine its tree, but it remains months behind the community’s knowledge of the tree. FGC’s reporting is similarly limited in that it can only provide a comparison with others in its database. At the end of the day, you are beholden to your haplogroup’s volunteer analysts to make heads or tails of the results.
The old advice to always recommend Big Y is not the best from a value perspective. Testers face large opportunity costs, as important variants for their line may remain undetected in Big Y’s lower total coverage. They may find themselves Sanger testing variants found in their matches, which can quickly eat into the upfront savings. If the $200 difference is not out of the question, Y Elite 2.0 is the better option on all metrics except absolute cost and processing time.
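As a rough break-even check, assuming each follow-up test costs the $17.50 per SNP quoted for yseq.net and the price gap is the full $200 between the two regular prices:

```latex
n_{\text{break-even}} = \left\lceil \frac{775 - 575}{17.50} \right\rceil = \lceil 11.43 \rceil = 12
```

Eleven single-SNP tests cost $192.50, while twelve cost $210, so by the twelfth test a Big Y customer has spent more than the Y Elite premium.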
Another important consideration is that the off-target coverage in Y Elite is significant. Many testers will also receive full coverage of their mtDNA and approximate values for their Y-STRs. In future articles, I’ll compare the better-than-20% autosomal coverage with 23andMe V3 and AncestryDNA results.
Adamov et al. (2015), Defining a New Rate Constant for Y-Chromosome SNPs based on Full Sequencing Data. Retrieved June 12, 2015 from The Russian Journal of Genetic Genealogy.
FTDNA (2014), Big Y White Paper. Retrieved November 10, 2015 from www.ftdna.com.