What Do We See With WGS Testing?
Previous articles on this blog have explored the numbers behind what some NGS tests can achieve, but the abstract concepts can be hard to grasp. I recently added the ability to create histograms, which make coverage easier to compare visually alongside the Next Generation Sequencing Statistics table. That raised an interesting question: how well is each chromosome captured in a typical 30x WGS test? I have just such a sample from YSEQ, and the software is adaptable.
Explanation of Graphs
At full scale, each horizontal pixel of a chromosome’s histogram represents 10,000 bases of the GRCh38 reference. Colored regions have at least four reads. Green indicates that 90% of the reads have mapping quality scores corresponding to less than a 10% probability of being aligned incorrectly. Red indicates a good likelihood that the reads belong somewhere else in the genome. White gaps indicate regions where the reference sequence is unknown because the reference itself is incomplete. Each histogram bar links to a higher-definition view.
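The coloring rules can be sketched in a few lines of Python. This is only an illustration of the rules as described, not the code that produced the histograms. The function names, the "gap" and "uncovered" labels, and the choice to count reads per 10,000-base bin are all assumptions. The MAPQ cutoff relies on the usual Phred scaling, where a score above 10 corresponds to less than a 10% chance of a wrong alignment.

```python
from typing import Sequence

BIN_SIZE = 10_000    # bases per horizontal pixel (from the text)
MIN_READS = 4        # reads needed before a region is colored (from the text)
MIN_MAPQ = 10        # Phred-scaled: MAPQ above 10 means under 10% error probability
GOOD_PERCENT = 90    # percent of reads that must pass for green (from the text)


def pixel_index(position: int) -> int:
    """Map a 0-based reference position to its histogram pixel (hypothetical helper)."""
    return position // BIN_SIZE


def classify_bin(mapqs: Sequence[int], reference_known: bool = True) -> str:
    """Return the display class for one bin, given the MAPQ of each read in it."""
    if not reference_known:
        return "gap"          # drawn as white: the reference has no sequence here
    if len(mapqs) < MIN_READS:
        return "uncovered"    # assumption: too few reads, so no color is drawn
    confident = sum(1 for q in mapqs if q > MIN_MAPQ)
    # Integer comparison avoids floating-point surprises at exactly 90%.
    if confident * 100 >= GOOD_PERCENT * len(mapqs):
        return "green"
    return "red"
```

For example, a bin holding nine reads at MAPQ 60 and one at MAPQ 0 would be classed as green, while a bin of ten MAPQ 0 reads would be red.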
Chr1 – Callable Length: 215,989,357
Chr2 – Callable Length: 231,332,640
Chr3 – Callable Length: 192,259,680
Chr4 – Callable Length: 183,065,635
Chr5 – Callable Length: 171,964,004
Chr6 – Callable Length: 160,243,678
Chr7 – Callable Length: 149,224,935
Chr8 – Callable Length: 136,344,857
Chr9 – Callable Length: 109,687,357
Chr10 – Callable Length: 127,758,507
Chr11 – Callable Length: 127,659,488
Chr12 – Callable Length: 127,733,307
Chr13 – Callable Length: 94,014,907
Chr14 – Callable Length: 83,497,057
Chr15 – Callable Length: 72,753,886
Chr16 – Callable Length: 72,008,056
Chr17 – Callable Length: 70,793,649
Chr18 – Callable Length: 72,826,341
Chr19 – Callable Length: 52,445,036
Chr20 – Callable Length: 60,060,518
Chr21 – Callable Length: 32,903,198
Chr22 – Callable Length: 32,612,901
ChrX – Callable Length: 144,455,757
ChrY – Callable Length: 15,260,227
ChrM – Callable Length: 16,486
[Not Shown Due To Scale]
When looking at our sequencing options for genetic genealogy, we tend to focus on ChrY and ChrM. As a community we spend countless hours discussing the cost per allele in Next Generation Sequencing Statistics. Valued only for its Y DNA discoveries, this test was twice as expensive as a Big Y 700. As the callable lengths above show, though, that comparison is not representative. Applying the same metric to the whole genome, the 30x WGS comes in at an astounding 2.3 million bases per dollar invested. Attention needs to move away from squeezing every last generation marker out of ChrY and toward the other 99.4% of our genome.
The SNP tsunami of the last five years is an insignificant blip compared to what we still have to learn in genetic genealogy.
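For readers who want to check the arithmetic, here is a minimal Python sketch using the callable lengths listed above. The variable and function names are mine. The price passed to `bases_per_dollar` is a placeholder, since the price of the test is not stated in this article.

```python
# Callable lengths per chromosome, copied from the list above.
CALLABLE = {
    "chr1": 215_989_357, "chr2": 231_332_640, "chr3": 192_259_680,
    "chr4": 183_065_635, "chr5": 171_964_004, "chr6": 160_243_678,
    "chr7": 149_224_935, "chr8": 136_344_857, "chr9": 109_687_357,
    "chr10": 127_758_507, "chr11": 127_659_488, "chr12": 127_733_307,
    "chr13": 94_014_907, "chr14": 83_497_057, "chr15": 72_753_886,
    "chr16": 72_008_056, "chr17": 70_793_649, "chr18": 72_826_341,
    "chr19": 52_445_036, "chr20": 60_060_518, "chr21": 32_903_198,
    "chr22": 32_612_901, "chrX": 144_455_757, "chrY": 15_260_227,
    "chrM": 16_486,
}

total_callable = sum(CALLABLE.values())

# Share of callable bases outside ChrY and ChrM, as a percentage.
y_and_m = CALLABLE["chrY"] + CALLABLE["chrM"]
other_percent = 100 * (total_callable - y_and_m) / total_callable


def bases_per_dollar(callable_bases: int, price_usd: float) -> float:
    """Callable bases obtained per dollar; price_usd is supplied by the reader."""
    return callable_bases / price_usd


print(f"Total callable bases: {total_callable:,}")
print(f"Outside ChrY and ChrM: {other_percent:.1f}%")
```

The share outside ChrY and ChrM comes out near 99.4%, matching the figure quoted above.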
Heading Into the Future
As I finish this brief report, many new questions pose themselves:
- What can we learn from our study of Y DNA and apply to the recombining autosomal regions?
- How do we even try to compare this data while protecting personal and familial privacy?
- Should we be trying this at all, given the current unintended use of our samples in databases like FTDNA or GEDmatch for law enforcement?