Recently there have been accusations that tests run on the BGISEQ-500, such as the one Dante Labs uses for its 30x WGS product, produce poor-quality data. These claims are not supported by the literature.
“The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (v. 0.309), and sequence clonality (P = 0.093).” Mak et al. (2017). Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing. GigaScience, 6(8), 1–13. doi:10.1093/gigascience/gix049
“Overall, the BGISEQ-500 and HiSeq X Ten sequencing platforms show a high concordance to germline genotypes ascertained from SNP arrays. Both sequencing platforms show a high concordance to each other in their ability to detect germline and somatic SNVs and indels.” Patch et al. (2018). Germline and somatic variant identification using BGISEQ-500 and HiSeq X Ten whole genome sequencing. https://doi.org/10.1371/journal.pone.0190264
To investigate further, six samples were selected from the Y-DNA Warehouse’s BAM collection. The samples come from three male donors, each of whom has taken multiple NGS tests. The first donor has taken the controversial Dante Labs test and the Full Genomes flagship 10x Chromium test. The second donor has taken the Veritas Genetics 30x WGS test with 150 base pair reads as well as the 10x Chromium. The third donor was selected because he has both a targeted Y Elite test with 100 base pair reads and a more modern 30x WGS test with 150 base pair reads run on a NovaSeq. All three men are R-DF13.
Each WGS sample was prepared using the same pipeline of BWA-MEM and GATK components to generate a gVCF file of chrY. The 10x Chromium tests used the vendor-provided BAM, but all other steps remained the same. The gVCF files for each man were then joint genotyped with GATK to produce a call table. The table was summarized into concordant calls (both BAMs show the same mutation), calls that are covered in one test but not the other, and finally sites where the calls do not match.
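The summarization step can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for this post: it assumes the call table has already been reduced to pairs of genotype strings such as `A/G` or `./.` (one pair per site, one string per test), and the names `normalize`, `classify`, and `summarize` are hypothetical. The choice of denominator for the percentages (all sites called in at least one test) is also an assumption.

```python
from collections import Counter


def normalize(gt):
    """Return a sorted tuple of alleles, or None when the site has no call."""
    if gt is None:
        return None
    alleles = gt.replace("|", "/").split("/")
    # Treat any missing allele ("." or empty) as an uncalled site.
    if any(a in ("", ".") for a in alleles):
        return None
    return tuple(sorted(alleles))


def classify(gt_a, gt_b):
    """Place one site into a summary bucket for tests A and B."""
    a, b = normalize(gt_a), normalize(gt_b)
    if a is None and b is None:
        return "uncalled"
    if b is None:
        return "a_only"
    if a is None:
        return "b_only"
    return "concordant" if a == b else "mismatch"


def summarize(rows):
    """Count buckets over (genotype_a, genotype_b) pairs and add percentages."""
    counts = Counter(classify(a, b) for a, b in rows)
    # Assumption: sites uncalled in both tests are excluded from the total.
    keys = ("concordant", "a_only", "b_only", "mismatch")
    total = sum(counts[k] for k in keys)
    summary = {k: counts[k] for k in keys}
    summary["total"] = total
    summary["pct_concordant"] = 100.0 * counts["concordant"] / total if total else 0.0
    summary["pct_mismatch"] = 100.0 * counts["mismatch"] / total if total else 0.0
    return summary


if __name__ == "__main__":
    # Made-up genotype pairs purely for illustration.
    demo = [("A/A", "A/A"), ("A/G", "G/A"), ("A/A", "./."), ("./.", "T/T"), ("A/A", "A/G")]
    print(summarize(demo))
```

Alleles are sorted before comparison so that `A/G` and `G/A` count as the same call; a stricter comparison would treat them as different.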
Sample 1 (Dante 30x WGS vs 10x Chromium)
|Sample 1||Total||Concordant||% Concordant||BGI Only||10x Chromium Only||Mismatch||% Mismatch|
Sample 2 (Veritas 30x WGS vs 10x Chromium)
|Sample 2||Total||Concordant||% Concordant||Veritas Only||10x Chromium Only||Mismatch||% Mismatch|
Sample 3 (Y Elite 1 vs 30x WGS)
|Sample 3||Total||Concordant||% Concordant||100bp Targeted Only||150bp WGS Only||Mismatch||% Mismatch|
The first observation is that all three men show a large percentage of sites with mismatching calls between their tests. The 49.6 to 53.3% mismatch rate is largely due to these sites occurring in palindromic regions where multiple alleles are present. This phenomenon is caused by ambiguous read alignments to the reference. It is possible that the 10x Chromium tests will be more reliable in these regions. The Chromium homozygous SNP calls in the non-matching set account for 71% of the difference in Sample 1 and 59% in Sample 2. In Sample 3, the additional 50 base pairs of read length result in only 2% more of the calls becoming homozygous. This potentially much improved call rate shows the true potential of the 10x Genomics platform and why one may wish to consider the test for Y-DNA applications.
Next, note that the Dante 30x WGS performs similarly to the Veritas Genetics 30x WGS test. The mismatch rate on SNPs slightly favors the BGISEQ-500 test, while the INDEL rate slightly favors the Illumina HiSeq. These samples show what Mak et al. (2017) and Patch et al. (2018) already established: the data quality of the BGISEQ-500 is very similar to that of traditional Illumina result sets.
Finally, the Y Elite and 30x WGS tests show the most concordant results, as expected. The main differences in technology are the generations of Illumina instruments and the 50 base pair longer reads in the WGS. These samples were included mostly to show the effect of read length in traditional NGS approaches.
The existing peer-reviewed literature has found no significant differences between the BGISEQ-500 and Illumina instruments. An initial comparison of data from a man in the Y-DNA Warehouse tested with Dante Labs’ 30x WGS test, regularly priced at $599, showed a higher concordance of calls with the 10x Genomics solution than the equivalent comparison for a similar individual. This adds more evidence supporting the literature by mixing in a much more capable instrument.
The most significant problem plaguing Dante Labs is the inordinate wait time for sequencing results. Customers report waiting six months or more to receive the VCF file, and longer still to have a hard drive with the raw BAM files delivered at extra cost. While the final results appear to hold up well on all accounts, getting to that finish line requires a good amount of patience. If you need results more quickly, a traditional 15x WGS test is just a few hundred dollars more from other direct-to-consumer vendors.