BLR pipeline source code
Archived version of the BLR pipeline, accompanying the submission of the paper "BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies".
Abstract
Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. Here we present Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10x Genomics, TELL-Seq and stLFR. We used BLR to explore the impact of using single versus multiple genomic DNA molecules per droplet in the DBS technology. While both approaches, in spite of short molecule lengths, yielded megabase-scale phase blocks with low switch-error rates, single-molecule resolution improved phasing contiguity. In a high-coverage dataset combining the two DBS datasets, large structural variants showed concordance with 10x Genomics and PacBio. In addition, the phasing of protein-coding genes showed that most (93.8%) matched the phasing in a GIAB benchmark set. Comparing Long Ranger to BLR on 10x Genomics data showed a fourfold increase in phase block N50 for BLR while maintaining low switch-error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch-error rates compared to the results presented in the original publications for the respective technologies. In conclusion, BLR presents a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
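
For readers unfamiliar with the phase block N50 metric mentioned in the abstract, the following is a minimal sketch of how such a value can be computed. It assumes the common definition in which N50 is the length of the block at which the cumulative length of blocks, sorted from longest to shortest, first reaches half of the total phased length. The function name `phase_block_n50` and the example lengths are hypothetical and are not taken from the BLR code base; the pipeline itself may compute the statistic differently (for example, relative to genome length).

```python
def phase_block_n50(block_lengths):
    """Return the N50 of a collection of phase block lengths (in bp).

    Assumes N50 is defined against the summed length of all blocks;
    this is an illustrative definition, not necessarily the one BLR uses.
    """
    lengths = sorted(block_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Stop at the first block where the cumulative length
        # covers at least half of the total.
        if 2 * running >= total:
            return length
    return 0  # no blocks given


if __name__ == "__main__":
    # Hypothetical block lengths in base pairs, for illustration only.
    example_blocks = [100_000, 400_000, 1_500_000, 3_000_000]
    print(phase_block_n50(example_blocks))
```

Under this definition, a higher N50 means that more of the phased sequence sits in long blocks, which is why the metric is used as a measure of phasing contiguity.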