Targeted sequencing of 252 genes based on their relevance in lymphoid malignancies

dataset

posted on 2022-05-12, 14:05 authored by Silvia Bonfiglio, Lesley-Ann Sutton, Viktor Ljungström, Antonella Capasso, Tatjana Pandzic, Simone Weström, Hassan Foroughi Asl, Aron Skaftason, Anna Gellerbring, Anna Lyander, Francesca Gandini, Gianluca Gaidano, Livio Trentin, Lisa Bonello, Gianluigi Reda, Csaba Bodor, Niki Stravoyianni, Costantine S. Tam, Roberto Marasca, Francesco Forconi, Panayiotis Panayiotidis, Ingo Ringshausen, Ozren Jakšić, Alessandra Tedeschi, Anna Maria Frustaci, Sunil Iyengar, Marta Coscia, Stephen P. Mulligan, Loïc Ysebaert, Vladimir Strugov, Carolina Pavlovsky, Zadie Davis, Anders Österborg, Diego Cortese, Pamela Ranghetti, Panagiotis Baliakas, Kostas Stamatopoulos, Lydia Scarfò, Richard Rosenquist Brandell, Paolo Ghia

Dataset description

The data consist of CRAM files from capture-based gene panel sequencing (Twist Bioscience) of 252 genes selected based on their relevance in lymphoid malignancies. The panel also included genome-wide backbone probes for copy-number analysis. The prepared libraries were sequenced in paired-end mode (2x150bp) on the Illumina NovaSeq 6000 (Illumina Inc.).

BALSAMIC was used to analyze the FASTQ files and align them to the reference genome. Trimmed reads were mapped to the reference genome hg19 using BWA MEM v0.7.15. The resulting SAM files were converted to BAM files and sorted using samtools v1.6. Duplicate reads were marked using Picard tools MarkDuplicates v2.17.0. Finally, the BAM files were converted to CRAM files using samtools v1.6.
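The processing steps above can be sketched as a sequence of command lines. This is only an illustration under stated assumptions, not the BALSAMIC implementation: read trimming is omitted, the file names and the helpers `Step`, `build_steps` and `run_steps` are hypothetical, and the `picard` command is assumed to be a wrapper script on the PATH (as installed by some package managers). Only commonly documented options of `bwa mem`, `samtools sort`, `samtools view` and Picard `MarkDuplicates` are used.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One command of the sketch (hypothetical helper type)."""
    argv: List[str]
    stdout: Optional[str] = None  # file that captures stdout, if any


def build_steps(sample: str, ref: str, r1: str, r2: str) -> List[Step]:
    """Build the align, sort, mark-duplicates and CRAM-conversion commands.

    All output file names are illustrative assumptions.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    cram = f"{sample}.cram"
    return [
        # bwa mem writes SAM to stdout, so it is redirected to a file
        Step(["bwa", "mem", ref, r1, r2], stdout=sam),
        # samtools sort reads the SAM and writes a coordinate-sorted BAM
        Step(["samtools", "sort", "-o", sorted_bam, sam]),
        # Picard marks duplicate reads and writes a metrics file
        Step(["picard", "MarkDuplicates",
              f"I={sorted_bam}", f"O={dedup_bam}", f"M={metrics}"]),
        # -C selects CRAM output; -T names the reference used for compression
        Step(["samtools", "view", "-C", "-T", ref, "-o", cram, dedup_bam]),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each command in order, stopping at the first failure."""
    for step in steps:
        if step.stdout is None:
            subprocess.run(step.argv, check=True)
        else:
            with open(step.stdout, "w") as out:
                subprocess.run(step.argv, stdout=out, check=True)
```

Calling `run_steps(build_steps("sample1", "hg19.fa", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz"))` would execute the four commands in order, provided the tools are installed and the reference has been indexed for BWA. The same reference FASTA is needed later to decode the CRAM files.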

Note: CRAM is a sequencing read file format that achieves high space efficiency through reference-based compression of sequence data and offers both lossless and lossy modes of compression: https://www.ebi.ac.uk/ena/cram/

Data Access Statement

The data are under restricted access and can be requested via the email address below.

The targeted sequence datasets are only to be used for research aimed at advancing the understanding of genetic factors in chronic lymphocytic leukemia. Applications aimed at method development, including bioinformatics, will not be considered an acceptable use of this dataset.