Data for the Populus tremula v2.2 genome project and associated genome-wide association study
Background to the study: We have produced a chromosome-scale genome assembly, generated using long-read sequencing together with optical and high-density genetic maps, that contains 39,894 annotated genes, with functional annotations for 73,765 transcripts in 37,184 genes. We conducted whole-genome resequencing of the Umeå Aspen (UmAsp) collection, which comprises 227 aspen individuals. We used the assembly and existing whole-genome resequencing data to perform genome-wide association studies (GWAS) of leaf physiognomy phenotypes, using single nucleotide polymorphisms (SNPs), in the UmAsp, Swedish Aspen (SwAsp) and Scottish Aspen (ScotAsp) collections. We performed an Assay for Transposase-Accessible Chromatin using sequencing (ATAC-Seq), identified genomic regions of accessible chromatin, and subset the SNPs to these regions, which improved the GWAS detection rate. We identified candidate long non-coding RNAs (lncRNAs) in leaf samples, quantified their expression and included them in an updated co-expression network, which we used to explore the functions of candidate genes identified by the GWAS.
This data set comprises the ATAC-Seq peaks from aspen leaves, 'Aspen leaf ATAC_Seq peaks.zip', and the gene expression matrix of mean values per genotype for the SwAsp collection, 'Gene_Expression_matrix_genotype_mean.tsv'. A zipped directory for each collection ('ScotAsp.zip', 'SwAsp.zip' and 'UmAsp.zip') contains the raw leaf image scans, the cropped leaf images, the raw data files from the LAMINA leaf shape analyses of these images, and the processed data files and genotypic best linear unbiased prediction (BLUP) values for that collection. For each of the 26 leaf physiognomy traits in each collection, we provide the GWAS SNP associations ranked by increasing P-value (i.e. from most to least significant) up to the 1000th gene: 'ScotAsp top-ranked GWAS results', 'SwAsp top-ranked GWAS results' and 'UmAsp top-ranked GWAS results'. The SNP data for the three collections are in 'ScotAsp_biallelic_Het.HWE.recode.vcf.gz', 'SwAsp_AfterBatchRemoval_biallelic_Het.HWE.recode.vcf.gz' and 'UmAsp_biallelic_Het.MAF.HWE.recode_.vcf.gz', respectively.
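The top-ranked GWAS files keep SNP associations up to the 1000th gene for each trait. The Python sketch below illustrates one plausible reading of that selection rule. It assumes that each SNP has already been assigned to a single gene, that associations are ordered from the smallest (most significant) P-value, and that the list is cut off as soon as the 1000th distinct gene is reached. The record layout `Association` and the function `top_ranked` are hypothetical and are not taken from the study's own pipeline.

```python
from typing import Iterable, List, NamedTuple, Set


class Association(NamedTuple):
    """One SNP-trait association (hypothetical record layout)."""
    snp_id: str
    gene_id: str    # gene the SNP is assigned to (assumed precomputed)
    p_value: float


def top_ranked(associations: Iterable[Association],
               n_genes: int = 1000) -> List[Association]:
    """Return associations sorted by increasing P-value, stopping as soon
    as `n_genes` distinct genes have been encountered (assumed cutoff rule)."""
    ranked = sorted(associations, key=lambda a: a.p_value)  # most significant first
    seen_genes: Set[str] = set()
    kept: List[Association] = []
    for assoc in ranked:
        seen_genes.add(assoc.gene_id)
        kept.append(assoc)
        if len(seen_genes) == n_genes:
            break  # the n-th distinct gene has just been added
    return kept


if __name__ == "__main__":
    # Toy, made-up associations for illustration only.
    records = [
        Association("snp1", "geneA", 1e-8),
        Association("snp2", "geneB", 1e-6),
        Association("snp3", "geneA", 1e-7),
        Association("snp4", "geneC", 1e-5),
    ]
    print([a.snp_id for a in top_ranked(records, n_genes=2)])
    # prints ['snp1', 'snp3', 'snp2']
```

Under this rule, several SNPs assigned to the same gene can all appear in the list, so a file truncated at the 1000th gene may contain more than 1000 SNPs.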