4. Ecological genomics of the Northern krill: Genome-wide signatures of DNA methylation
This item contains gzipped files and archives with CpG DNA methylation data inferred from signal analysis of Nanopore reads. The files represent successive stages of processing, and most of the data are stored as tabular TSV/CSV files.
Methylation was called with the tool f5c; documentation for the program and its output formats is available at:
https://github.com/hasindu2008/f5c
Archive contents:
- methylation_calls.tsv.gz, first-step output from "f5c_x86_64_linux_cuda call-methylation", containing per-read methylation calls.
- methylation_calls_freq.tsv.gz, second-step output from "f5c_x86_64_linux_cuda meth-freq", containing per-CpG-group methylation frequencies from overlapping reads.
- methylation_calls_freq.annotated_all_cpg_sites.tsv.gz, per-CpG-site methylation calls for all 74.8 M CpG sites, cross-referenced with genomic regions (e.g. intergenic, intron, CDS) based on the genome annotation.
- methylation_calls_freq.annotated_repeated_cpg_sites.tsv.gz, per-CpG-site methylation calls cross-referenced with genomic regions as above, restricted to the 56.7 M sites located in repeats.
- methylation_calls_freq.annotated_non_repeated_cpg_sites.tsv.gz, per-CpG-site methylation calls cross-referenced with genomic regions as above, restricted to the 18.2 M sites located outside repeats.
- methylation_for_genes.tar.gz, an archive of per-gene DNA methylation and splice isoform data.
- methylation_for_LTRs.tar.gz, an archive of per-LTR DNA methylation data.
- 1.m_norvegica.main_w_mito.fasta.LTR_methylation.tsv, contains DNA methylation statistics for LTR retrotransposons of different evolutionary "age".
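The step from methylation_calls.tsv.gz to methylation_calls_freq.tsv.gz can be illustrated with a minimal Python sketch. It shows how per-read log-likelihood-ratio calls are aggregated into per-site frequencies, including how a group of nearby CpGs is split into individual sites. The sketch is modelled on the aggregation logic of nanopolish's calculate_methylation_frequency.py, which f5c meth-freq is understood to reimplement. It is not f5c's code.

The following are assumptions chosen for illustration:
- the tuple layout of the input records;
- the function name methylation_frequency;
- the 2.5 call threshold (the threshold actually used for this dataset is not stated here).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-read record: (chromosome, start, log_lik_ratio, num_cpgs, sequence).
# The fields mirror those of call-methylation output, but this tuple layout is an
# illustration, not a parser for methylation_calls.tsv.gz.
Call = Tuple[str, int, float, int, str]
SiteKey = Tuple[str, int]


def methylation_frequency(
    calls: Iterable[Call],
    split_groups: bool = True,
    threshold: float = 2.5,  # assumed call threshold, for illustration only
) -> Dict[SiteKey, Tuple[int, int, float]]:
    """Return {(chromosome, position): (called_sites, methylated_sites, frequency)}."""
    called: Dict[SiteKey, int] = defaultdict(int)
    methylated: Dict[SiteKey, int] = defaultdict(int)
    for chrom, start, llr, num_cpgs, sequence in calls:
        if abs(llr) < threshold:
            continue  # ambiguous call: counted as neither methylated nor unmethylated
        is_methylated = llr > 0  # after the filter, llr >= threshold or llr <= -threshold
        if split_groups and num_cpgs > 1:
            # Give each CpG in the group its own position, offset from the first CG.
            first = sequence.find("CG")
            keys: List[SiteKey] = []
            pos = first
            while pos != -1:
                keys.append((chrom, start + pos - first))
                pos = sequence.find("CG", pos + 1)
            weight = 1
        else:
            # Keep the group as one record that counts once per CpG it contains.
            keys = [(chrom, start)]
            weight = num_cpgs
        for key in keys:
            called[key] += weight
            if is_methylated:
                methylated[key] += weight
    return {k: (called[k], methylated[k], methylated[k] / called[k]) for k in sorted(called)}


# Made-up example records.
calls = [
    ("ctg1", 100, 3.1, 1, "AACGTT"),     # methylated
    ("ctg1", 100, -4.0, 1, "AACGTT"),    # unmethylated
    ("ctg1", 100, 1.0, 1, "AACGTT"),     # ambiguous, skipped
    ("ctg1", 200, 5.0, 2, "AACGACGTT"),  # two-CpG group, methylated
]
for site, stats in methylation_frequency(calls).items():
    print(site, stats)
# ('ctg1', 100) (2, 1, 0.5)
# ('ctg1', 200) (1, 1, 1.0)
# ('ctg1', 203) (1, 1, 1.0)
```

With split_groups=False, the two-CpG group stays a single record at position 200 and counts as two called sites.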
For TSV files 3-5 (methylation_calls_freq.annotated_*_cpg_sites.tsv.gz)
Groups of closely spaced CpGs, which f5c calls jointly, were split into individual CpG sites. These files were used to measure the distribution of DNA methylation across the genome. The columns in these three files are:
- CONTIG = the name of the sequence
- POS = the position of the CpG site
- REGION = the genomic region
- DNA = read depth at the CpG site
- METHYLATION_FREQ = DNA methylation frequency
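Files 3-5 share the same five columns, so the genome-wide distribution per genomic region can be summarized in a single streaming pass. The Python sketch below makes these assumptions:
- the files have a header row with the column names listed above (pass has_header=False otherwise);
- REGION holds labels such as intron or CDS.

It also applies a hypothetical minimum read depth (min_depth=5) that is not part of the published processing.

```python
import csv
import gzip
import io
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, TextIO

COLUMNS = ["CONTIG", "POS", "REGION", "DNA", "METHYLATION_FREQ"]


def summarize_by_region(
    handle: TextIO, min_depth: float = 5, has_header: bool = True
) -> Dict[str, Dict[str, float]]:
    """Count, mean and median METHYLATION_FREQ per REGION for sites with DNA >= min_depth."""
    fieldnames: Optional[List[str]] = None if has_header else COLUMNS
    values: Dict[str, List[float]] = defaultdict(list)
    for row in csv.DictReader(handle, fieldnames=fieldnames, delimiter="\t"):
        if float(row["DNA"]) < min_depth:
            continue  # skip low-coverage sites (hypothetical filter)
        values[row["REGION"]].append(float(row["METHYLATION_FREQ"]))
    return {
        region: {
            "n_sites": len(freqs),
            "mean": statistics.fmean(freqs),
            "median": statistics.median(freqs),
        }
        for region, freqs in sorted(values.items())
    }


def summarize_file(path: str, **kwargs) -> Dict[str, Dict[str, float]]:
    """Stream a gzipped per-CpG-site file without decompressing it to disk."""
    with gzip.open(path, "rt", newline="") as handle:
        return summarize_by_region(handle, **kwargs)


# In-memory example with made-up values; on real data one would call e.g.
# summarize_file("methylation_calls_freq.annotated_repeated_cpg_sites.tsv.gz")
example = io.StringIO(
    "CONTIG\tPOS\tREGION\tDNA\tMETHYLATION_FREQ\n"
    "ctg1\t10\tintron\t10\t0.8\n"
    "ctg1\t20\tintron\t12\t0.6\n"
    "ctg1\t30\tCDS\t3\t0.9\n"
    "ctg2\t5\tCDS\t8\t0.1\n"
)
summary = summarize_by_region(example)
print(summary)
```

Only per-region lists of frequencies are held in memory. For the 74.8 M-site file, a histogram per region would keep memory bounded if that becomes an issue.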
methylation_for_genes.tar.gz
This archive contains:
- a tabular file with gene names and gene type
- a tabular file with RNA-seq splice variants per gene
- a tabular file with DNA methylation levels per gene and gene region (e.g. UTRs, CDS, exon, intron, ...)
Together with a script referenced on GitHub, these data can be used to estimate the average number of transcripts (with 95% confidence intervals) for genes with different DNA methylation levels.
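The GitHub script itself is not reproduced here. As an illustration only, the Python sketch below bins genes by methylation level and reports the mean number of transcripts per bin, with a percentile-bootstrap 95% confidence interval.

The following are all assumptions, and the original script may use a different interval method:
- the bin edges;
- the bootstrap settings;
- the dictionary inputs (gene to methylation level, gene to transcript count), which would first have to be parsed from the archive's tabular files;
- the 0 to 1 methylation scale.

```python
import bisect
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def bootstrap_mean_ci(
    values: Sequence[float], n_boot: int = 2000, alpha: float = 0.05, seed: int = 1
) -> Tuple[float, float, float]:
    """Mean with a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    boot = sorted(
        statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lower = boot[int(round(alpha / 2 * n_boot))]
    upper = boot[int(round((1 - alpha / 2) * n_boot)) - 1]
    return statistics.fmean(values), lower, upper


def transcripts_by_methylation(
    methylation: Dict[str, float],
    n_transcripts: Dict[str, int],
    edges: Sequence[float],
) -> List[Tuple]:
    """(bin label, n_genes, mean, lower, upper) for each non-empty methylation bin."""
    bins: List[List[int]] = [[] for _ in range(len(edges) - 1)]
    for gene, level in methylation.items():
        if gene not in n_transcripts:
            continue  # gene lacks splice-variant data
        i = bisect.bisect_right(edges, level) - 1
        if level == edges[-1]:
            i = len(edges) - 2  # include the upper edge in the last bin
        if 0 <= i < len(bins):
            bins[i].append(n_transcripts[gene])
    return [
        (f"{edges[i]}-{edges[i + 1]}", len(counts), *bootstrap_mean_ci(counts))
        for i, counts in enumerate(bins)
        if counts
    ]


# Made-up example data (hypothetical gene names and values).
methylation = {"g1": 0.05, "g2": 0.1, "g3": 0.5, "g4": 0.9, "g5": 1.0, "g6": 0.3}
transcripts = {"g1": 1, "g2": 3, "g3": 2, "g4": 4, "g5": 4}
rows = transcripts_by_methylation(methylation, transcripts, edges=[0.0, 0.2, 0.8, 1.0])
for row in rows:
    print(row)
```

A bootstrap avoids assuming normally distributed transcript counts, which are small, skewed integers. Bins with only one gene, however, get a degenerate interval.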
methylation_for_LTRs.tar.gz
This archive contains:
- a tabular file with LTR names and identity scores
- a tabular file with the detected LTR domains per repeat
- a tabular file with DNA methylation levels per gene and gene region (e.g. UTRs, CDS, exon, intron, ...)
Together with a script referenced on GitHub, these data can be used to estimate mean LTR methylation levels (with 95% confidence intervals) for LTRs with different identity scores. The archive also includes the data in methylation_calls_freq.annotated_all_cpg_sites.tsv.gz.
1.m_norvegica.main_w_mito.fasta.LTR_methylation.tsv
This TSV file contains CpG DNA methylation data for 1,706 LTR retrotransposons. It also includes the identity scores between 5' and 3' LTRs. TSV column fields are:
- CHROM = name of sequence
- REPEAT = name of LTR repeat
- REPEAT_START = start coordinate of repeat
- REPEAT_STOP = stop coordinate of repeat
- REPEAT_LENGTH = repeat length (bp)
- IDENTITY = identity scores between 5' and 3' LTR regions
- METHYLATION_N = number of CpG sites
- METHYLATION = average DNA methylation rate across the repeat's CpG sites
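Because the columns of this file are documented, the LTR analysis described for methylation_for_LTRs.tar.gz can be sketched directly on it. The Python example below groups LTRs into IDENTITY bins and reports the mean METHYLATION per bin, with a normal-approximation 95% confidence interval (mean ± 1.96 standard errors).

The following are assumptions for illustration, not the authors' procedure:
- a header row with the listed column names;
- the bin edges;
- the min_cpgs filter;
- the interval method.

Edges should match the scale on which IDENTITY is reported (fraction or percent), which is not stated here.

```python
import csv
import io
import math
import statistics
from typing import List, Sequence, TextIO, Tuple


def ltr_methylation_by_identity(
    handle: TextIO, edges: Sequence[float], min_cpgs: int = 1
) -> List[Tuple[float, float, int, float, float, float]]:
    """(bin_low, bin_high, n_ltrs, mean, ci_low, ci_high) for each non-empty IDENTITY bin."""
    groups: List[List[float]] = [[] for _ in range(len(edges) - 1)]
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            n_cpgs = int(row["METHYLATION_N"])
            identity = float(row["IDENTITY"])
            level = float(row["METHYLATION"])
        except ValueError:
            continue  # skip rows with missing or non-numeric values
        if n_cpgs < min_cpgs:
            continue  # too few CpG sites to estimate methylation
        for i in range(len(groups)):
            in_last = i == len(groups) - 1 and identity == edges[-1]
            if edges[i] <= identity < edges[i + 1] or in_last:
                groups[i].append(level)
                break
    results = []
    for i, levels in enumerate(groups):
        if not levels:
            continue
        mean = statistics.fmean(levels)
        # Normal approximation: mean +/- 1.96 standard errors (undefined for n = 1).
        if len(levels) > 1:
            half = 1.96 * statistics.stdev(levels) / math.sqrt(len(levels))
        else:
            half = math.nan
        results.append((edges[i], edges[i + 1], len(levels), mean, mean - half, mean + half))
    return results


# Made-up rows in the documented column layout.
example = io.StringIO(
    "CHROM\tREPEAT\tREPEAT_START\tREPEAT_STOP\tREPEAT_LENGTH\tIDENTITY\tMETHYLATION_N\tMETHYLATION\n"
    "ctg1\tLTR1\t1\t500\t500\t0.95\t10\t0.8\n"
    "ctg1\tLTR2\t1000\t1400\t401\t0.97\t4\t0.6\n"
    "ctg2\tLTR3\t1\t300\t300\t0.85\t0\tNA\n"
    "ctg2\tLTR4\t500\t900\t401\t0.82\t6\t0.3\n"
)
rows = ltr_methylation_by_identity(example, edges=[0.8, 0.9, 1.0])
for row in rows:
    print(row)
```

With few LTRs in a bin, a t-based interval would be wider and more appropriate than 1.96 standard errors. Bins containing a single LTR get a NaN interval.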
Funding
Climate genomics in the Northern krill: the past, present and future of an important marine species
Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning