Files (10):

  • methylation_calls.tsv.gz (23.06 GB)
  • methylation_calls_freq.tsv.gz (849.01 MB)
  • methylation_calls_freq.annotated_all_cpg_sites.tsv.gz (381.84 MB)
  • methylation_calls_freq.annotated_repeated_cpg_sites.tsv.gz (286.07 MB)
  • methylation_calls_freq.annotated_non_repeated_cpg_sites.tsv.gz (95.77 MB)
  • methylation_for_genes.tar.gz (3.86 MB)
  • methylation_for_LTRs.tar.gz (382.03 MB)
  • 1.m_norvegica.main_w_mito.fasta.LTR_methylation.tsv (130.52 kB)
  • MANIFEST.txt (0.46 kB)
  • README.txt (4.09 kB)

4. Ecological genomics of the Northern krill: Genome-wide signatures of DNA methylation

dataset
Posted on 2024-03-28, 00:10. Authored by Andreas Wallberg, Per Unneberg

This item holds gzipped files and archives with CpG DNA methylation data inferred from signal analysis of Nanopore reads. The files represent increasingly processed methylation data, most of which is stored in tab-separated (TSV) tables.

Methylation was detected using f5c. Documentation for the program and its output formats can be found at:

https://github.com/hasindu2008/f5c

Archive contents:

  1. methylation_calls.tsv.gz, first-step output from "f5c_x86_64_linux_cuda call-methylation", containing per-read methylation calls.
  2. methylation_calls_freq.tsv.gz, second-step output from "f5c_x86_64_linux_cuda meth-freq", containing per-CpG-group methylation frequencies from overlapping reads.
  3. methylation_calls_freq.annotated_all_cpg_sites.tsv.gz, per-CpG-site methylation calls (74.8 M) cross-referenced with genomic regions (e.g. intergenic, intron, CDS) based on genome annotation.
  4. methylation_calls_freq.annotated_repeated_cpg_sites.tsv.gz, per-CpG-site methylation calls cross-referenced with genomic regions as above, but only including sites that were found to be in repeats (56.7 M).
  5. methylation_calls_freq.annotated_non_repeated_cpg_sites.tsv.gz, per-CpG-site methylation calls cross-referenced with genomic regions as above, but only including sites that were found not to be in repeats (18.2 M).
  6. methylation_for_genes.tar.gz, an archive of per-gene DNA methylation and splice isoform data.
  7. methylation_for_LTRs.tar.gz, an archive of per-LTR DNA methylation data.
  8. 1.m_norvegica.main_w_mito.fasta.LTR_methylation.tsv, contains DNA methylation statistics for LTR retrotransposons of different evolutionary "age".
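
Items 2 and 3 above differ in granularity: file 2 reports frequencies per CpG group, while file 3 reports them per CpG site. The sketch below shows one plausible way to expand a group into its individual sites. It assumes that each group record provides a start coordinate pointing at the first CG and a sequence string that may include flanking bases; these inputs, and the function name split_cpg_group, are assumptions for illustration and are not confirmed by this description. Every site simply inherits the frequency of its group.

```python
def split_cpg_group(contig, start, group_sequence, freq):
    """Yield (contig, position, freq) for every CG dinucleotide in a group.

    Assumption: `start` is the coordinate of the first CG in the group and
    `group_sequence` may carry flanking bases on either side.
    """
    first = group_sequence.find("CG")
    idx = first
    while idx != -1:
        # Offset each CG relative to the first CG found in the sequence.
        yield (contig, start + (idx - first), freq)
        idx = group_sequence.find("CG", idx + 1)


if __name__ == "__main__":
    for site in split_cpg_group("contig_1", 100, "AATTCGACGTT", 0.5):
        print(site)
```

A group with no CG in its sequence yields no sites, so malformed records are skipped rather than raising an error.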

For TSV files 3-5

Groups of closely spaced CpGs were split into individual CpG sites. These files were used to measure the DNA-methylation distribution across the genome. The columns in these three files are:

  • CONTIG = the name of the sequence
  • POS = the position of the CpG site
  • REGION = the genomic region
  • DNA = read depth
  • METHYLATION_FREQ = DNA methylation frequency
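
As a minimal sketch of how these columns might be used, the following Python streams one of the three files and averages METHYLATION_FREQ per REGION. It assumes the files are tab-separated with a header row carrying the column names listed above, and that METHYLATION_FREQ is numeric; neither is guaranteed by this description. The function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict


def mean_methylation_by_region(lines):
    """Return {REGION: mean METHYLATION_FREQ} from an iterable of TSV lines.

    Assumption: the first line is a header containing REGION and
    METHYLATION_FREQ.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for row in csv.DictReader(lines, delimiter="\t"):
        region = row["REGION"]
        total[region] += float(row["METHYLATION_FREQ"])
        count[region] += 1
    return {region: total[region] / count[region] for region in total}


def mean_methylation_by_region_gz(path):
    """Same as above, but reads a gzipped file line by line from disk."""
    with gzip.open(path, "rt", newline="") as handle:
        return mean_methylation_by_region(handle)
```

Rows are processed one at a time, so memory use stays small even for the file with all CpG sites.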

methylation_for_genes.tar.gz

This archive contains:

  • a tabular file with gene names and gene type
  • a tabular file with RNA-seq splice variants per gene
  • a tabular file with DNA methylation levels per gene and gene region (e.g. UTRs, CDS, exon, intron, ...)

Together with a script referred to on GitHub, these data can be used to estimate the average number of transcripts (with 95% confidence intervals) for genes with different DNA methylation levels.
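
The kind of estimate described above can be sketched as follows. This is not the script referred to on GitHub: it is an illustration that bins genes by methylation level and uses a normal approximation (mean ± 1.96 standard errors) for the 95% confidence interval. The input layout, the bin width, and the function names are assumptions.

```python
import math
import statistics
from collections import defaultdict


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% CI."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        # A single observation gives no spread estimate.
        return mean, mean, mean
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half, mean + half


def transcripts_by_methylation_bin(genes, bin_width=0.25):
    """Group (methylation_level, n_transcripts) pairs into methylation bins.

    Assumption: methylation levels are fractions in the range 0 to 1.
    """
    last_bin = int(round(1 / bin_width)) - 1
    bins = defaultdict(list)
    for level, n_transcripts in genes:
        # A level of exactly 1.0 is placed in the top bin.
        bins[min(int(level / bin_width), last_bin)].append(n_transcripts)
    return {idx: mean_ci95(vals) for idx, vals in sorted(bins.items())}
```

For small bins a t-based or bootstrap interval would be more appropriate than the normal approximation used here.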

methylation_for_LTRs.tar.gz

This archive contains:

  • a tabular file with LTR names and identity scores
  • a tabular file with the detected LTR domains per repeat
  • a tabular file with DNA methylation levels per gene and gene region (e.g. UTRs, CDS, exon, intron, ...)

Together with a script referred to on GitHub, these data can be used to estimate the mean LTR methylation levels (with 95% confidence intervals) for LTRs with different identity scores. The archive also includes the data in methylation_calls_freq.annotated_all_cpg_sites.tsv.gz.

1.m_norvegica.main_w_mito.fasta.LTR_methylation.tsv

This TSV file contains CpG DNA methylation data for 1,706 LTR retrotransposons. It also includes the identity scores between 5' and 3' LTRs. TSV column fields are:

  1. CHROM = name of sequence
  2. REPEAT = name of LTR repeat
  3. REPEAT_START = start coordinate of repeat
  4. REPEAT_STOP = stop coordinate of repeat
  5. REPEAT_LENGTH = repeat length (bp)
  6. IDENTITY = identity scores between 5' and 3' LTR regions
  7. METHYLATION_N = number of CpG sites
  8. METHYLATION = average DNA methylation rates
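
A minimal sketch for loading this table is shown below. It assumes the eight fields appear in the order listed, that coordinates, lengths and CpG counts are integers, and that an optional header line starts with CHROM; none of this is confirmed here. The LtrRecord class and function names are hypothetical. As an example of use, it computes a mean methylation rate weighted by the number of CpG sites per LTR.

```python
import csv
from dataclasses import dataclass


@dataclass
class LtrRecord:
    chrom: str
    repeat: str
    start: int
    stop: int
    length: int
    identity: float
    n_cpg: int
    methylation: float


def parse_ltr_rows(lines):
    """Parse tab-separated lines into LtrRecord objects (assumed column order)."""
    records = []
    for f in csv.reader(lines, delimiter="\t"):
        if not f or f[0] == "CHROM":
            continue  # skip blank lines and an optional header
        records.append(LtrRecord(f[0], f[1], int(f[2]), int(f[3]), int(f[4]),
                                 float(f[5]), int(f[6]), float(f[7])))
    return records


def weighted_mean_methylation(records):
    """Mean METHYLATION weighted by METHYLATION_N; None if no CpG sites."""
    total_sites = sum(r.n_cpg for r in records)
    if total_sites == 0:
        return None
    return sum(r.methylation * r.n_cpg for r in records) / total_sites
```

Filtering the records by identity before calling weighted_mean_methylation gives a simple comparison between LTRs of different evolutionary "age".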



Funding

Climate genomics in the Northern krill: the past, present and future of an important marine species

Swedish Research Council for Environment Agricultural Sciences and Spatial Planning

Publisher

Uppsala University
