Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (<i>Parnassius mnemosyne</i>)

Version 2 2024-06-26, 11:34
Version 1 2024-06-25, 13:00
dataset
posted on 2024-06-26, 11:34 authored by Jacob Höglund, Guilherme Dias, Remi-André Olsen, André Soares, Ignas Bunikis, Venkat Talla, Niclas Backström
<p dir="ltr">This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (<i>Parnassius mnemosyne</i>), published in:</p><p dir="ltr">Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). <i>A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): </i><i>A Species of Global Conservation Concern. </i>Genome Biology and Evolution, 16(2), evae031. <a href="https://doi.org/10.1093/gbe/evae031" rel="noreferrer" target="_blank">https://doi.org/10.1093/gbe/evae031</a></p><p dir="ltr">Previous data from the project have been deposited at the European Nucleotide Archive (ENA) in the umbrella project <a href="https://www.ebi.ac.uk/ena/browser/view/PRJEB76269" rel="noreferrer" target="_blank">PRJEB76269</a>.</p><p dir="ltr">The data contained in this archive at SciLifeLab Data Repository describe the genome assembly (ENA accession: <a href="https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1" rel="noreferrer" target="_blank">GCA_963668995.1</a>) and the mitochondrial genome assembly (ENA accession: <a href="https://www.ebi.ac.uk/ena/browser/view/OZ075093.1" rel="noreferrer" target="_blank">OZ075093.1</a>).</p><p><br></p><p dir="ltr">A brief description of each file follows. The information on the methods used to generate the files was adapted from Höglund et al. 2024.</p><p><br></p><ul><li>pmne_functional_edit1.gff.gz</li></ul><p dir="ltr">contains the functional annotation (protein-coding genes) of the primary genome assembly (<a href="https://www.ebi.ac.uk/ena/browser/view/GCA_963668995.1" rel="noreferrer" target="_blank">GCA_963668995.1</a>). This is the original file that was submitted to ENA. A derived version of the file is available from NCBI; the NCBI version was generated from the EMBL records of each annotated gene and differs in that it, for instance, uses a different naming scheme for the seqid column and the locus tags.
The NCBI version is available <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/963/668/995/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11/GCA_963668995.1_Parnassius_mnemosyne_n_2023_11_genomic.gff.gz" rel="noreferrer" target="_blank">at this link</a>.</p><p dir="ltr">The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated by the NBIS nextflow pipeline v2.0.0 (<a href="https://github.com/NBISweden" rel="noreferrer" target="_blank">https://github.com/NBISweden</a>).</p><p><br></p><ul><li>pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz</li></ul><p dir="ltr">contains a transcript assembly of the Illumina RNAseq reads (ENA accession: <a href="https://www.ebi.ac.uk/ena/browser/view/ERX11559451" rel="noreferrer" target="_blank">ERX11559451</a>). The reads were aligned to the genome with HiSat2 (v2.1.0) and then assembled with StringTie (v2.2.1).</p><p><br></p><ul><li>pmne_mtdna.gff.gz</li></ul><p dir="ltr">contains the functional annotation of the mitochondrial genome assembly (ENA accession: <a href="https://www.ebi.ac.uk/ena/browser/view/OZ075093.1" rel="noreferrer" target="_blank">OZ075093.1</a>). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).</p><p><br></p><ul><li>pmne_ncRNAs.gff.gz</li></ul><p dir="ltr">contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.</p><p><br></p><ul><li>pmne_tRNAs_and_pseudogenes.gff.gz</li></ul><p dir="ltr">contains the annotation of putative tRNA genes and pseudogenes. 
The prediction was done with tRNAscan-SE (v2.0.12).</p><p><br></p><ul><li>pmne_PacBio_isoseq.sorted.bam</li></ul><p dir="ltr">contains the PacBio IsoSeq transcripts (ENA accession: <a href="https://www.ebi.ac.uk/ena/browser/view/ERX11559436" rel="noreferrer" target="_blank">ERX11559436</a>) aligned to the primary genome assembly.</p><p><br></p><ul><li>pmne_repeat_library.fa.gz</li></ul><p dir="ltr">contains the nucleotide sequences of the predicted repeats in FASTA format. The prediction was done with RepeatModeler2 (v2.0.2a).</p><h3>Available variables</h3><p dir="ltr">For a description of the column headers of the files, please see the following links to the documentation of the different file formats.</p><p dir="ltr">The GFF3 format (.gff) is described here: <a href="https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md" rel="noreferrer" target="_blank">https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md</a></p><p dir="ltr">The BAM format (.bam) is a compressed binary version of the SAM format, both of which are described here: <a href="https://samtools.github.io/hts-specs/SAMv1.pdf" rel="noreferrer" target="_blank">https://samtools.github.io/hts-specs/SAMv1.pdf</a></p><p dir="ltr">The FASTA (.fa) format is described here: <a href="https://www.ncbi.nlm.nih.gov/genbank/fastaformat/" rel="noreferrer" target="_blank">https://www.ncbi.nlm.nih.gov/genbank/fastaformat/</a></p><h3>Contact</h3><p dir="ltr">For questions about this dataset, please contact:<br>jacob.hoglund@ebc.uu.se<br>niclas.backstrom@ebc.uu.se</p>
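
Example: reading the gzipped GFF3 files

The following is a minimal sketch, not part of the published dataset or the authors' workflow, of how the nine tab-separated GFF3 columns in the .gff.gz files listed above could be read with the Python standard library. The names Feature, parse_attributes, parse_gff3_line and read_gff3 are hypothetical helpers introduced here for illustration. The sketch assumes the files follow the GFF3 specification linked under "Available variables".

```python
import gzip
from typing import Dict, Iterator, NamedTuple, Optional
from urllib.parse import unquote


class Feature(NamedTuple):
    """One GFF3 record; field names follow the nine GFF3 columns."""
    seqid: str
    source: str
    type: str
    start: int                # 1-based, inclusive
    end: int                  # 1-based, inclusive
    score: Optional[float]    # None when the column is "."
    strand: str
    phase: Optional[int]      # None when the column is "."
    attributes: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Split column 9 (tag=value pairs separated by ';') into a dict.

    Multi-valued attributes are kept as one comma-separated string.
    """
    attrs: Dict[str, str] = {}
    if field == ".":
        return attrs
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[unquote(key)] = unquote(value)
    return attrs


def parse_gff3_line(line: str) -> Optional[Feature]:
    """Parse one line; return None for blank lines and '#' directives/comments."""
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attributes = cols
    return Feature(
        unquote(seqid), source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        parse_attributes(attributes),
    )


def read_gff3(path: str) -> Iterator[Feature]:
    """Yield features from a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # an optional embedded sequence section follows; stop here
            feature = parse_gff3_line(line)
            if feature is not None:
                yield feature
```

With a local copy of one of the files, for example pmne_functional_edit1.gff.gz, features could then be tallied by type with collections.Counter(f.type for f in read_gff3("pmne_functional_edit1.gff.gz")).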

Funding

Molecular mechanisms and evolutionary forces underlying recombination frequency in butterflies

Swedish Research Council

NBIS/SciLifeLab long-term bioinformatics support (WABI)

Swedish Rescue Program for P. mnemosyne through the local administrative board (Länsstyrelsen) of Blekinge

Publisher

Uppsala University

SciLifeLab acknowledgement

  • National Genomics Infrastructure unit
  • Bioinformatics platform (NBIS)
  • SciLifeLab Data Centre