Supplemental data from the genome assembly and annotation of the Clouded Apollo Butterfly (Parnassius mnemosyne)
This dataset contains supplementary data from the genome sequencing of the Clouded Apollo Butterfly (Parnassius mnemosyne), published in:
Höglund, J., Dias, G., Olsen, R. A., Soares, A., Bunikis, I., Talla, V., & Backström, N. (2024). A Chromosome-Level Genome Assembly and Annotation for the Clouded Apollo Butterfly (Parnassius mnemosyne): A Species of Global Conservation Concern. Genome Biology and Evolution, 16(2), evae031. https://doi.org/10.1093/gbe/evae031
Previous data from the project has been deposited at the European Nucleotide Archive (ENA) in the umbrella project PRJEB76269.
The data contained in this archive at the SciLifeLab Data Repository describe the genome assembly (ENA accession: GCA_963668995.1) and the mitochondrial genome assembly (ENA accession: OZ075093.1).
Below follows a brief description of each file. The information on the methods used to generate the files was adapted from Höglund et al. 2024.
- pmne_functional_edit1.gff.gz
contains the functional annotation (protein-coding genes) of the primary genome assembly (GCA_963668995.1). This is the original file that was submitted to ENA. A derived version of the file is available from NCBI; it was generated from the EMBL records of each annotated gene and differs, for instance, in using a different naming scheme for the seqid column and the locus tags. The NCBI version is available at this link.
The genes were predicted using BRAKER (v3.03), GALBA (v1.0.6), and GeneMarkS-T (v5.1). The resulting gene models were combined and filtered using TSEBRA (version: long_reads branch commit 1f2614). The combined gene model was functionally annotated with the NBIS Nextflow pipeline v2.0.0 (https://github.com/NBISweden).
- pmne_Illumina_RNAseq_StringTie_sorted-transcripts_match.gff.gz
contains a transcript assembly of the Illumina RNAseq reads (ENA accession: ERX11559451). The reads were aligned to the genome with HISAT2 (v2.1.0) and then assembled with StringTie (v2.2.1).
- pmne_mtdna.gff.gz
contains the functional annotation of the mitochondrial genome assembly (ENA accession: OZ075093.1). This is the original file that was submitted to ENA. The annotation was generated using MitoFinder (v1.4.1).
- pmne_ncRNAs.gff.gz
contains the annotation of putative non-coding RNA (ncRNA) genes. The prediction was done with Infernal (v1.1.4) and the Rfam (v14.1) covariance models.
- pmne_tRNAs_and_pseudogenes.gff.gz
contains the annotation of putative tRNA genes and pseudogenes. The prediction was done with tRNAscan-SE (v2.0.12).
- pmne_PacBio_isoseq.sorted.bam
contains the PacBio IsoSeq transcripts (ENA accession: ERX11559436) aligned to the primary genome assembly.
- pmne_repeat_library.fa.gz
contains the nucleotide sequences of the predicted repeats in FASTA format. The prediction was done with RepeatModeler2 (v2.0.2a).
Available variables
For a description of the column headers of the files, please see the following links to the documentation of the different file formats.
The GFF3 format (.gff) is described here: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
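As an illustration of the nine-column GFF3 layout, the following is a minimal Python sketch for reading one of the gzip-compressed .gff.gz files in this archive using only the standard library. The function names (`parse_attributes`, `read_gff3`, `count_feature_types`) are hypothetical helpers introduced here and are not part of the dataset or the published pipeline. Multi-valued attributes (comma-separated) are kept as a single string for simplicity.

```python
import gzip
from collections import Counter
from urllib.parse import unquote

# Column names as defined in the GFF3 specification.
GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_attributes(field):
    """Split a column-9 string such as 'ID=g1;Parent=t1' into a dict."""
    attributes = {}
    if field == ".":
        return attributes
    for pair in field.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 uses URL-style percent-encoding for reserved characters.
        attributes[unquote(key)] = unquote(value)
    return attributes


def read_gff3(path):
    """Yield one dict per feature line of a gzip-compressed GFF3 file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("##FASTA"):
                break  # an optional embedded sequence section follows
            if not line or line.startswith("#"):
                continue  # skip blank lines, comments and directives
            fields = line.split("\t")
            if len(fields) != 9:
                continue  # not a well-formed feature line
            record = dict(zip(GFF3_COLUMNS, fields))
            record["start"] = int(record["start"])
            record["end"] = int(record["end"])
            record["attributes"] = parse_attributes(record["attributes"])
            yield record


def count_feature_types(path):
    """Count how many features of each type (column 3) a file contains."""
    return Counter(record["type"] for record in read_gff3(path))
```

For example, `count_feature_types("pmne_functional_edit1.gff.gz")` would tally the feature types in the functional annotation, assuming the file has been downloaded to the working directory.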
The BAM format (.bam) is a compressed version of the SAM format, both of which are described here: https://samtools.github.io/hts-specs/SAMv1.pdf
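In practice, BAM files are usually inspected with dedicated tools such as samtools (for example `samtools view -H`). Purely to illustrate the binary layout described in the specification, the sketch below reads the header of a BAM file with the Python standard library. It relies on the fact that BAM uses BGZF compression, which is readable as concatenated gzip members. The function name `read_bam_references` is a hypothetical helper, and the sketch reads only the header and reference list, not the alignment records.

```python
import gzip
import struct


def read_bam_references(path):
    """Return (header_text, [(name, length), ...]) from a BAM file header."""
    with gzip.open(path, "rb") as handle:
        # Every BAM file starts with the four magic bytes 'BAM\1'.
        magic = handle.read(4)
        if magic != b"BAM\x01":
            raise ValueError("not a BAM file: unexpected magic bytes")

        # Length of the plain-text SAM header, then the header itself.
        (text_length,) = struct.unpack("<I", handle.read(4))
        raw_text = handle.read(text_length)
        header_text = raw_text.decode("utf-8", errors="replace").rstrip("\x00")

        # Number of reference sequences, then one entry per reference.
        (n_references,) = struct.unpack("<I", handle.read(4))
        references = []
        for _ in range(n_references):
            (name_length,) = struct.unpack("<I", handle.read(4))
            # The stored name is NUL-terminated; strip the terminator.
            name = handle.read(name_length).rstrip(b"\x00").decode("ascii")
            (sequence_length,) = struct.unpack("<I", handle.read(4))
            references.append((name, sequence_length))
    return header_text, references
```

Calling `read_bam_references("pmne_PacBio_isoseq.sorted.bam")` would list the reference sequences the IsoSeq transcripts were aligned to, assuming the file is present locally.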
The fasta (.fa) format is described here: https://www.ncbi.nlm.nih.gov/genbank/fastaformat/
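As a small illustration of the FASTA layout (a header line starting with ">" followed by one or more sequence lines), here is a minimal standard-library Python sketch for reading a gzip-compressed FASTA file such as the repeat library. The names `read_fasta` and `summarize_lengths` are hypothetical helpers introduced for this example only.

```python
import gzip


def read_fasta(path):
    """Yield (header, sequence) tuples from a gzip-compressed FASTA file."""
    header = None
    chunks = []
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                # Sequence lines may be wrapped; collect and join them later.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize_lengths(path):
    """Return the record count, total length and longest sequence length."""
    lengths = [len(sequence) for _, sequence in read_fasta(path)]
    return {
        "count": len(lengths),
        "total_bp": sum(lengths),
        "longest_bp": max(lengths, default=0),
    }
```

For example, `summarize_lengths("pmne_repeat_library.fa.gz")` would report how many repeat sequences the library contains and their combined length, assuming the file has been downloaded.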
Contact
For questions about this dataset, please contact:
jacob.hoglund@ebc.uu.se
niclas.backstrom@ebc.uu.se
Funding
Molecular mechanisms and evolutionary forces underlying recombination frequency in butterflies
Swedish Research Council
NBIS/SciLifeLab long-term bioinformatics support (WABI)
Swedish Rescue Program for P. mnemosyne through the local administrative board (Länsstyrelsen) of Blekinge
Publisher
Uppsala University
Contact email
niclas.backstrom@ebc.uu.se
SciLifeLab acknowledgement
- National Genomics Infrastructure unit
- Bioinformatics platform (NBIS)
- SciLifeLab Data Centre