
Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and Blastobotrys malaysiensis

dataset
posted on 2025-03-21, 13:19 authored by Jonas Ravn, Amanda Sörensen Ristinmaa, Scott Mazurkewich, Guilherme Borges Dias, Johan Larsbrink, Cecilia Geijer

This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis.

The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available on the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively.

The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.

File description

  • bmokoenaii_annotation.gff
    This file contains the gene models predicted for B. mokoenaii (GCA_003705765.3).
  • billinoisensis_annotation.gff
    This file contains the gene models predicted for B. illinoisensis (GCA_965113335.1).
  • bmalaysiensis_annotation.gff
    This file contains the gene models predicted for B. malaysiensis (GCA_030558815.1).
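
A quick way to see what each annotation file contains is to tally the feature types in column 3. The following is a minimal sketch, assuming standard tab-separated GFF with `#` comment lines; the helper name `count_features` is hypothetical and not part of the dataset.

```shell
# Tally feature types (column 3) in a GFF file, skipping comment and malformed lines.
# count_features is an illustrative helper, not part of the dataset.
count_features() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
              END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Example usage (file name taken from the list above):
# count_features bmokoenaii_annotation.gff
```

The output is one line per feature type followed by its count, which makes it easy to compare the three species at a glance.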

Gene annotation methods

Repeat Masking

Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2 and the genomes were soft-masked using RepeatMasker v4.1.5.


$ RepeatModeler -database ${DB} -engine ncbi -pa 16
$ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta
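
Because `-xsmall` writes repeats as lowercase bases rather than replacing them with N, the result of soft-masking can be checked by measuring the lowercase fraction of the masked FASTA. The following is a minimal sketch, assuming the masked sequence uses only `a`, `c`, `g`, `t`, and `n` for masked positions; the helper name `masked_fraction` is hypothetical and was not part of the original workflow.

```shell
# Report the fraction of bases that are lowercase (soft-masked) in a FASTA file.
# masked_fraction is an illustrative helper, not part of the original workflow.
masked_fraction() {
  awk '!/^>/ {
         total += length($0)              # count all bases on this sequence line
         masked += gsub(/[acgtn]/, "")    # gsub returns the number of lowercase bases
       }
       END { if (total > 0) printf "%.4f\n", masked / total; else print "NA" }' "$1"
}

# Example usage (RepeatMasker writes the masked sequence as <input>.masked):
# masked_fraction genome.fasta.masked
```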

Structural Annotation


Structural annotation was performed on the soft-masked genomes using Braker3 v3.0.3 incorporating external evidence in the form of all fungal proteins from OrthoDB v11 (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).


$ braker.pl --genome="$genome" \
    --prot_seq=${protein} --workingdir=${PWD} \
    --gff3 --threads=16 --verbosity=3 \
    --nocleanup --species=${i}

Functional Annotation


The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded in 2023-12) using the Basic Local Alignment Search Tool (BLAST). It then uses InterProScan to query the proteins against InterPro v59-91 databases, and merges the results using AGAT v1.2.0.

tRNAs and rRNAs


Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, and spliceosomal ncRNAs, were not predicted.

$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff

Annotation integration

Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.

$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
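
After combining several annotation sources, one simple sanity check is that no two top-level features share an ID. The following is a minimal sketch, assuming GFF3-style `ID=` and `Parent=` attributes in column 9; the helper name `duplicate_top_level_ids` is hypothetical and this check was not part of the original workflow.

```shell
# Print IDs that occur more than once among top-level features
# (features without a Parent attribute). Prints nothing if all IDs are unique.
# duplicate_top_level_ids is an illustrative helper, not part of the original workflow.
duplicate_top_level_ids() {
  awk -F'\t' '!/^#/ && NF >= 9 && $9 !~ /(^|;)Parent=/ {
         n = split($9, attrs, ";")
         for (i = 1; i <= n; i++)
           if (attrs[i] ~ /^ID=/) print substr(attrs[i], 4)
       }' "$1" | sort | uniq -d
}

# Example usage:
# duplicate_top_level_ids full_annotation.gff
```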

Funding

Tailormade glucuronoxylan-yeast cell factory by CRISPR engineering for precision fermentation of xylan waste streams into valuable bio-products

Novo Nordisk Foundation


Publisher

Chalmers University of Technology

SciLifeLab acknowledgement

  • Bioinformatics platform (NBIS)
  • National Genomics Infrastructure unit
  • SciLifeLab Data Centre
