Gene annotation of <i>Blastobotrys mokoenaii</i>, <i>Blastobotrys illinoisensis</i>, and <i>Blastobotrys malaysiensis</i>

dataset
posted on 2025-03-21, 13:19, authored by Jonas Ravn, Amanda Sörensen Ristinmaa, Scott Mazurkewich, Guilherme Borges Dias, Johan Larsbrink, Cecilia Geijer
<p dir="ltr">This dataset contains the gene annotation data for three species of <i>Blastobotrys</i> yeasts: <i>B. mokoenaii</i>, <i>B. illinoisensis</i>, and <i>B. malaysiensis</i>.</p>
<p dir="ltr">The genome assemblies for <i>B. mokoenaii</i> (NRRL Y-27120) and <i>B. malaysiensis</i> (NRRL Y-6417) were publicly available on the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively.</p>
<p dir="ltr">The genome assembly for <i>B. illinoisensis</i> (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.</p>
<h3><b>File description</b></h3>
<ul>
<li>bmokoenaii_annotation.gff<br>This file contains the gene models predicted for <i>B. mokoenaii</i> (GCA_003705765.3).</li>
<li>billinoisensis_annotation.gff<br>This file contains the gene models predicted for <i>B. illinoisensis</i> (GCA_965113335.1).</li>
<li>bmalaysiensis_annotation.gff<br>This file contains the gene models predicted for <i>B. malaysiensis</i> (GCA_030558815.1).</li>
</ul>
<h3><b>Gene annotation methods</b></h3>
<h3>Repeat Masking</h3>
<p dir="ltr">Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2 and the genomes were soft-masked using RepeatMasker v4.1.5.</p>
<p dir="ltr"><br>$ RepeatModeler -database ${DB} -engine ncbi -pa 16<br>$ RepeatMasker -dir . 
-gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta</p>
<h3>Structural Annotation</h3>
<p dir="ltr">Structural annotation was performed on the soft-masked genomes using BRAKER3 v3.0.3, incorporating external evidence in the form of all fungal proteins from OrthoDB v11 (available at <a href="https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11" rel="noreferrer" target="_blank">https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11</a>).</p>
<p dir="ltr">$ braker.pl --genome="$genome" \<br>--prot_seq=${protein} --workingdir=${PWD} \<br>--gff3 --threads=16 --verbosity=3 \<br>--nocleanup --species=${i}</p>
<h3>Functional Annotation</h3>
<p dir="ltr">The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) <code>functional_annotation</code> Nextflow pipeline v2.0.0 (<a href="https://github.com/NBISweden/pipelines-nextflow" rel="noreferrer" target="_blank">https://github.com/NBISweden/pipelines-nextflow</a>). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded in 2023-12) using the Basic Local Alignment Search Tool (BLAST). It then uses InterProScan to query the proteins against the InterPro v59-91 databases and merges the results using AGAT v1.2.0.</p>
<h3>tRNAs and rRNAs</h3>
<p dir="ltr">Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, and spliceosomal ncRNAs, have not been predicted. 
</p>
<p dir="ltr">$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta<br>$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff</p>
<h3>Annotation integration</h3>
<p dir="ltr">Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.</p>
<p dir="ltr">$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff</p>
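As a quick sanity check on a combined annotation file, the feature types it contains can be tallied from the shell. The following is a minimal sketch and not part of the published workflow: the function name count_gff_features is hypothetical, and it assumes a standard nine-column, tab-separated GFF3 file such as bmokoenaii_annotation.gff.

```shell
# Hypothetical helper (not part of the dataset or its pipeline):
# tally the feature types found in column 3 of a GFF3 file.
count_gff_features() {
    # $1: path to a GFF3 file, e.g. bmokoenaii_annotation.gff
    awk -F'\t' '
        /^##FASTA/ { exit }        # stop at an embedded FASTA section, if any
        /^#/ || NF < 9 { next }    # skip comments, directives, and short lines
        { count[$3]++ }            # column 3 holds the feature type
        END { for (t in count) printf "%s\t%d\n", t, count[t] }
    ' "$1" | LC_ALL=C sort
}

# Example call (assumes the file has been downloaded to the current directory):
#   count_gff_features bmokoenaii_annotation.gff
```

The output is one line per feature type, with the type name and its count separated by a tab. Which types appear (for example gene, mRNA, tRNA, or rRNA) depends on what AGAT wrote to the merged file.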

Funding

Tailormade glucuronoxylan-yeast cell factory by CRISPR engineering for precision fermentation of xylan waste streams into valuable bio-products

Novo Nordisk Foundation

Publisher

Chalmers University of Technology

SciLifeLab acknowledgement

  • Bioinformatics platform (NBIS)
  • National Genomics Infrastructure unit
  • SciLifeLab Data Centre
