Gene annotation of Blastobotrys mokoenaii, Blastobotrys illinoisensis, and Blastobotrys malaysiensis
This dataset contains the gene annotation data for three species of Blastobotrys yeasts: B. mokoenaii, B. illinoisensis, and B. malaysiensis.
The genome assemblies for B. mokoenaii (NRRL Y-27120) and B. malaysiensis (NRRL Y-6417) were publicly available on the National Center for Biotechnology Information (NCBI) under accessions GCA_003705765.3 and GCA_030558815.1, respectively.
The genome assembly for B. illinoisensis (NRRL YB-1343) was generated by SciLifeLab's National Genomics Infrastructure (NGI) using PacBio long-read data and deposited in the European Nucleotide Archive (ENA) under accession GCA_965113335.1.
File description
- bmokoenaii_annotation.gff
This file contains the gene models predicted for B. mokoenaii (GCA_003705765.3).
- billinoisensis_annotation.gff
This file contains the gene models predicted for B. illinoisensis (GCA_965113335.1).
- bmalaysiensis_annotation.gff
This file contains the gene models predicted for B. malaysiensis (GCA_030558815.1).
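A quick way to get an overview of any of these files is to tally the feature types in column 3 of the GFF. The snippet below is a minimal sketch and not part of the original workflow: the helper name count_gff_features is hypothetical, and it assumes tab-separated GFF3 with nine columns.

```shell
# Hypothetical helper (not part of the published workflow):
# count features per type (GFF column 3), skipping comment and malformed lines.
count_gff_features() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Summarise each annotation file from this dataset if it is present locally.
for f in bmokoenaii_annotation.gff billinoisensis_annotation.gff bmalaysiensis_annotation.gff; do
    if [ -f "$f" ]; then
        echo "== $f"
        count_gff_features "$f"
    fi
done
```

Each output line holds a feature type and its count, separated by a tab.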
Gene annotation methods
Repeat Masking
Prior to annotation, a repeat library was built for each species using RepeatModeler2 v2.0.2 and the genomes were soft-masked using RepeatMasker v4.1.5.
$ RepeatModeler -database ${DB} -engine ncbi -pa 16
$ RepeatMasker -dir . -gff -u -no_is -xsmall -e ncbi -lib ${LIBRARY} -pa 16 genome.fasta
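Because -xsmall writes masked regions in lowercase, the proportion of the genome that was soft-masked can be estimated by counting lowercase bases. The following is a minimal sketch under that assumption; the helper name softmasked_fraction is hypothetical, and the masked file name (typically the input name with a .masked suffix) should be checked against the actual RepeatMasker output.

```shell
# Hypothetical helper (not part of the published workflow):
# print the fraction of lowercase (soft-masked) bases in a FASTA file.
softmasked_fraction() {
    grep -v '^>' "$1" | LC_ALL=C awk '
        {
            total  += length($0)            # sequence length before editing the line
            masked += gsub(/[acgtn]/, "")   # gsub returns the number of lowercase bases
        }
        END {
            if (total > 0) printf "%.4f\n", masked / total
            else print "NA"
        }'
}

# Example call; the file name below is an assumption about RepeatMasker output naming.
if [ -f genome.fasta.masked ]; then
    softmasked_fraction genome.fasta.masked
fi
```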
Structural Annotation
Structural annotation was performed on the soft-masked genomes using BRAKER3 v3.0.3, with all fungal proteins from OrthoDB v11 supplied as external evidence (available at https://bioinf.uni-greifswald.de/bioinf/partitioned_odb11).
$ braker.pl --genome="$genome" \
--prot_seq=${protein} --workingdir=${PWD} \
--gff3 --threads=16 --verbosity=3 \
--nocleanup --species=${i}
Functional Annotation
The predicted genes were functionally annotated using the National Bioinformatics Infrastructure Sweden (NBIS) functional_annotation Nextflow pipeline v2.0.0 (https://github.com/NBISweden/pipelines-nextflow). Briefly, this pipeline performs similarity searches between the annotated proteins and the UniProtKB/Swiss-Prot database (downloaded in 2023-12) using the Basic Local Alignment Search Tool (BLAST). It then uses InterProScan to query the proteins against InterPro v59-91 databases and merges the results using AGAT v1.2.0.
tRNAs and rRNAs
Transfer RNA (tRNA) and ribosomal RNA (rRNA) genes were annotated using tRNAscan-SE v2.0.12 and barrnap v0.9, respectively. Other ncRNAs, such as SRP RNA, RNase P RNA, and spliceosomal ncRNAs, were not predicted.
$ tRNAscan-SE -E --gff ${output}_trnas.gff --thread 16 ${genome}.fasta
$ barrnap --kingdom euk --threads 6 ${genome}.fasta > ${output}_rrna.gff
Annotation integration
Finally, the functionally annotated protein-coding genes, tRNAs, and rRNAs were combined into a single GFF file using AGAT v1.2.0.
$ agat_sp_complement_annotations.pl --ref ${protein_coding} --add ${trna} --add ${rrna} --out full_annotation.gff
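When records from separate tools are combined, one simple sanity check is that no two top-level features (those without a Parent attribute) share the same ID. The sketch below illustrates such a check; it is not part of the original workflow, the helper name duplicate_toplevel_ids is hypothetical, and it assumes GFF3-style ID= and Parent= attributes in column 9.

```shell
# Hypothetical QC helper (not part of the published workflow):
# list IDs that appear on more than one top-level feature line.
duplicate_toplevel_ids() {
    awk -F'\t' '
        !/^#/ && NF >= 9 && $9 !~ /(^|;)Parent=/ {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) seen[substr(attrs[i], 4)]++
        }
        END { for (id in seen) if (seen[id] > 1) print id }' "$1" | sort
}

# Check the merged file produced by the AGAT command above, if present.
if [ -f full_annotation.gff ]; then
    duplicate_toplevel_ids full_annotation.gff
fi
```

No output means that no duplicated top-level IDs were found. Child features such as multi-line CDS records are ignored, since they can legitimately share an ID in GFF3.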
Funding
Tailormade glucuronoxylan-yeast cell factory by CRISPR engineering for precision fermentation of xylan waste streams into valuable bio-products
Novo Nordisk Foundation
Publisher
Chalmers University of Technology
Contact email
guilherme.dias@scilifelab.se
SciLifeLab acknowledgement
- Bioinformatics platform (NBIS)
- National Genomics Infrastructure unit
- SciLifeLab Data Centre