Genomic annotations and comparative analysis
Scripts and additional data necessary for the analyses performed in the paper "Genome Evolution of a Symbiont Population for Pathogen Defence in Honeybees". Information about specific analyses can be found in the corresponding README file, when applicable.
Directories are named after the analysis in the paper:
Contains a pipeline written in Snakemake that takes a set of genome annotations in GenBank format, groups all protein sequences with OrthoMCL, filters out genes predicted to be recombinant by PhiPack, creates a trimmed concatenation of the remaining single-copy panorthologs, and reconstructs a phylogeny using IQ-TREE. The concatenated protein sequences used in the paper are included (singlecopypanorthologs.fasta).
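As a rough illustration of the single-copy panortholog selection step, the sketch below keeps only orthogroups that have exactly one member in every genome. It is not the code used in the pipeline. It assumes the common OrthoMCL groups layout (`group_id: genome|gene genome|gene ...`), and the function names `parse_groups` and `single_copy_panorthologs` are hypothetical.

```python
from collections import Counter


def parse_groups(lines):
    """Parse OrthoMCL-style group lines into {group_id: [[genome, gene], ...]}.

    Assumed line layout: 'group_id: genome|gene genome|gene ...'
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        groups[group_id.strip()] = [m.split("|", 1) for m in members.split()]
    return groups


def single_copy_panorthologs(groups, genomes):
    """Return ids of groups with exactly one gene in each of the given genomes."""
    genomes = set(genomes)
    selected = []
    for group_id, members in groups.items():
        counts = Counter(genome for genome, _gene in members)
        present_in_all = set(counts) == genomes
        single_copy = all(n == 1 for n in counts.values())
        if present_in_all and single_copy:
            selected.append(group_id)
    return selected


# Hypothetical toy input: three genomes (A, B, C) and three orthogroups.
example = [
    "og1: A|a1 B|b1 C|c1",        # one copy in every genome
    "og2: A|a2 A|a3 B|b2 C|c2",   # duplicated in genome A
    "og3: A|a4 B|b3",             # missing from genome C
]
groups = parse_groups(example)
kept = single_copy_panorthologs(groups, ["A", "B", "C"])
```

In this toy input only `og1` passes both criteria. The recombination filter and the alignment trimming are separate steps and are not shown here.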
Contains a script for analysing the species pangenome, using the OrthoMCL clustering created in fig02_phylogeny.
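One common way to summarise a pangenome from an orthogroup clustering is to label each orthogroup by how many genomes it occurs in. The sketch below shows that idea only; the categories, the function name `classify_orthogroups`, and the input layout are assumptions, and the script in this directory may do something different.

```python
def classify_orthogroups(presence, n_genomes):
    """Label each orthogroup as 'core', 'accessory', or 'unique'.

    presence: dict mapping orthogroup id -> set of genome names (assumed layout).
    n_genomes: total number of genomes in the comparison.
    """
    classes = {}
    for group_id, genomes in presence.items():
        n = len(genomes)
        if n == n_genomes:
            classes[group_id] = "core"       # found in every genome
        elif n == 1:
            classes[group_id] = "unique"     # found in a single genome
        else:
            classes[group_id] = "accessory"  # found in some, but not all
    return classes


# Hypothetical toy input with three genomes.
presence = {
    "og1": {"A", "B", "C"},
    "og2": {"A", "B"},
    "og3": {"C"},
}
classes = classify_orthogroups(presence, n_genomes=3)
```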
Contains a script for plotting the location of transposons within A. kunkeei genomes.
Contains scripts showing the workflow that was used to select plasmid assemblies in the study.
Contains a script for plotting the presence/absence patterns of genes containing cell surface-binding LPxTG motifs in A. kunkeei strains.
Contains a script for plotting the presence/absence of extrachromosomal elements in A. kunkeei strains.
Contains a script showing the workflow used to classify two phage-plasmids present in the A. kunkeei population, using data from Pfeifer et al. (2021).
Contains a script that runs Prokka twice, once with the standard databases and once using a manually curated A. kunkeei annotation, then combines the results.
Contains a script that compiles results from the analyses described above for each orthogroup (OrthoMCL results from fig02_phylogeny) and writes them to a table.
Contains a script for calculating average nucleotide identities between A. kunkeei genomes.
Contains a script for plotting growth curves for A. kunkeei strain H3B2-03M.
Contains a script for plotting the location of a genomic defence island in A. kunkeei, which contains either a CRISPR-Cas system or a restriction-modification system.
Contains a script for calculating 16S identities between A. kunkeei strains.
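For readers unfamiliar with the measure, the sketch below computes percent identity between two sequences that have already been aligned to the same length. It is a minimal illustration, not the method used by the script. It assumes that columns with a gap in either sequence are skipped, which is one of several conventions; the function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment: 5 ungapped columns, 4 of them identical.
identity = percent_identity("ACGT-A", "ACGAAA")
```

Counting gapped columns as mismatches instead would lower the value, so the convention should be stated alongside any reported identities.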
Contains results from EggNOG and PHASTER, as well as some of the output from the analyses above. Specifically, it contains the annotations from prokka_annotations in GenBank format, and the OrthoMCL clustering and the phylogeny in Nexus format from fig02_phylogeny, which are used as input in other analyses.