Genomic annotations and comparative analysis
Scripts and additional data necessary for the analyses performed in the paper "Genome Evolution of a Symbiont Population for Pathogen Defence in Honeybees". Information about specific analyses can be found in the corresponding README file, when applicable.
Directories are named after the analysis in the paper:
fig02_phylogeny
Contains a pipeline written in Snakemake that takes a set of genome annotations in GenBank format, groups all protein sequences with OrthoMCL, filters out genes predicted to be recombinant by PhiPack, creates a trimmed concatenation of the remaining single-copy panorthologs, and reconstructs a phylogeny using IQ-TREE. The concatenated protein sequences used in the paper are included (singlecopypanorthologs.fasta).
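As an illustration of the "single-copy panortholog" selection step, here is a minimal Python sketch. It is not the pipeline's own code. It assumes the clustering is in the usual OrthoMCL groups layout (`group_id: genome|gene genome|gene ...`); the function name, group IDs, and genome labels are hypothetical.

```python
from collections import Counter


def single_copy_panorthologs(lines, genomes):
    """Return IDs of groups with exactly one gene from every genome.

    Assumes each line looks like 'group_id: genome|gene genome|gene ...'.
    """
    genomes = set(genomes)
    selected = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        # Count how many genes each genome contributes to this group.
        counts = Counter(m.split("|", 1)[0] for m in members.split())
        if set(counts) == genomes and all(n == 1 for n in counts.values()):
            selected.append(group_id.strip())
    return selected


# Hypothetical toy clustering with three genomes A, B and C.
example = [
    "OG0001: A|a1 B|b1 C|c1",       # one gene per genome: kept
    "OG0002: A|a2 A|a3 B|b2 C|c2",  # paralogs in A: dropped
    "OG0003: A|a4 B|b3",            # missing from C: dropped
]
panorthologs = single_copy_panorthologs(example, ["A", "B", "C"])
print(panorthologs)
```

The recombination filtering, trimming, and tree reconstruction are handled by external tools in the pipeline and are not reproduced here.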
fig03_pangenome
Contains a script for analysing the species pangenome, using the OrthoMCL clustering created in fig02_phylogeny.
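A rough sketch of how orthogroups could be partitioned for a pangenome summary is shown below. The categories (core, accessory, unique) and their definitions are assumptions made for illustration and may differ from those used by the script in this directory. Member IDs are assumed to follow the `genome|gene` convention.

```python
def pangenome_summary(groups, n_genomes):
    """Count orthogroups by how many genomes they occur in.

    groups: dict mapping group ID -> list of 'genome|gene' member IDs.
    """
    summary = {"core": 0, "accessory": 0, "unique": 0}
    for members in groups.values():
        present = {m.split("|", 1)[0] for m in members}
        if len(present) == n_genomes:
            summary["core"] += 1       # found in every genome
        elif len(present) == 1:
            summary["unique"] += 1     # restricted to a single genome
        else:
            summary["accessory"] += 1  # found in some but not all genomes
    return summary


# Hypothetical toy clustering with three genomes A, B and C.
toy_groups = {
    "OG1": ["A|a1", "B|b1", "C|c1"],
    "OG2": ["A|a2", "B|b2"],
    "OG3": ["C|c3"],
    "OG4": ["A|a4", "A|a5"],
}
summary = pangenome_summary(toy_groups, n_genomes=3)
print(summary)
```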
fig04_Sfig03_transposons
Contains a script for plotting the location of transposons within A. kunkeei genomes.
fig05_ExEs
Contains scripts showing the workflow that was used to select plasmid assemblies in the study.
fig06_LPxTG
Contains a script for plotting the presence/absence-patterns of genes containing cell surface-binding LPxTG motifs in A. kunkeei strains.
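For readers unfamiliar with the motif, the sketch below shows one simple way to flag proteins containing an LPxTG pattern (Leu-Pro-any residue-Thr-Gly) and to record presence/absence per strain. It is a plain regular-expression scan written for illustration; the criteria actually used to identify LPxTG genes in the paper are not described in this README and may be stricter. Strain names and sequences are made up.

```python
import re

# 'x' in LPxTG stands for any single amino acid.
LPXTG = re.compile(r"LP.TG")


def has_lpxtg(protein_seq):
    """Return True if the protein sequence contains an LPxTG pattern."""
    return LPXTG.search(protein_seq.upper()) is not None


def presence_by_strain(proteins_by_strain):
    """Map each strain to True/False: does any protein carry the motif?"""
    return {
        strain: any(has_lpxtg(seq) for seq in seqs)
        for strain, seqs in proteins_by_strain.items()
    }


# Hypothetical toy input: strain name -> list of protein sequences.
toy = {
    "strain1": ["MKKAALPETGEESN", "MSTNQ"],
    "strain2": ["MKKAALPETAEESN"],
}
presence = presence_by_strain(toy)
print(presence)
```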
fig07_ExEs_growth
Contains a script for plotting the presence/absence of extrachromosomal elements in A. kunkeei strains.
phageplasmid_classification
Contains a script showing the workflow used to classify two phage-plasmids present in the A. kunkeei population, using data from Pfeifer et al. (2021).
https://doi.org/10.1093/nar/gkab064
prokka_annotations
Contains a script that runs Prokka twice, once with the standard databases and once using a manually curated A. kunkeei annotation, then combines the results.
tableS4_orthogroups
Contains a script that compiles the results of the analyses described above for each orthogroup (OrthoMCL results from fig02_phylogeny) and writes them to a table.
tableS5_ANI
Contains a script for calculating average nucleotide identities between A. kunkeei genomes.
Sfig02_growth
Contains a script for plotting growth curves for A. kunkeei strain H3B2-03M.
Sfig04_defence
Contains a script for plotting the location of a genomic defence island in A. kunkeei, which contains either a CRISPR-Cas system or a restriction-modification system.
table_16S
Contains a script for calculating 16S identities between A. kunkeei strains.
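As a point of reference, percent identity between two aligned sequences can be computed as in the sketch below. This is only an illustration of the quantity being reported. It assumes the two sequences are already aligned to equal length with `-` as the gap character and ignores gapped columns, which may differ from how the script in this directory aligns and scores 16S sequences.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment: 8 ungapped columns, 7 of them identical.
identity = percent_identity("ACGT-ACGT", "ACGTTACGA")
print(identity)  # 87.5
```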
data
Contains results from EggNOG and PHASTER, as well as some of the output from the analyses above. Specifically, it contains the annotations from prokka_annotations in GenBank format, together with the OrthoMCL clustering and the phylogeny (in Nexus format) from fig02_phylogeny, all of which are used as input in other analyses.