Supplemental material for the Baltic Sea Genome Size project
In this repository you can find the supplemental figure, material and tables for the project "Genome size variation between pelagic and benthic communities across prokaryotic taxonomy and environmental gradients in the Baltic Sea".
Microbial genome size can be used as a predictor to explain the ecology and metabolism of Bacteria and Archaea across major biomes. Despite its ecological significance, the contribution of microbial genome size to differences in the metabolic potential of benthic and pelagic prokaryotes is poorly studied. Here, we investigated how taxonomy and microbial genome size vary between benthic and pelagic habitats across environmental gradients of the brackish Baltic Sea. We also explored the relationships between that variation, environmental heterogeneity, and microbial functions in these habitats. By analyzing Baltic metagenomes and MAGs, we observed that pelagic brackish Bacteria and Archaea have smaller genomes on average than pelagic marine and freshwater prokaryotes. Moreover, we found that prokaryotic genome sizes in Baltic sediments (3.47 Mbp) are significantly larger than in the water column (2.96 Mbp). These differences in genome size persisted in Bacteria from the phylum level to the order level. For pelagic prokaryotes, the smallest genomes coded for a higher number of module steps per Mbp than larger genomes for most functions, such as amino acid metabolism and central carbohydrate metabolism. However, we observed that nitrogen metabolism was almost absent in pelagic genomes and was mostly present in benthic genomes. Finally, we also show that Bacteria inhabiting Baltic sediments and the water column differ not only in taxonomy but also in their metabolic potential, such as the Wood-Ljungdahl pathway or the presence of different hydrogenases.
Figure S1. Overview of the average genome size (AGS) of metagenomes across gradients of salinity and depth.
Supplementary Material 1. Statistical analysis of the effect of abiotic variables and their interactions on the average genome size (AGS) of full Baltic microbial communities. Tables show the results of ANOVA type II analysis for both sediments and water column.
Supplementary Material 2. Overview of the presence of module steps for several metabolic categories. We also provide a list of all functional categories used with the different module categories included, and the number of module steps used in the analysis.
Supplementary Material 3. Table with all Actinobacteriota genomes used to calculate marker genes for use in CheckM quality assessment. We include information on GenBank Accession, lake, tribe, assembly size (Mbp), GC content (%) and reference. Two figures are included. The first is a boxplot showing the differences in completeness of the 8 phyla with the most bins in the StratfreshDB database. It includes the completeness estimates obtained with both default CheckM parameters and a specific set of marker genes for the phyla Patescibacteria and Actinobacteriota. Stars indicate significant differences (p < 0.05, Wilcoxon non-parametric test). The second figure shows the markers used by CheckM with default parameters for the 8 analyzed phyla.
Supplementary Table 1. Table with information on all metagenomes used in this research project, including: sample run in NCBI/ENA, estimated average genome size (Mbp), depth (m), salinity (PSU), temperature (°C) and oxygen concentration (mg/L), sample material processing (m), environment, latitude, longitude, study accession, online database, sample accession and reference.
Supplementary Table 2. Table with information on all metagenome-assembled genomes (MAGs) used in this research project, including: bin id, mOTU, completeness (%), contamination (%), GC (%), coding density (%), bin length, estimated genome size (Mbp), environment, domain, phylum, class, order, family, genus and species.
Supplementary Table 3. Table with information on all pelagic metagenome-assembled genomes (MAGs) used to compare genome size across major pelagic environments (marine, freshwater and brackish). All brackish MAGs belong to (41), while the genomic information for marine and freshwater MAGs was retrieved from a previous study (2) that compiled them from (11,42). We include bin id, completeness (%), contamination (%), bin length, estimated genome size (Mbp), ecosystem type, domain, phylum, class, order, family, genus and species.
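Supplementary Tables 2 and 3 list bin length, completeness, contamination and estimated genome size side by side. As a rough illustration of how such columns can relate, the sketch below scales the bin length by completeness and discounts contamination. This is a commonly used correction, but it is an assumption here: this repository does not state the exact formula used in the study, and the function name and the assumption that bin length is given in base pairs are hypothetical.

```python
def estimate_genome_size_mbp(bin_length_bp, completeness_pct, contamination_pct=0.0):
    """Estimate genome size (Mbp) from a MAG's bin length.

    Assumed formula (not confirmed by the source):
        size = bin_length * (1 - contamination/100) / (completeness/100)
    """
    if not 0 < completeness_pct <= 100:
        raise ValueError("completeness must be in (0, 100]")
    if not 0 <= contamination_pct < 100:
        raise ValueError("contamination must be in [0, 100)")

    # Remove the fraction of the bin attributed to contaminating sequence.
    clean_bp = bin_length_bp * (1 - contamination_pct / 100)
    # Scale up to account for the part of the genome missing from the bin.
    full_bp = clean_bp / (completeness_pct / 100)
    # Convert base pairs to megabase pairs.
    return full_bp / 1e6


if __name__ == "__main__":
    # Hypothetical MAG: 2.4 Mbp bin, 80 % complete, 5 % contaminated.
    print(round(estimate_genome_size_mbp(2_400_000, 80.0, 5.0), 2))
```

Comparing values computed this way with the "estimated genome size (Mbp)" column is a quick check of whether the tables follow the same convention.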