### General information

Author: Sarahi L. Garcia

Contact e-mail: sarahi.garcia@su.se

DOI: 10.17044/scilifelab.19923161

License: CC BY-NC-ND 4.0

This readme file was last updated: 2022-06-01

Please cite as: Rodríguez-Gijón et al. 2022. Genome size variation between pelagic and benthic communities across prokaryotic taxonomy and environmental gradients in the Baltic Sea. bioRxiv. 10.1101/2022.10.20.512849

### Dataset description

This database is the supplemental material for the project "Genome size variation between pelagic and benthic communities across prokaryotic taxonomy and environmental gradients in the Baltic Sea".

Figure S1. Overview of the average genome size (AGS) of metagenomes across gradients of salinity and depth.

Supplementary Material 1. Statistical analysis of the effect of abiotic variables and their interactions on the average genome size (AGS) of full Baltic microbial communities. Tables show the results of ANOVA type II analysis for both sediments and water column.

Supplementary Material 2. Overview of the presence of module steps for several metabolic categories. We also provide a list of all functional categories used, the module categories included in each, and the number of module steps used in the analysis.

Supplementary Material 3. Table with all Actinobacteriota genomes used to calculate the marker genes for the CheckM quality assessment. We include information on GenBank accession, lake, tribe, assembly size (Mbp), GC content (%) and reference. Two figures are included. The first is a boxplot showing the differences in completeness among the 8 phyla with the most bins in the StratfreshDB database. We include the completeness estimates obtained with both the default CheckM parameters and a specific set of marker genes for the phyla Patescibacteria and Actinobacteriota. Stars indicate significant differences (p < 0.05, Wilcoxon non-parametric test).
We also provide a figure showing the markers used by CheckM with default parameters for the 8 analyzed phyla.

Supplementary Table 1. Table with information on all metagenomes used in this research project, including: sample run in NCBI/ENA, estimated average genome size (Mbp), depth (m), salinity (PSU), temperature (°C), oxygen concentration (mg/L), sample material processing (m), environment, latitude, longitude, study accession, online database, sample accession and reference.

Supplementary Table 2. Table with information on all metagenome-assembled genomes (MAGs) used in this research project, including: bin id, mOTU, completeness (%), contamination (%), GC (%), coding density (%), bin length, estimated genome size (Mbp), environment, domain, phylum, class, order, family, genus and species.

Supplementary Table 3. Table with information on all pelagic metagenome-assembled genomes (MAGs) used to compare genome size across major pelagic environments (marine, freshwater and brackish). All brackish MAGs are from (41), while the genomic information for marine and freshwater MAGs was retrieved from a previous study (2) that compiled them from (11,42). We include bin id, completeness (%), contamination (%), bin length, estimated genome size (Mbp), ecosystem type, domain, phylum, class, order, family, genus and species.