Amplicon sequence variants from the Insect Biome Atlas project
General information
The Insect Biome Atlas project was supported by the Knut and Alice Wallenberg Foundation (dnr 2017.0088). The project analyzed the insect faunas of Sweden and Madagascar, and their associated microbiomes, mainly using DNA metabarcoding of Malaise trap samples collected in 2019 (Sweden) or 2019–2020 (Madagascar).
Please cite this version of the dataset as: Miraldo A, Iwaszkiewicz-Eggebrecht E, Sundh J, Lokeshwaran M, Granqvist E, Andersson AF, Lukasik P, Roslin T, Tack A, Ronquist F. 2024. Dataset of amplicon sequence variants (ASVs) from the Insect Biome Atlas Project, version 5. https://doi.org/10.17044/scilifelab.25480681
Dataset description
This dataset contains amplicon sequence variants (ASVs) generated from high-throughput sequencing of the cytochrome c oxidase subunit I (COI) gene from Malaise trap samples (lysates, homogenates and preservative ethanol) and soil and litter samples. It includes ASV sequences and abundance information (number of reads) as well as metadata files that are needed to interpret and analyse the data further. Future versions of the dataset will include additional data. NB! All ASV files include ASVs that represent biological and synthetic spike-ins.
Methods
Samples were sequenced using Illumina technology. Raw data are available at the European Nucleotide Archive (ENA) under project PRJEB61109. The raw sequence data were preprocessed using a Snakemake workflow. Preprocessed reads were then used as input to the AmpliSeq Nextflow pipeline (v.2.1.0) to generate ASVs.
Available data
Two types of files are provided: ASV files and metadata files. Files marked with 'SE' and 'MG' contain data from Sweden and Madagascar, respectively.
The file shasum.txt contains checksums for each of the files.
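A minimal sketch of how downloaded files could be checked against shasum.txt is given below. It assumes the common "digest, whitespace, filename" layout, one file per line, and infers the hash algorithm from the digest length, because neither the layout nor the algorithm is specified here. The function names are illustrative only.

```python
import hashlib
from pathlib import Path

# Map hex digest length to algorithm. This is an assumption: the algorithm
# used to produce shasum.txt is not stated in the dataset description.
ALGO_BY_HEX_LENGTH = {40: "sha1", 64: "sha256", 128: "sha512"}


def file_digest(path, algo, chunk_size=1 << 20):
    """Hash a file in chunks so large .gz files are never fully in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksums(checksum_file, data_dir="."):
    """Return {filename: True/False/None}.

    None means the file is missing or the digest length is not recognised.
    """
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected = parts[0].lower()
        name = parts[1].strip().lstrip("*")  # "*" marks binary mode in some tools
        algo = ALGO_BY_HEX_LENGTH.get(len(expected))
        target = Path(data_dir) / name
        if algo is None or not target.exists():
            results[name] = None
            continue
        results[name] = file_digest(target, algo) == expected
    return results
```

Calling verify_checksums("shasum.txt", "path/to/downloads") returns one entry per listed file; any value other than True is worth investigating before analysis.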
ASV files
ASV sequences in fasta format are found in files CO1_asv_seqs_SE.fasta.gz and CO1_asv_seqs_MG.fasta.gz. Counts of ASVs in each sample are in CO1_asv_counts_SE.tsv.gz and CO1_asv_counts.MG.tsv.gz. The Swedish dataset contains 821,559 ASVs in 6,169 samples. The Madagascar dataset contains 701,769 ASVs in 2,286 samples.
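Because the count tables are large (hundreds of thousands of ASVs by thousands of samples), it can help to stream them rather than load them whole. The sketch below reads the gzipped FASTA and sums reads per sample using only the Python standard library. It assumes the count files have a header row, ASV identifiers in the first column and one column per sample; this layout is an assumption and should be checked against the README.txt file.

```python
import csv
import gzip


def read_fasta(path):
    """Yield (asv_id, sequence) pairs from a gzipped FASTA file."""
    asv_id, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if asv_id is not None:
                    yield asv_id, "".join(chunks)
                # Keep only the first token of the header as the identifier.
                asv_id, chunks = (line[1:].split() or [""])[0], []
            else:
                chunks.append(line)
    if asv_id is not None:
        yield asv_id, "".join(chunks)


def reads_per_sample(path):
    """Sum read counts per sample column, streaming one ASV row at a time.

    Assumed layout: tab-separated, header row, ASV identifiers in the first
    column, one integer count column per sample.
    """
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        samples = next(reader)[1:]
        totals = [0] * len(samples)
        for row in reader:
            for i, value in enumerate(row[1:]):
                if value:
                    totals[i] += int(value)
    return dict(zip(samples, totals))
```

For example, reads_per_sample("CO1_asv_counts_SE.tsv.gz") would give the sequencing depth of each Swedish sample. Note that these totals include spike-in reads, since all ASV files contain spike-in ASVs.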
Metadata files
Four types of metadata files are included:
- sequencing_metadata files with information about samples that were processed in the lab and sequenced.
- samples_metadata files with information about samples that were collected in the field.
- sites_metadata files with information about sites where samples were collected.
- spike-ins metadata files with information about spike-ins added to each Malaise trap sample at the time of sample processing in the lab.
Sequencing metadata files
The two sequencing metadata files CO1_sequencing_metadata_SE.tsv and CO1_sequencing_metadata_MG.tsv contain information about samples that were sequenced. For details on the columns of these files, see the README.txt file.
Samples metadata files
Four samples_metadata files are included in this dataset with information about each sample that was collected in the field. For samples collected with Malaise traps we have two files, one for each country: samples_metadata_malaise_SE.tsv and samples_metadata_malaise_MG.tsv. See the README.txt file for details about the columns of these files.
For arthropod samples collected from litter and soil we have two files, one for each country: samples_metadata_soil_litter_SE.tsv and samples_metadata_litter_MG.tsv. Note that for Madagascar we did not collect arthropod samples from soil. Also note that for Madagascar we collected four leaf litter samples at each trap location, one sample in each direction of the Malaise trap (front, back, left and right); whilst for Sweden we collected only one sample at each trap location. For details on the columns of these files, see the README.txt file.
Sites metadata files
There are two files that contain information about sampling sites, one for each country: sites_metadata_SE.tsv and sites_metadata_MG.tsv. See the README.txt file for more information.
Spike-ins metadata files
We provide three files with information about spike-ins used when processing samples in the lab: biological_spikes_taxonomy_SE.tsv and biological_spikes_taxonomy_MG.tsv contain taxonomic information on biological spike-ins, while the file synthetic_spikes_info.tsv has information on synthetic spike-ins. See the README.txt file for more information.
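All metadata files listed above are tab-separated tables, so they can be read with standard tools. As a minimal sketch, the helper below loads one file into a dictionary keyed by a chosen column, which makes it easy to look up the metadata for a given sample or site. The function name is illustrative, and the key column must be taken from the README.txt file, since column names are not listed here.

```python
import csv


def index_tsv(path, key_column):
    """Read a tab-separated file into {key: row_dict}, keyed by one column."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if key_column not in (reader.fieldnames or []):
            raise KeyError(f"{key_column!r} not in columns: {reader.fieldnames}")
        index = {}
        for row in reader:
            key = row[key_column]
            # Refuse silent overwrites if the chosen column is not unique.
            if key in index:
                raise ValueError(f"duplicate key {key!r} in {path}")
            index[key] = row
    return index
```

A call might look like index_tsv("sites_metadata_SE.tsv", "siteID"), where "siteID" is a hypothetical column name used only for illustration. The duplicate check is deliberate: if the chosen column is not unique in a file, the helper fails loudly instead of discarding rows.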
Other complementary data files
We present complementary data on soil chemistry collected at each sampling location in both Sweden and Madagascar, stand characteristics collected at each sampling location in Madagascar, and biomass/count data for a selected number of Malaise trap samples from the Insect Biome Atlas project (n=24) and the Swedish Insect Inventory Project (n=224).
Soil chemistry data
We provide two datasets, one for each country, on soil chemistry (soil_chemistry_SE.tsv and soil_chemistry_MG.tsv) that store information on soil nutrients from soil samples collected at the same sampling sites as the arthropod communities. Topsoil (0–20 cm) was sampled at five points around each Malaise trap in both Sweden and Madagascar: one soil core (6 cm diameter) at the center of the trap and one soil core on each of the four “sides” of the trap, five meters away from the trap. The soil sample for each site is a composite of the cores from these five points. Soil samples collected in Sweden were analysed at Eurofins in Sweden and the ones collected in Madagascar were analysed at the Laboratoire des Radioisotopes in Madagascar. Because the samples from the two countries were analysed at different laboratories, the soil nutrient variables presented in each dataset differ slightly. See the README.txt file for more information on the columns of each of these files.
Stand characteristics data
Stand characteristics were only measured in Madagascar, as extensive data on landscape composition and vegetation structure at the sampling sites in Sweden had already been compiled as part of the National Inventory of Landscapes in Sweden (NILS), and those data are publicly available.
The file stand_characteristics_MG.tsv contains information on a set of stand characteristics from Madagascar related to tree density (DBH, shading, etc.). Information about columns in this file is found in the README.txt file.
Biomass and count data
To allow an assessment of how the biomass of a Malaise trap sample translates to the number of specimens, we provide two files describing samples from Sweden, for which we measured the biomass and also counted all the specimens in the sample. The first set comprises 24 samples from the IBA field campaign (biomass_count_IBA.tsv), and the second set comprises 224 samples from a separate Swedish Malaise trapping campaign (Swedish Insect Inventory Project) in 2018–2019 (biomass_count_SIIP.tsv). For the latter dataset, we provide the site and sample metadata in the same file. Details about columns in these files are found in the README.txt file.
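One simple way to explore the biomass-to-count relationship is a least-squares fit on log-transformed values. The sketch below is only an illustration of that approach, not the analysis used by the project. The column names are passed in as arguments because the actual names are documented in the README.txt file and are not assumed here.

```python
import csv
import math


def fit_loglog(path, biomass_col, count_col):
    """Fit log10(count) = intercept + slope * log10(biomass) by least squares.

    Returns (slope, intercept, n_rows_used).
    """
    xs, ys = [], []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                biomass, count = float(row[biomass_col]), float(row[count_col])
            except (TypeError, ValueError):
                continue  # skip rows with missing or non-numeric values
            # Logarithms require strictly positive values.
            if biomass > 0 and count > 0:
                xs.append(math.log10(biomass))
                ys.append(math.log10(count))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable rows")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("biomass values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, n
```

With a fitted slope and intercept, the expected specimen count for a given biomass is 10 ** (intercept + slope * log10(biomass)). The same function can be applied to biomass_count_IBA.tsv and biomass_count_SIIP.tsv to compare the two campaigns.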