SMHI IFCB Plankton Image Reference Library
This repository includes four datasets of plankton images manually annotated by phytoplankton experts at the Swedish Meteorological and Hydrological Institute (SMHI). The images can be used to train automatic image classifiers that identify plankton species. They were captured with an Imaging FlowCytobot (IFCB, McLane Research Laboratories) at different locations and in different seasons in the Skagerrak, Kattegat, and Baltic Proper. The specifics of the four datasets are as follows:
- smhi_ifcb_svea_baltic_proper: Images were gathered during monthly monitoring cruises from 2022 to 2024, utilizing an IFCB mounted as part of the underway FerryBox system on the R/V Svea. This collection consists of 27,118 annotated images across 61 different classes.
- smhi_ifcb_svea_skagerrak_kattegat: Images were also collected during the regular monitoring cruises from 2022 to 2024. This collection comprises 5,086 annotated images from 83 distinct classes.
- smhi_ifcb_tångesund: In 2016, the IFCB was deployed in situ at depths between 3 and 18 meters, near a mussel farm in Tångesund, Mollösund (Skagerrak). This dataset contains 43,828 annotated images from 39 different classes.
- smhi_ifcb_iRfcb: This subset of the smhi_ifcb_svea_skagerrak_kattegat dataset can be used for user and unit tests for the iRfcb R package.
Datasets 1-3 each comprise two zip archives: one (annotated_images) containing .png images organized into one subfolder per class, and another (matlab_files) containing the raw data files (.roi, .hdr, .adc) and the .mat files needed to develop a random forest image classifier with the MATLAB code from the ifcb-analysis repository. Dataset 4 comprises only a MATLAB data package.
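As a rough illustration of the annotated_images layout described above (one subfolder per class, each holding .png files), the following Python sketch counts the images in each class folder. It is only a sketch under stated assumptions: the function name count_images_per_class and the path "annotated_images" are hypothetical, and the archive is assumed to have been extracted locally so that the class subfolders sit directly under that path.

```python
from collections import Counter
from pathlib import Path


def count_images_per_class(root):
    """Count .png files in each immediate subfolder of `root`.

    Assumes one subfolder per class, as described for the
    annotated_images archives. The function name is hypothetical.
    """
    root = Path(root)
    counts = Counter()
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Only files with a .png extension are counted; anything else is ignored.
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ".png"
        )
    return counts


if __name__ == "__main__":
    # Hypothetical location of an extracted annotated_images archive.
    extracted = Path("annotated_images")
    if extracted.is_dir():
        counts = count_images_per_class(extracted)
        for class_name, n in counts.most_common():
            print(f"{class_name}: {n}")
        print(f"Total: {sum(counts.values())} images in {len(counts)} classes")
    else:
        print(f"Directory not found: {extracted}")
```

Comparing the printed totals against the image and class counts listed for each dataset is a quick way to confirm that an archive was downloaded and extracted completely.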
The images in these datasets undergo continuous quality control, and new images are added regularly, so the repository will be updated on a regular basis. If you find any mislabeled images, please contact the authors.
Version history
- Version 4 (2024-11-04): 76,032 annotated images. Corrected class names to better match WoRMS, and continued quality control of images in the Tångesund dataset.
- Version 3 (2024-08-05): 72,086 annotated images. Added iRfcb dataset for user and unit testing.
- Version 2 (2024-06-03): 71,525 annotated images. Updated class names and corrected manual files in the Tångesund dataset. Continued quality control of images in the Tångesund dataset.
- Version 1 (2024-05-31): 65,435 annotated images.