A FISH IMAGE DATASET WITH A BUILT-IN IMAGE QUALITY MANAGEMENT SYSTEM
A new scientific discipline, Imageomics, has emerged within the last decade at the intersection of informatics, computer science, and biology. Like most other -omics fields, Imageomics uses the most recent technologies to analyze biological data, in this case data derived from images. One of the most widely applied analysis methods for image datasets is machine learning. In 2019, we started working on an NSF-funded project, Biology Guided Neural Networks, with the purpose of extracting biological information by using neural networks and biological guidance. We built a dataset composed of digitized fish specimens obtained either directly from collections or from data repositories. With another NSF Institute project, Imageomics, as a continuation of Biology Guided Neural Networks, we have added new functionality to the dataset management system, including image quality metadata, extended image metadata, and batch metadata. An additional flexible database infrastructure based on the RDF framework allows the system to host different taxonomic groups with a variety of new metadata features. By combining these features with the FAIR principles and reproducibility, we provide an Artificial Intelligence Readiness (AIR) feature for the dataset. Currently, the dataset serves as the largest and most detailed AI-ready fish image dataset with an integrated Image Quality Management System.