Preclinical Development – Biomolecular
2019 PharmSci 360
For scalable, effective and trustworthy data-driven science, we need to ensure that data is Findable, Accessible, Interoperable and Reusable (FAIR) by humans as well as by machines. Since their publication in 2016 (https://doi.org/10.1038/sdata.2016.18), the FAIR Principles of data management and stewardship have become pervasive in discussions, policies and implementations concerning the technological and social infrastructure for research data.
We outline a principled approach to data FAIRification, rooted in the notions of experimental design, whose main intent is to clarify the semantics of data matrices. Using two related metabolomics datasets associated with journal articles, we perform retrospective curation and re-annotation of the data and metadata, using open, community-developed interoperability standards. The results are semantically anchored data matrices, deposited in public archives, that software agents can read for data-level queries and that support the reproducibility and reuse of the data underpinning the publications. The work is in press at Springer Nature's Scientific Data journal, and a pre-print is available at https://doi.org/10.5281/zenodo.3274257.