MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment

Authors

  • Amrapali Zaveri, Institute of Data Science, Maastricht University, Netherlands
  • Wei Hu, Nanjing University
  • Michel Dumontier, Institute of Data Science, Maastricht University, Netherlands

DOI:

https://doi.org/10.15346/hc.v6i1.6

Keywords:

crowdsourcing, biomedical, metadata, data quality

Abstract

To reuse the enormous amounts of biomedical data available on the Web, there is an urgent need for good-quality metadata. Such metadata is essential to ensure that data are maximally Findable, Accessible, Interoperable and Reusable (FAIR). The Gene Expression Omnibus (GEO) allows users to specify metadata in the form of textual key: value pairs (e.g. sex: female). However, because no structured vocabulary or format is available, the more than 44,000,000 key: value pairs suffer from numerous quality issues. Relying on domain experts for curation is not only time-consuming but also does not scale. Thus, in our approach, MetaCrowd, we apply crowdsourcing as a means of assessing GEO metadata quality. Our results show that crowdsourcing is a reliable and feasible way to identify similar as well as erroneous metadata in GEO. This is extremely useful for data consumers and producers in curating and providing good-quality metadata.
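To make the data model concrete, here is a minimal Python sketch. It is not the MetaCrowd implementation. It assumes each GEO metadata entry is a string of the form `key: value` that can be split at the first colon, and it aggregates hypothetical crowd answers about whether two keys mean the same thing using simple majority voting. The helper names `parse_pair` and `majority_vote` and the example votes are illustrative assumptions; the abstract does not say how crowd judgments were aggregated.

```python
from collections import Counter
from typing import List, Optional, Tuple


def parse_pair(raw: str) -> Tuple[str, str]:
    """Split a GEO-style 'key: value' string at the first colon (assumed format)."""
    key, sep, value = raw.partition(":")
    if not sep:
        raise ValueError(f"not a key: value pair: {raw!r}")
    return key.strip(), value.strip()


def majority_vote(judgments: List[str]) -> Optional[str]:
    """Return the most frequent crowd label, or None on a tie or empty input."""
    if not judgments:
        return None
    counts = Counter(judgments).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority; such items could be sent to more workers
    return counts[0][0]


# Hypothetical crowd answers to "Do these two keys mean the same thing?"
answers = {
    ("sex", "gender"): ["yes", "yes", "no", "yes", "yes"],
    ("age", "cell type"): ["no", "no", "no", "yes", "no"],
}

if __name__ == "__main__":
    print(parse_pair("sex: female"))
    for pair, votes in answers.items():
        print(pair, majority_vote(votes))
```

Splitting only at the first colon keeps values that themselves contain colons (such as `time: 10:30`) intact. Returning `None` on ties, rather than picking a label arbitrarily, flags items where the crowd disagrees.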


Published

2019-09-04

How to Cite

Zaveri, A., Hu, W., & Dumontier, M. (2019). MetaCrowd: Crowdsourcing Biomedical Metadata Quality Assessment. Human Computation, 6(1), 98-112. https://doi.org/10.15346/hc.v6i1.6


Section

Research