C. Zhou, M. Prabhushankar, and G. AlRegib
Crowdsourcing annotations from practitioners is a common way to label large datasets. However, label disagreement arises when multiple annotators label the same samples. In specialized applications such as subsurface fault interpretation, the disagreement between domain experts and practitioners is prominent. Neural network generalizability varies with the level of expertise behind the training labels, so it is important to characterize such labeling discrepancies. In this work, we use Amazon Mechanical Turk to elicit labeling patterns from experts and practitioners, and we propose a generative modeling approach to characterize expert-practitioner label discrepancies.