Machine learning problems are often subjective or ambiguous: humans solving the same problem may reach legitimate yet completely different conclusions, based on their personal experiences and beliefs. In supervised learning, particularly with crowdsourced training data, multiple annotations per data item are usually reduced to a single label representing ground truth. This discards a rich source of diversity and subjectivity in opinions about the labels. Label distribution learning instead associates with each data item a probability distribution over that item's labels, and can therefore preserve the diversity that conventional learning hides or ignores. We introduce a strategy for learning label distributions with only five to ten labels per item by aggregating human-annotated labels over multiple, semantically related data items. Our results suggest that specific label aggregation methods can help provide reliable, representative semantics at the population level.
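To make the aggregation idea concrete, the sketch below pools sparse crowd labels across semantically related items and normalizes the pooled counts into a label distribution per item. This is a minimal illustration under stated assumptions, not the authors' implementation: the label set, the `pooled_label_distributions` function, and the use of a precomputed `cluster_id` to mark semantically related items (e.g., obtained from clustering item embeddings) are all hypothetical.

```python
# Minimal sketch (not the paper's implementation) of estimating population-level
# label distributions by pooling 5-10 crowd labels per item across a cluster of
# semantically related items. The label set and grouping scheme are assumptions.
from collections import Counter, defaultdict

LABELS = ["negative", "neutral", "positive"]  # hypothetical label set


def pooled_label_distributions(annotations, cluster_ids, smoothing=0.0):
    """annotations: dict item_id -> list of crowd labels (roughly 5-10 per item).
    cluster_ids:    dict item_id -> id of the item's group of related items.
    Returns dict item_id -> probability distribution over LABELS, estimated from
    all labels in the item's cluster rather than from the item's labels alone."""
    # Pool raw label counts within each cluster of related items.
    cluster_counts = defaultdict(Counter)
    for item, labels in annotations.items():
        cluster_counts[cluster_ids[item]].update(labels)

    # Normalize pooled counts (with optional additive smoothing) into distributions.
    distributions = {}
    for item in annotations:
        counts = cluster_counts[cluster_ids[item]]
        total = sum(counts.values()) + smoothing * len(LABELS)
        distributions[item] = {
            lab: (counts[lab] + smoothing) / total for lab in LABELS
        }
    return distributions


if __name__ == "__main__":
    # Toy example: items "a" and "b" are semantically related, "c" is not.
    annotations = {
        "a": ["positive", "positive", "neutral", "positive", "negative"],
        "b": ["neutral", "positive", "positive", "neutral", "positive"],
        "c": ["negative", "negative", "neutral", "negative", "negative"],
    }
    cluster_ids = {"a": 0, "b": 0, "c": 1}
    for item, dist in pooled_label_distributions(annotations, cluster_ids).items():
        print(item, {lab: round(p, 2) for lab, p in dist.items()})
```

Setting `smoothing` above zero avoids assigning zero probability to labels that no annotator happened to choose within a small cluster; with the default of zero, the output is simply the empirical distribution of the pooled labels.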
Liu, T., Bongale, P. S., Venkatachalam, A., & Homan, C. M. (2019). Learning to predict population-level label distributions. In The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019 (pp. 1111–1120). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308560.3317082