A transformer-based deep learning model for evaluation of accessibility of image descriptions

Abstract

Images have become an integral part of digital and online media, used both for creative expression and for the dissemination of knowledge. To make images accessible to the visually impaired community, textual image descriptions or captions are provided, which can be read aloud by screen readers. These descriptions may be human-authored or software-generated, yet most tend to be generic, inadequate, and often unreliable, rendering the images effectively inaccessible. Existing tools, methods, and metrics for evaluating the quality of generated text are almost all word-similarity-based and generic. Standard guidelines, such as the NCAM image accessibility guidelines, exist to help authors write accessible image descriptions, but web content developers and authors rarely seem to use them, possibly due to lack of awareness, underestimation of the importance of accessibility, and the complexity and difficulty of understanding the guidelines. To our knowledge, no existing quality evaluation technique takes accessibility into account. To address this gap, a deep learning model based on the transformer, a recent and highly effective architecture in natural language processing, is proposed; it measures the compliance of a given image description with ten NCAM guidelines. The experimental results confirm the effectiveness of the proposed model. This work could contribute to the growing research on accessible images, not only on the web but on all digital devices.
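Measuring compliance with ten guidelines, as the abstract describes, can be framed as multi-label text classification: one independent score per guideline. The sketch below illustrates that framing with a small PyTorch transformer encoder; the vocabulary size, dimensions, pooling choice, and class names are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GuidelineComplianceModel(nn.Module):
    """Hypothetical sketch: transformer encoder with a 10-way multi-label
    head, one sigmoid output per NCAM guideline. All hyperparameters here
    are illustrative assumptions, not the published model's."""

    def __init__(self, vocab_size=30522, d_model=128, n_heads=4,
                 n_layers=2, n_guidelines=10, max_len=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_guidelines)  # one logit per guideline

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embed(token_ids) + self.pos(positions)
        x = self.encoder(x)
        # Mean-pool over tokens, then score each guideline independently.
        return torch.sigmoid(self.head(x.mean(dim=1)))

model = GuidelineComplianceModel()
dummy = torch.randint(0, 30522, (1, 16))  # stand-in for a tokenized description
scores = model(dummy)  # shape (1, 10): per-guideline compliance probabilities
```

Training such a model would use a binary cross-entropy loss per guideline, since a description can satisfy several guidelines at once.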

Citation (APA)

Shrestha, R. (2022). A transformer-based deep learning model for evaluation of accessibility of image descriptions. In ACM International Conference Proceeding Series (pp. 28–33). Association for Computing Machinery. https://doi.org/10.1145/3529836.3529856
