Abstract
Machine learning has traditionally emphasized developing models for given datasets, but real-world data is often messy, so improving the model alone is frequently not enough to improve performance. Data-Centric AI (DCAI) is an emerging field focused on systematically improving datasets, which leads to significant gains in practical ML. While experienced data scientists have long refined datasets manually through trial and error and intuition, DCAI treats data improvement as a systematic engineering discipline. It represents a shift in focus from models to the underlying data used for training and evaluation. Although common model architectures and predictable scaling rules now dominate, building and using datasets remains labor-intensive and costly, and the work still lacks infrastructure and best practices. The DCAI movement aims to develop efficient, high-productivity open data engineering tools for modern ML systems. This workshop seeks to foster an interdisciplinary DCAI community to address practical data challenges, including data collection, generation, labeling, preprocessing, augmentation, quality evaluation, data debt, and data governance. By defining and shaping the DCAI movement, the workshop aims to influence the future of AI and ML, and it invites interested parties to contribute through paper submissions.
Fu, Y., Liu, K., & Wang, D. (2024). DCAI: The 4th International Workshop on Data-Centric AI. In International Conference on Information and Knowledge Management, Proceedings (pp. 5584–5587). Association for Computing Machinery. https://doi.org/10.1145/3627673.3680118