Abstract
Recent advances in automatic code generation have made tools like GitHub Copilot attractive to programmers, as they can create blocks of code from descriptive prompts given to the AI. While researchers have studied the performance of these AI-based tools in general-purpose programming, their effectiveness in data analysis is understudied. Unlike general-purpose programming, which focuses more on algorithm-driven tasks such as building novel software, data analysis requires a data-driven approach to gain insights. It remains unclear how these tools can help data scientists analyze real-world problems. In this paper, we report a qualitative user study with 5 participants that examines how data scientists use GitHub Copilot to solve problems by scaffolding prompts at different levels of specificity. We found that effective prompts require carefully selected terminology, properly arranged word order, and sufficient interaction between the human and GitHub Copilot. We also identified potential flaws in GitHub Copilot that hinder data scientists from scaffolding prompts efficiently. Our work points to directions for improvement for both data scientists and GitHub Copilot.
Zhou, H., & Li, J. (2023). A Case Study on Scaffolding Exploratory Data Analysis for AI Pair Programmers. In Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery. https://doi.org/10.1145/3544549.3583943