Abstract
In cloud service systems, customers will report the service issues they have encountered to cloud service providers. Despite many issues can be handled by the support team, sometimes the customer issues can not be easily solved, thus raising customer incidents. Quick troubleshooting of a customer incident is critical. To this end, a customer incident should be assigned to its responsible team accurately in a timely manner. Our industrial experiences show that linking customer incidents with detected system incidents can help the customer incident triage. In particular, our empirical study on 7 real cloud service systems shows that with the additional information about the system incidents (i.e., incident reports generated by system monitors), the triage time of customer incidents can be accelerated 13.1× on average. Based on this observation, in this paper, we propose LinkCM, a learning based approach to automatically link customer incidents to monitor reported system incidents. LinkCM incorporates a novel learning-based model that effectively extracts related information from two resources, and a transfer learning strategy is proposed to help LinkCM achieve better performance without huge amount of data. The experimental results indicate that LinkCM is able to achieve accurate link prediction. Furthermore, case studies are presented to demonstrate how LinkCM can help the customer incident triage procedure in real production cloud service systems.
Author supplied keywords
Cite
CITATION STYLE
Gu, J., Wen, J., Wang, Z., Zhao, P., Luo, C., Kang, Y., … Zhang, D. (2020). Efficient customer incident triage via linking with system incidents. In ESEC/FSE 2020 - Proceedings of the 28th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 1296–1307). Association for Computing Machinery, Inc. https://doi.org/10.1145/3368089.3417061
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.