With the widely usage of open-source software, supply-chain-based vulnerability attacks, including SolarWind and Log4Shell, have posed significant risks to software security. Currently, people rely on vulnerability advisory databases or commercial software bill of materials (SBOM) to defend against potential risks. Unfortunately, these datasets do not provide finer-grained file-level vulnerability information, compromising their effectiveness. Previous works have not adequately addressed this issue, and mainstream vulnerability detection methods have their drawbacks that hinder resolving this gap. Driven by the real needs, we propose a framework that can trace the vulnerability-relevant file for each disclosed vulnera-bility. Our approach uses NVD descriptions with metadata as the inputs, and employs a series of strategies with a LLM model, search engine, heuristic-based text matching method and a deep learning classifier to recommend the most likely vulnerability-relevant file, effectively enhancing the completeness of existing NVD data. Our experiments confirm that the efficiency of the proposed framework, with CodeBERT achieving 0.92 AUC and 0.85 MAP, and our user study proves our approach can help with vulnerability-relevant file detection effectively. To the best of our knowledge, our work is the first one focusing on tracing vulnerability-relevant files, laying the groundwork of building finer-grained vulnerability-aware software bill of materials.
CITATION STYLE
Sun, J., Chen, J., Xing, Z., Lu, Q., Xu, X., & Zhu, L. (2024). Where is it? Tracing the Vulnerability-Relevant Files from Vulnerability Reports. In Proceedings - International Conference on Software Engineering (pp. 2469–2481). IEEE Computer Society. https://doi.org/10.1145/3597503.3639202
Mendeley helps you to discover research relevant for your work.