Information and Software Technology, Vol. 164, 2023.
Kalouptsoglou I, Siavvas M, Ampatzoglou A, Kehagias D, Chatzigeorgiou A.
Context: Software security is considered a major aspect of software quality as the number of discovered vulnerabilities in software products is growing. Vulnerability prediction is a mechanism that helps engineers prioritize their inspection efforts by focusing on vulnerable parts. Despite recent advances, the current literature lacks a systematic mapping study on vulnerability prediction.
Objective: This paper aims to analyze the state of the art in vulnerability prediction, focusing on: (a) the goals of vulnerability prediction-related studies; (b) the data collection processes and the types of datasets that exist in the literature; (c) the most frequently examined techniques for the construction of the prediction models and their input features; and (d) the utilized evaluation techniques.
Method: We collected 180 primary studies following a broad search methodology across four popular digital libraries. We mapped these studies to the variables of interest and we identified trends and relationships between the studies.
Results: The main findings suggest that: (i) there are two major study types: prediction of vulnerable software components and forecasting of the evolution of vulnerabilities in software; (ii) most studies construct their own vulnerability-related dataset by retrieving information from vulnerability databases for real-world software; (iii) there is a growing interest in deep learning models, along with a trend toward textual source code representation; and (iv) F1-score was found to be the most widely used evaluation metric.
Conclusions: The results of our study indicate that there are several open challenges in the domain of vulnerability prediction. One of the major conclusions is that most studies focus on within-project prediction, neglecting the real-world scenario of cross-project prediction.